How to submit a shell script on a grid?

bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

When running on a grid, what is the best way to execute a shell script to ensure that it will run on an available compute node rather than the head/conductor node?

I tried using the ExecCommand activity in a sequence job, but that executes the script on the head node and never even calls Load Leveler.

I have also used the External Source stage in a PX job to submit the script, and that works. I'm just not sure this is the "right" way.

Is there a better way?
Bob
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

To create a dynamic configuration file for a job sequence, you must invoke the sequencer.sh script. In your downstream job activity stages, did you use the expression Field(<stagename>.$CommandOutput," ",<fieldnum>) to accept the values passed from the sequencer.sh script? And for each job in the sequence, did you set the APT_GRID_ENABLE parameter to NO?
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

Thanks for the reply, but it really does not address my question at all.

I'm not attempting to run sequencer.sh but rather an in-house developed shell script. The primary goal is to get the script to run on a compute node rather than the head node.
Bob
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Is there a particular reason/need to run the script on a compute node rather than the head node?

Sequence jobs, because they are at heart server jobs, can run only on the head node (the Engine server or tier). The ExecCommand activity, as well as BeforeJob and AfterJob ExecSH functions (I believe so, at least), also execute their targets on the head node (as you have already seen). If you need to run the script on a compute node, you can use one of the External stages (External Source/Target/Filter).

Another option would be to submit the script directly to Load Leveler using its standard commands, which you may even be able to do using ExecCommand in a Sequence Job. Look through the Load Leveler documentation (it is available online) to figure out how to do that.
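
As a rough sketch of that direct-submission route: LoadLeveler jobs are normally described in a job command file that is then handed to llsubmit. The function below only writes such a file. The script path, log directory, and job name are made-up placeholders, and a given site may require additional keywords (a job class, for example), so treat this as a starting point to check against the Load Leveler documentation rather than a tested recipe.

```shell
#!/bin/sh
# Sketch only: write a minimal LoadLeveler job command file.
# The job name and every path used here are hypothetical placeholders.
write_ll_job() {
    # $1 = script to run, $2 = log directory, $3 = job command file to create
    cat > "$3" <<EOF
# @ job_name   = app_script
# @ job_type   = serial
# @ executable = $1
# @ output     = $2/app_script.out
# @ error      = $2/app_script.err
# @ queue
EOF
}

# Example use (not run here; llsubmit must be on the PATH of the calling user):
#   write_ll_job /path/to/app_script.sh /path/to/logs /tmp/app_script.cmd
#   llsubmit /tmp/app_script.cmd
```

Called from an ExecCommand activity, the llsubmit step itself would still start on the head node, but the script named in the executable keyword would run wherever Load Leveler schedules it.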

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Bob,
That's exactly how to (using compute node info passed from the sequencer.sh script) get your ExecCommand activity job processed on a compute node.

Please post your job sequence design, then it will be much easier for me to show you how to do it.
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Does the platform have LSF?
Thanks,
Prasanna
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

jwiles wrote: Is there a particular reason/need to run the script on a compute node rather than the head node?
Yes, the processing that is being done in the shell script is causing high wait CPU % on the head node and running it on a compute node doesn't have that problem. We need to reserve the head node for sequence/conductor type processes to keep things running smoothly.
Bob
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

lstsaur wrote: That's exactly how to (using compute node info passed from the sequencer.sh script) get your ExecCommand activity job processed on a compute node.
I guess there is a piece of this that I am still not understanding. Are you saying that if I execute sequencer.sh then I can somehow coax a subsequent ExecCommand activity to run the application script on a compute node?

Currently my sequence job is merely the ExecCommand activity that executes the application shell script.

Or (and this is the only thing I have found so far that will get the application script to run on a compute node) the sequence contains one job activity that calls a parallel job containing nothing but an External Source stage, which executes the application script, and a Peek stage.

Thanks,
Bob
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

You may be able to get it to work by using the ssh command in the ExecCommand activity:

ssh hostname scriptname

Pull the hostname from the values returned by sequencer.sh. If you're feeling adventurous, pull a hostname from one of the node entries in the config file returned by sequencer.sh :)
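
For the "adventurous" variant, a minimal sketch might look like the following. It assumes the generated configuration file keeps each fastname entry on its own line, that the head node's hostname is known so it can be skipped, and that passwordless ssh from the head node to the compute nodes is already set up. The function name and all paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: print the first fastname in a parallel configuration file
# that is not the head node. Assumes lines of the form:  fastname "hostname"
first_compute_host() {
    # $1 = path to the configuration file, $2 = head node hostname to skip
    awk -v head="$2" '
        $1 == "fastname" {
            gsub(/"/, "", $2)                    # strip the surrounding quotes
            if ($2 != head) { print $2; exit }   # stop at the first compute node
        }
    ' "$1"
}

# Example use (not run here; both paths and the hostname are placeholders):
#   host=$(first_compute_host /path/to/generated.apt headnode)
#   ssh "$host" /path/to/app_script.sh
```

Note that going through ssh this way uses a node the resource manager has already allocated, so the allocation needs to stay in place while the script runs.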

Regards,
- james wiles


PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Bobyon, There's two ways I see you doing your task.

External source stage and limit the execution of the stage to Node1.

or

Exec Sequencer and have your script call the proper grid dispatching command that you would normally be doing via DynamicGrid.sh.

Not sure which flavor of Grid you have, but the easy path is external source stage.

Talk to the datastage admins within your environment. They most likely can direct you to your answer. If you want your grid resource manager to track work activity then go that route. You will have to redirect stdout and stderr to a log file you can use later. External source stage might kick that into your job log for you.
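
The stdout/stderr redirection mentioned above could be handled by a small wrapper along these lines. This is only a sketch: the function name and the paths in the usage comment are placeholders, and the grid dispatching command itself is left out because it depends on which resource manager you have.

```shell
#!/bin/sh
# Sketch only: run a command with stdout and stderr appended to one log file.
run_logged() {
    # $1 = log file; remaining arguments = the command to run
    logfile=$1
    shift
    "$@" >>"$logfile" 2>&1   # the wrapper returns the command's exit status
}

# Example use (not run here; both paths are placeholders):
#   run_logged /path/to/logs/app_script.log /path/to/app_script.sh
```

Because the wrapper passes the exit status through, a calling sequence can still branch on success or failure of the script.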


Slapping shell scripts onto your work horses is EXACTLY THE RIGHT THING TO DO... if the overhead of dispatching the work is less than the work itself.

I HATE using the Head Node (Conductor) as a number cruncher or data mover. Hate.

Remember that if you choose to have your script do the grid dispatching call, use the same resource requirements that your DataStage jobs are being submitted with. You still want to hit the same pool of servers that are dedicated to your DataStage setup (mainly because your mounts are all present there).
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

PaulVL wrote: Bobyon, There's two ways I see you doing your task.

External source stage and limit the execution of the stage to Node1.
This looks like the approach I am going to take. I'll let the load leveler determine which node to run it on but will set the grid parms to a 1x1 config.
PaulVL wrote: Talk to the datastage admins within your environment.
The mirror has not been much help on this one ;)

Thanks to all contributors for your help and advice.
Bob