Hi All,
I know there are multiple ways to call a shell script in DataStage, such as:
1. Execute Command activity
2. Before/After-job subroutine
3. Transformer stage, etc.
I am trying to figure out the best way in terms of overall performance. Can anyone shed some light on the ideal way to execute a shell script in DataStage? (Our platform is GRID-enabled.)
Thanks
Freddie
Best way to call a Shell Script
I'm not sure there's a "best" way, performance-wise or otherwise. Basically it's just shelling out to the O/S and running the script, which is what all of them would do. The basic difference is the "where" of it and what the script needs to accomplish, other than succeed or fail. #1 and #2 are single use, while #3 would run for every record through the job. #1 could give you conditional control of downstream decisions / processes, while #2 would just run before or after the job and maybe make it go boom. I think #3 would generally be a mistake, but simply saying "call a shell script" is a very wide-open topic. As I'm sure you knew. Care to narrow it down a bit?
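To make the "succeed or fail" point concrete, here is a minimal sketch (all names and paths invented, not from this thread) of the kind of script an Execute Command activity could run, with an exit status the sequence's triggers can branch on:

```shell
#!/bin/sh
# Sketch only: a check script whose return code a sequence trigger
# (e.g. ReturnValue = 0 vs. ReturnValue <> 0) could branch on.
check_files() {
    dir="$1"
    # Succeed when the directory exists and is non-empty
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "files present in $dir"
        return 0    # success link in the sequence
    else
        echo "no files found in $dir" >&2
        return 1    # nonzero -> failure link in the sequence
    fi
}

# Demo against throwaway directories
demo=$(mktemp -d)
touch "$demo/demo.dat"
check_files "$demo"                         # success path
empty=$(mktemp -d)
check_files "$empty" || echo "failure branch taken"
rm -rf "$demo" "$empty"
```

The script itself does nothing DataStage-specific; the activity just captures its return code and output.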
Not anything I've worked with but I assume GRID would only affect the answer for #3.
-craig
"You can never have too many knives" -- Logan Nine Fingers
As Craig mentions, it depends on the specifics, but my standard for the first decision about "where" is this: what level of error handling does the script require?
The Exec stage is best for this. It handles return codes, it provides explicit tracing of parameters, and (most important) writes useful entries to the job log... well, not all log entries are equal, but at the job sequence level I am also afforded explicit error handling with checkpoints.
That last is critical here. If I can fix an underlying problem and rerun without further intervention, that is my first choice. The key point is that when a Px job abends, its parameters' values are embedded. Restarting at the Px job means a badly valued parameter can't be fixed; I have to reset both the job and the sequence to correct it. If a script provides that value, and the abort can be handled at the script, I avoid redoing the entire job.
Example:

Code: Select all
UserVariables ==> Execute Script ==> Px Job Activity

Checkpoints are active. A Terminator activity is a second link for both the script and the Px job, with triggers set to examine the critical values.

So, for a hypothetical situation: a script retrieves the name of a file using the Unix ls command, and the Px job uses that file name. The Exec stage triggers cause an abort if the result of the ls command is "no file found". If you don't abort at the Exec stage, you can't restart at the Px job after "fixing" the file situation, because the name of the fixed file is different from the name of the file that wasn't found.
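A sketch of that hypothetical script (directory and file pattern are invented for illustration): it resolves the file name with ls and fails loudly when nothing matches, so the sequence aborts at the Execute Command activity rather than inside the Px job.

```shell
#!/bin/sh
# Sketch only: resolve an input file name, or fail with a nonzero
# return so the sequence's trigger can abort at this activity.
get_input_file() {
    landing="$1"; pattern="$2"
    # Let the unquoted pattern glob-expand; take the first match
    file=$(ls "$landing"/$pattern 2>/dev/null | head -1)
    if [ -z "$file" ]; then
        echo "ERROR: no file matching $pattern in $landing" >&2
        return 1   # trigger on nonzero return -> abort here, restartably
    fi
    echo "$file"   # captured by the activity (e.g. via its command output)
}

# Demo against a throwaway directory
landing=$(mktemp -d)
touch "$landing/input_20240101.dat"
get_input_file "$landing" 'input_*.dat'
rm -rf "$landing"
```

Because the failure happens before the Px job's parameters are embedded, fixing the file and restarting the sequence picks up the new file name cleanly.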
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
I avoid using the correct name "Execute Command" because it sounds like I'm ordering someone's demise.
But yes, that's the one, officer. I saw it do the deed.
Franklin Evans
It probably doesn't matter all that much. In each case the executing process has to fork a child process that runs the shell, which in turn runs the shell script. Most of the "performance" impact (whatever that means) is in the creation and management of that child process, which is the same for all three approaches.
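That fork/exec cost is easy to see outside DataStage. A rough, machine-dependent illustration (the script and iteration count are made up): spawning the same trivial script repeatedly measures only process creation, which is identical no matter which activity or stage does the spawning.

```shell
#!/bin/sh
# Rough demo of fork/exec overhead: each invocation of the script
# below costs one fork plus one exec of /bin/sh, regardless of caller.
script=$(mktemp)
printf '#!/bin/sh\nexit 0\n' > "$script"
chmod +x "$script"

i=0
start=$(date +%s)
while [ $i -lt 200 ]; do
    "$script"          # fork + exec, same cost for any invoker
    i=$((i + 1))
done
end=$(date +%s)
echo "200 fork/exec cycles in $((end - start))s"
rm -f "$script"
```

The script body is empty on purpose: whatever real work the script does is additional and identical across all three approaches too.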
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.