Hi All,
I know there are multiple ways to call a shell script in DataStage, such as:
1. Execute Command activity
2. Before/After-job subroutine
3. Transformer stage, etc.
I am trying to figure out the best way in terms of overall performance. Can anyone shed some light on the ideal way to execute a shell script in DataStage? (Our platform is GRID-enabled.)
Thanks
Freddie
Best way to call a Shell Script
I'm not sure there's a "best" way, performance-wise or otherwise. Basically it's just shelling out to the O/S and running the script, which is what all of them would do. The basic difference is the "where" of it and what the script needs to accomplish, other than succeed or fail. #1 and #2 are single use, while #3 would run for every record through the job. #1 could give you conditional control of downstream decisions / processes, while #2 would just run before or after the job and maybe make it go boom. I think #3 would generally be a mistake, but simply saying "call a shell script" is a very wide-open topic. As I'm sure you knew. Care to narrow it down a bit?
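To make the "succeed or fail" point concrete, here is a minimal sketch (all names and paths invented, not from this thread) of the kind of script an Execute Command activity could run, with an exit status the sequence's triggers can branch on:

```shell
#!/bin/sh
# Sketch only: a check script whose return code a sequence trigger
# (e.g. ReturnValue = 0 vs. ReturnValue <> 0) could branch on.
check_files() {
    dir="$1"
    # Succeed when the directory exists and is non-empty
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "files present in $dir"
        return 0    # success link in the sequence
    else
        echo "no files found in $dir" >&2
        return 1    # nonzero -> failure link in the sequence
    fi
}

# Demo against throwaway directories
demo=$(mktemp -d)
touch "$demo/demo.dat"
check_files "$demo"                         # success path
empty=$(mktemp -d)
check_files "$empty" || echo "failure branch taken"
rm -rf "$demo" "$empty"
```

The script itself does nothing DataStage-specific; the activity just captures its return code and output.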
Not anything I've worked with but I assume GRID would only affect the answer for #3.
-craig
"You can never have too many knives" -- Logan Nine Fingers
As Craig mentions, it depends on the specifics, but my standard for the first decision about "where" is this: what level of error handling does the script require?
The Exec stage is best for this. It handles return codes, it provides explicit tracing of parameters, and (most important) writes useful entries to the job log... well, not all log entries are equal, but at the job sequence level I am also afforded explicit error handling with checkpoints.
That last is critical here. If I can fix an underlying problem and rerun without further intervention, that is my first choice. The key point is that when a Px job abends, its parameters' values are embedded. Restarting at the Px job means a badly valued parameter can't be fixed; I have to reset both the job and the sequence to correct it. If a script provides that value, and the abort can be handled at the script, I avoid redoing the entire job.
Example:

Code: Select all
UserVariables ==> Execute Script ==> Px Job Activity

Checkpoints are active. A Terminator activity is a second link for both the script and the Px job, with triggers set to examine the critical values.

So, for a hypothetical situation: a script retrieves the name of a file using the Unix ls command, and the Px job uses that file name. The Exec stage triggers cause an abort if the result of the ls command is "no file found". If you don't abort at the Exec stage, you can't restart at the Px job after "fixing" the file situation, because the name of the fixed file is different from the name of the file that wasn't found.
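A sketch of that hypothetical script (directory and file pattern are invented for illustration): it resolves the file name with ls and fails loudly when nothing matches, so the sequence aborts at the Execute Command activity rather than inside the Px job.

```shell
#!/bin/sh
# Sketch only: resolve an input file name, or fail with a nonzero
# return so the sequence's trigger can abort at this activity.
get_input_file() {
    landing="$1"; pattern="$2"
    # Let the unquoted pattern glob-expand; take the first match
    file=$(ls "$landing"/$pattern 2>/dev/null | head -1)
    if [ -z "$file" ]; then
        echo "ERROR: no file matching $pattern in $landing" >&2
        return 1   # trigger on nonzero return -> abort here, restartably
    fi
    echo "$file"   # captured by the activity (e.g. via its command output)
}

# Demo against a throwaway directory
landing=$(mktemp -d)
touch "$landing/input_20240101.dat"
get_input_file "$landing" 'input_*.dat'
rm -rf "$landing"
```

Because the failure happens before the Px job's parameters are embedded, fixing the file and restarting the sequence picks up the new file name cleanly.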
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
I avoid using the correct name "Execute Command" because it sounds like I'm ordering someone's demise.
But yes, that's the one, officer. I saw it do the deed.
Franklin Evans
It probably doesn't matter all that much. In each case the executing process has to fork a child process that runs the shell, which in turn runs the shell script. Most of the "performance" impact (whatever that means) is in the creation and management of that child process, which is the same for all three approaches.
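That fork/exec cost is easy to see outside DataStage. A rough, machine-dependent illustration (the script and iteration count are made up): spawning the same trivial script repeatedly measures only process creation, which is identical no matter which activity or stage does the spawning.

```shell
#!/bin/sh
# Rough demo of fork/exec overhead: each invocation of the script
# below costs one fork plus one exec of /bin/sh, regardless of caller.
script=$(mktemp)
printf '#!/bin/sh\nexit 0\n' > "$script"
chmod +x "$script"

i=0
start=$(date +%s)
while [ $i -lt 200 ]; do
    "$script"          # fork + exec, same cost for any invoker
    i=$((i + 1))
done
end=$(date +%s)
echo "200 fork/exec cycles in $((end - start))s"
rm -f "$script"
```

The script body is empty on purpose: whatever real work the script does is additional and identical across all three approaches too.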
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.