Long startup time

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

abhik05
Participant
Posts: 28
Joined: Thu Mar 08, 2012 8:31 am

Post by abhik05 »

Hi
A job is taking a long time to start up in PRODUCTION.

The job runs on 1 processing node.
Design:
The job has one Oracle connector stage as its source.
Two other connector stages are used for lookups, and there is one target connector stage.
The job also uses one sort stage and transformer stages.

Job Log:
According to the log, the SOURCE connector stage connects to the given database and then reports that it is running with 1 processing node.
The next entry in the log is the source SQL statement.
The time difference between the first two entries and the SQL statement entry is almost equal to the job startup time reported further down in the log.

The SOURCE SQL is user-defined SQL with parameters, not SQL generated by DataStage.

I want to understand: is the time taken to generate the source SQL statement included in the job startup time?
What other factors could affect the startup time?

Please help.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Please remember that the timestamp in the log is the timestamp of the message being saved to the log, not the time the action happened.


Building the result set on your source and extracting row #1 is part of job startup cost I think. At least that has been my experience eyeballing the logs thus far.

I believe the connections for all connectors are established first, then the SQL is sent and the result set starts being built.


Without seeing the job design and actual log, it's hard for us to say.
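
For anyone comparing log timestamps the way the original post does, a small script can list the gap before each entry. This is only a rough sketch: the line format ("YYYY-MM-DD HH:MM:SS message") is an assumption about a hypothetical text export of the job log, the sample lines are invented for illustration, and the gaps measure when messages were saved to the log, not when the actions happened.

```python
from datetime import datetime

# Assumed (hypothetical) export format: "YYYY-MM-DD HH:MM:SS <message>"
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LENGTH = 19  # number of characters in a timestamp written in TS_FORMAT


def log_gaps(lines):
    """Return (seconds_since_previous_entry, message) for every entry after the first."""
    entries = []
    for line in lines:
        line = line.strip()
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # blank line or continuation line without a timestamp
        entries.append((stamp, line[TS_LENGTH:].strip()))

    gaps = []
    for (previous, _), (current, message) in zip(entries, entries[1:]):
        gaps.append(((current - previous).total_seconds(), message))
    return gaps


# Invented sample lines, only to show the shape of the input.
sample = [
    "2015-06-01 02:00:05 Source: connected to database",
    "2015-06-01 02:00:05 Source: running with 1 processing node",
    "2015-06-01 02:04:35 Source: SQL statement issued",
]

for seconds, message in log_gaps(sample):
    print(f"{seconds:8.0f}s  {message}")
```

A large gap in front of the SQL statement entry would be consistent with time spent before that message was written, but it does not by itself say which component spent it.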
abhik05
Participant
Posts: 28
Joined: Thu Mar 08, 2012 8:31 am

Post by abhik05 »

Thanks PaulVL for replying.

From the logs and designs of a few jobs with long startup times, I have observed the following:
Jobs whose database connector stages (at the source or elsewhere, e.g. for lookups or a funnel) contain large, complex USER-DEFINED SQL (a wide output schema, joins to other tables, database hints) take much longer to start up than jobs with less complex SQL in their connector stages.
Running those jobs at peak time, when most jobs are scheduled, may also add to the time the database needs to build the results for those SQL statements.
So, as you mentioned, "Building the result set on your source and extracting row #1 is part of job startup cost" could be one cause.

I guess that if there are issues with the following housekeeping tasks, they should not be counted in the startup time:
1. &PH& directory cleanup
2. /tmp directory cleanup
Please correct me if I am wrong.

Regards,
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

If you have thousands of files in &PH& it might affect startup time... (doubtful you are running into that).
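
A quick way to check whether a directory really holds thousands of files is to count them. The sketch below is a minimal helper, not a DataStage tool; the directories are passed on the command line because the location of &PH& depends on the project, and the script name used below is made up.

```python
import os
import sys


def count_files(directory):
    """Count regular files directly inside `directory` (subdirectories are not descended into)."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


if __name__ == "__main__":
    # Directories to check are given as arguments, e.g. the project's &PH& directory and /tmp.
    for path in sys.argv[1:]:
        try:
            print(f"{count_files(path):8d}  {path}")
        except OSError as error:
            print(f"   error  {path}: {error}")
```

Saved as, say, count_files.py (a hypothetical name), it could be run as `python count_files.py '/path/to/project/&PH&' /tmp`, quoting the path so the shell does not interpret the ampersands.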

If you want to see if your SQL result set building is a factor, add this to your SQL statement as a test: WHERE 1=2 (or AND 1=2 if the statement already has a WHERE clause).

That makes the query return no rows, but the job will still connect to the database.
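
When the user-defined SQL is long and already has its own WHERE clauses, one variant of this test is to leave the statement untouched and wrap it in an outer query that carries the always-false predicate. Note this is a swapped-in variation, not exactly what is suggested above, and the database may plan the wrapped query differently from the original. The sketch below uses Python with an in-memory SQLite database purely as a stand-in to show the wrapped statement returns no rows; the function name and the sample table are made up.

```python
import sqlite3


def no_rows_version(sql):
    """Wrap a SELECT in an outer query with an always-false predicate.

    The original statement text is kept as-is (apart from a trailing
    semicolon), so hints and job parameters inside it are not edited.
    """
    inner = sql.strip().rstrip(";").strip()
    return f"SELECT * FROM (\n{inner}\n) WHERE 1=2"


# Stand-in database and table, only to demonstrate the behaviour.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, "NEW"), (2, "SHIPPED")]
)

original_sql = "SELECT id, status FROM orders WHERE status = 'NEW';"
test_sql = no_rows_version(original_sql)

cursor = connection.execute(test_sql)
rows = cursor.fetchall()
columns = [description[0] for description in cursor.description]

print(test_sql)
print("columns:", columns)
print("row count:", len(rows))
```

If the job starts quickly with the no-rows version and slowly with the original, that points toward result set building rather than connection setup, though it is an indication and not proof.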
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Something to consider for your design:

We have many calls to our DB2 database, sometimes resulting in 1 million or more rows on output. In every case, we do the "heavy lifting" with local (z/OS) jobs and use stored procedures wherever possible.

This eliminates the data channel, which has a restricted bandwidth compared to the more open channels for FTP. It adds a landed file on the host and an FTP step to your job flow (and shifts development effort to the database itself), but can also show a significant performance improvement.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872