Long startup time

abhik05 · Post by **abhik05** » Mon Aug 29, 2016 9:37 am

Hi
Job is taking long startup time in PRODUCTION.

The job is running with 1 processing node.
Design:
The job has one oracle connector stage as source.
With 2 other connector stages as used for lookups with one target connector stage.
Also there are one sort stage and transformer stages has been used in the job.

Job Log:
As per the log, it is showing that the SOURCE connector stage has connected to given database and then its running with 1 processing node.
Next statement in the log is the source SQL statement.
Now the difference of time given between the first 2 statements and SQL statement is almost equal to the job startup time given at below log.

The given SOURCE sql is an user-defined sql with parameters and not a datastage generated one.

I wanted to understand will the Source sql statement generation time is included in Job startup time?
What are the other factors which could possibly impact the startup time?

Please help.

PaulVL · Post by **PaulVL** » Mon Aug 29, 2016 2:47 pm

Please remember that the timestamp in the log is the timestamp of the message being saved to the log, not the time the action happened.

Building the result set on your source and extracting row #1 is part of job startup cost I think. At least that has been my experience eyeballing the logs thus far.

I believe all of the connections are established for all connectors, then the sql is sent and result set starts being created.

Without seeing the job design and actual log, it's hard for us to say.

abhik05 · Post by **abhik05** » Mon Aug 29, 2016 11:33 pm

Thanks PaulVL for replying.

What I have observed from the logs and designs of few jobs taking long startup time is like:
Jobs which are having database connector stages at source or any other places (for lookups,funnel) with large USER-DEFINED complex sqls (large output schema with joins with other tables using database hints ) in them, are taking much startup time compared to the jobs having less complex SQLs in their database connector stages.
Even running those jobs in peak time when most jobs are scheduled to run, might add up to extra build up time for those SQLs from database side.
So as you have mentioned the start time cost might include "Building the result set on your source and extracting row #1 is part of job startup cost" ,could be one issue.

I guess if they are issues with following measures then they should not be counted in startup time.
1. &PH& directory cleanup
2. /tmp directory cleanup
Please correct me if I am wrong.

Regards,

PaulVL · Post by **PaulVL** » Tue Aug 30, 2016 1:29 pm

If you have thousands of files in &PH& it might affect startup time... (doubtful you are running into that).

If you want to see if your SQL result set building is a factor, add this to your SQL statement as a test: WHERE 1=2

That will invalidate your SQL but will still connect to the database.

FranklinE · Post by **FranklinE** » Tue Aug 30, 2016 2:37 pm

Something to consider for your design:

We have many calls to our DB2 database, sometimes resulting in 1 million or more rows on output. In every case, we do the "heavy lifting" with local (z/OS) jobs and use stored procedures wherever possible.

This eliminates the data channel, which has a restricted bandwidth compared to the more open channels for FTP. It adds a landed file on the host and an FTP step to your job flow (and shifts development effort to the database itself), but can also show a significant performance improvement.