Start up time/performance tuning

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Parallel jobs are designed to maximize throughput. Because of the relative complexity of process interaction in PX jobs, getting everything synchronized into a "ready" state takes a noticeable amount of time; the more nodes you declare in your APT_CONFIG file, and the less operator combination you allow inside your job, the longer that startup phase becomes.
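To illustrate, a minimal single-node configuration file keeps the number of processes started, and therefore the startup overhead, as low as possible for small jobs. This is only a sketch; the hostname "etlhost" and the disk paths are placeholders you would replace with your own:

```
{
	node "node1"
	{
		fastname "etlhost"
		pools ""
		resource disk "/data/datasets" {pools ""}
		resource scratchdisk "/data/scratch" {pools ""}
	}
}
```

Pointing low-volume jobs at a file like this (via the $APT_CONFIG_FILE parameter) trades parallelism you don't need for a faster start.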

Once a PX job is ready to go it will usually run at very high rates of throughput, often an order of magnitude faster, or more, than a similar Server job.

The decision is up to the developer when it comes to job design - any job with very few data records is going to run faster when designed as a Server job than when written as a PX job. The difficulty lies in estimating what size of job or amount of data is going to be the dividing line in deciding which type of job to use.
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

Forgive me for asking this very simple question:

Why do you want to convert your server jobs to parallel jobs when you want to process a few records?
As Ray has already pointed out and you've noticed, Parallel jobs take longer than server jobs to start. So if you know that a job is going to process very few records, server jobs perform better than their parallel counterparts.
Minhajuddin

wesd
Participant
Posts: 22
Joined: Mon Aug 16, 2004 8:56 pm

Post by wesd »

Or, does it really matter if it takes a couple of seconds extra to process 10 records? I'd stick with parallel jobs going forward unless there is a compelling reason not to, given that is the direction IBM is heading with the product (legacy support for Server jobs, but no new development).
Wes Dumey
Senior Consultant
Data Warehouse Projects
suresh_dsx
Participant
Posts: 160
Joined: Tue May 02, 2006 7:49 am

Post by suresh_dsx »

Hi all,
Thanks for the replies.
I am unable to see Andrew's message because I am not a premium member.
My project is divided into several levels (1, 2, 3, 4, and 5).
Levels 1-4 load a high volume of data.
As I understand it, when loading a huge amount of data, parallel jobs perform well compared with server jobs.
Level 5 alone handles a small amount of data. The sequencer triggers jobs based on ID (instance load). Out of 10 IDs, one ID has a huge amount of data and the remaining IDs have very little.
We implemented Levels 1-4 as parallel jobs; my client is satisfied because we showed the performance difference between server and parallel.
Now Level 5 is facing a performance problem because of the small amount of data.
I want to know how to speed up the startup time of a parallel job.
I tried running the sequencer with a different configuration file. The run time is lower than before (5 seconds less, but still much more than the server job).
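For what it's worth, one common way to apply a smaller configuration per run is to pass it to only the low-volume jobs on the command line, assuming the job exposes $APT_CONFIG_FILE as a job parameter. The project name, job name, and file path below are placeholders, not values from this thread:

```
# Hypothetical example: run a low-volume Level 5 job with a 1-node config
dsjob -run \
      -param APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/1node.apt \
      MyProject Level5_Load
```

The high-volume ID can keep the full multi-node configuration while the small IDs start with far fewer processes.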

Thanks for your help in advance.

Thanks and Regards,
Suri
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Simplest answer? If you are unable or unwilling to keep certain jobs as Server jobs then change the expectations there. Stop worrying about this 'startup / performance' non-issue and accept it as part of the migration to the Wonderful World of PX.
-craig

"You can never have too many knives" -- Logan Nine Fingers