Issues with Job running on multiple nodes

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Issues with Job running on multiple nodes

Post by sshettar »

Hi All,

There is a job which is currently running on 1 node. I made a few changes to it, ran it on 1 node, and it works fine. Now I wanted to check the performance running on 4 nodes.
The job ran successfully, but when I checked the dataset (which is what I load at the end of my job), I see that the data is flowing through only 1 node and the other 3 nodes have 0 records through them.

Any light on why this could be happening?

Any help is highly appreciated.


Thanks
Shalini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What kind of partitioning are you doing in the job?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Post by sshettar »

I have set the partitioning to Auto itself
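As background on why Auto partitioning can still leave one node doing all the work: hash-style partitioning routes every row with the same key value to the same partition, so a skewed or near-constant key puts all rows on one node. This is only an illustrative Python sketch of the general mechanism, not DataStage internals:

```python
# Illustrative sketch (not DataStage internals): hash partitioning
# sends every row with the same key to the same partition, so a
# skewed or single-valued join key lands all rows on one node.
def hash_partition(rows, key, num_partitions=4):
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # route each row by the hash of its key value
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

# Well-distributed keys spread across all 4 partitions...
spread = hash_partition([{"id": i} for i in range(100)], "id")
# ...but a constant key puts all 100 rows in a single partition.
skewed = hash_partition([{"id": 7}] * 100, "id")
```

So it is worth checking the cardinality and distribution of the join keys as well as the row volume.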
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, how about some details of your job design? Anything running in Sequential rather than Parallel mode?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Is the source a sequential file or very small volume ?
sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Post by sshettar »

The job does several joins (all of them necessary), and the data is sorted within the database stages themselves using an ORDER BY clause. I left the partitioning on Auto, since there were several joins and I thought explicitly specifying the partitioning and sorting would probably degrade the performance. Finally a dataset gets loaded.

Join Stage 1 --- 5 DB stages --- output1
output1 joined with DB stage --- output2
....
...
...

Similarly, after 6 such joins the data is passed through a transformer and then loaded into the dataset.

Hope I have been fairly clear in explaining the design.

Thanks in advance
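One thing worth noting about this design: a parallel Join stage performs a sort-merge join within each partition, so each input must be partitioned and sorted on the join key at the point the join runs; an ORDER BY in the source SQL only guarantees order before any repartitioning. The sketch below (illustrative Python only, not DataStage's implementation) shows why both inputs must arrive sorted and key-co-located:

```python
# Illustrative merge join, as performed per partition: both inputs
# must be sorted on the join key, and rows with matching keys must
# be in the same partition, or matches are silently missed.
def merge_join(left, right, key):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # pair the current left row with every right row
            # sharing this key, then advance left
            j2 = j
            while j2 < len(right) and right[j2][key] == lk:
                out.append({**left[i], **right[j2]})
                j2 += 1
            i += 1
    return out
```

With Auto partitioning, the framework normally inserts the required repartition and re-sort steps itself, which is why Auto is usually a safe default for chains of joins like this.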
Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

If your volume is small, only one node will be used. Increasing nodes doesn't necessarily mean better throughput; it may take longer for the job to set up with 4 nodes. Add the startup and CPU times together to determine which is the optimal setting. I have found that one node is more efficient most of the time.
Jim Stewart
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

How many records are being processed through this job (including on each side of each join)?

Add the environment variable $APT_DUMP_SCORE with the value True, rerun the job, then copy and paste the output from the job log here, please.
Mark Winter
Nothing appeases a troubled mind more than good music
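For reference, the number of nodes a parallel job runs on comes from the configuration file pointed to by $APT_CONFIG_FILE, and the dump score shows how many partitions each operator actually got. A minimal 4-node configuration file looks roughly like this (the hostname and paths are placeholders, not values from this thread):

```
{
    node "node1"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    /* node "node3" and node "node4" follow the same pattern */
}
```

Comparing the dump score against the configuration file will show whether any operator was constrained to run sequentially.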