Job processing speed Vs available nodes
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
Hi,
There is a situation here that I can't fully comprehend.
A job reads data from a sequential file, feeds it to a BASIC Transformer, and writes to a sequential file.
The BASIC Transformer calls a server routine which in turn calls Change() over 200 times, so it is CPU intensive.
The job took 2 hours to complete when run on 2 nodes, and also 2 hours when run on 4 nodes.
I used HP Performance Manager to analyse system load, and it turns out that baseline CPU consumption was at 20% (24 processors total on AIX); this job added 20% more on 2 nodes and 30% more on 4 nodes, so there was still a lot of CPU headroom. Memory utilization was 85% in both cases while the job was running. Peak disk utilization was also very low, as was the CPU run queue.
When there was more CPU available, why did the job's processing not improve with 2 additional nodes when run with a 4-node configuration file?
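To make the shape of the work concrete, here is a rough sketch in Python of what such a routine does (the real routine is DataStage BASIC using nested Change() calls; the substitution pairs below are made-up stand-ins):

```python
# Hypothetical sketch (Python, not DataStage BASIC) of the server routine:
# ~200 cascaded Change() calls, each replacing one substring in the value.
def clean_value(value):
    # Each .replace() stands in for one nested Change(...) call, e.g.
    # Change(Change(Change(value, "&amp;", "&"), "&lt;", "<"), "&gt;", ">")
    value = value.replace("&amp;", "&")
    value = value.replace("&lt;", "<")
    value = value.replace("&gt;", ">")
    # ... ~197 more replacements in the real routine, hence CPU-bound
    return value
```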
Thanks in advance
- Zulfi
-
- Premium Member
- Posts: 425
- Joined: Sat Nov 19, 2005 9:26 am
- Location: New York City
- Contact:
Hi Zulfi,
The last time I used a BASIC Transformer it was not capable of executing in parallel like the rest of the stages. That might still be the case... However, nothing prevents you from splitting the data flow upstream based on a field value and sending it to multiple BASIC Transformers... In other words, implement your own parallelism and get the improvement you are looking for.
Regards
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
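A minimal sketch of that splitting idea, in Python purely for illustration (in DataStage this would be done with output-link constraints or a partitioner, not hand-written code):

```python
# Illustrative only: partition rows on a key field so that each partition
# can feed its own copy of the serial transformer logic (DIY parallelism).
def split_by_field(rows, key_index, n_ways):
    streams = [[] for _ in range(n_ways)]
    for row in rows:
        # The same key value always lands in the same stream
        streams[hash(row[key_index]) % n_ways].append(row)
    return streams
```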
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
In version 11, the Change() function is available in the parallel Transformer stage, as is the Ereplace() function.
So you may be able to replace your BASIC Transformer stage with a parallel Transformer stage. I guess it depends on the complexity of the server routine; but I would further guess that you can probably use stage variables and looping to achieve most things.
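As a rough illustration of the stage-variables-and-looping idea (Python standing in for Transformer loop logic; the replacement table is hypothetical), the 200 cascaded calls collapse into a single loop over a table:

```python
# Sketch of driving the replacements from a table instead of 200 cascaded
# calls - the same shape the parallel Transformer's loop variables would
# take. The pairs here are hypothetical; the real list has ~200 entries.
REPLACEMENTS = [("&amp;", "&"), ("&lt;", "<"), ("&gt;", ">")]

def clean_value_looped(value):
    for old, new in REPLACEMENTS:
        value = value.replace(old, new)  # one Change()-equivalent per pair
    return value
```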
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
Hi Ray,
These jobs were born in earlier versions, hence the server routine. The routine has nothing but cascaded calls to Change() to replace a list of 200 substrings. We can break it down into stage variables in a parallel Transformer, but this would need to be replicated everywhere the server routine is called.
I would have had it in a parallel routine, but those have a memory leak issue: the memory behind the pointer returned by the function is not released after the row is processed, and the job aborts when the data runs to millions of records.
I would appreciate anything relevant you can share from your vast experience as to why the job would not use the available CPU to improve performance when run on more nodes. Do you think this is restricted at some level?
There are no limits on the userid, though.
Thanks
- Zulfi
As noted, I would assume it's simply the fact that you have a BASIC Transformer in the job and that it has no parallel capability, thus creating a choke point. From what little I recall, it is restricted to running on one specific node - head? conductor? - I don't recall which, but I'm sure someone knows for certain.
I imagine the score if you dumped it would confirm this.
-craig
"You can never have too many knives" -- Logan Nine Fingers
You can write routines in other languages that use multi-threading if you really need the performance, or that can be called from a parallel Transformer, if splitting multiple copies of the BASIC Transformer out to make it parallel manually is not sufficient.
It comes down to "what do you need done" and "how fast do you need it, really".
If you can't do what you need with DataStage's built-in tools, and BASIC isn't fast enough... this should be a pretty rare issue, but when it hits, DataStage gives you hooks to solve the problem.
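For example, the fan-out could look roughly like this (Python sketch only; in practice this would be a compiled routine called from the parallel Transformer, and transform() is a hypothetical stand-in for the real per-row work):

```python
# Hypothetical sketch: fan the per-row CPU work out across worker
# processes. transform() stands in for the expensive per-row routine.
from multiprocessing import Pool

def transform(value):
    return value.upper()  # placeholder for the real CPU-heavy logic

def run_parallel(rows, workers=4):
    with Pool(workers) as pool:
        return pool.map(transform, rows)
```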
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
chulett wrote: As noted I would assume it's simply the fact that you have a BASIC Transformer in the job and that it has no parallel capability, thus creating a choke point. From what little I recall, it is restricted to running on one specific node - head? conductor?
It's an SMP server with just one physical node, so all logical nodes are running on the same box. The BASIC Transformer is running in parallel, as the job monitor shows multiple instances of this stage.
- Zulfi