Job processing speed Vs available nodes

zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Hi,

There is a situation that I can't fully comprehend.

A job reads data from a sequential file, feeds it to a BASIC Transformer, and writes to a sequential file.

The BASIC Transformer calls a server routine, which in turn calls Change() over 200 times, so it is CPU intensive.

The job took 2 hours to complete when run on 2 nodes, and it also took 2 hours when run on 4 nodes.

I used HP Performance Manager to analyse the system load. Baseline CPU consumption was at 20% (24 processors in total, on AIX); this job added 20% more on 2 nodes and 30% more on 4 nodes, so there was still a lot of CPU headroom left. Memory utilization was 85% in both cases while the job was running. Peak disk utilization was also very low, and so was the CPU run queue.
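As a rough sanity check on those numbers, here is the arithmetic as a small Python sketch. It assumes the percentages reported by the monitor are fractions of the whole 24-processor box rather than of a single CPU; the names `summarize` and `JOB_LOAD_BY_NODES` are made up for the illustration.

```python
# Assumption: the monitor's percentages are fractions of the whole
# 24-processor machine, not of a single CPU.
TOTAL_CPUS = 24
BASELINE_LOAD = 0.20  # load before the job starts, per the post

# Extra CPU load the job added, keyed by node count (figures from the post).
JOB_LOAD_BY_NODES = {2: 0.20, 4: 0.30}


def summarize(nodes):
    """Return (CPUs' worth of work used by the job, idle fraction of the box)."""
    job_load = JOB_LOAD_BY_NODES[nodes]
    cpus_used = job_load * TOTAL_CPUS
    idle_fraction = 1.0 - BASELINE_LOAD - job_load
    return cpus_used, idle_fraction


for nodes in sorted(JOB_LOAD_BY_NODES):
    cpus_used, idle = summarize(nodes)
    print(f"{nodes} nodes: ~{cpus_used:.1f} CPUs busy for the job, "
          f"{idle:.0%} of the box idle")
```

Under that assumption the 4-node run used about 1.5 times the CPU of the 2-node run for the same two-hour elapsed time, with roughly half the box still idle.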

When there was more CPU available, why did the job's processing time not improve with the 2 additional nodes when it was run with the 4-node config file?

Thanks in advance
- Zulfi
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Hi Zulfi,
The last time I used a BASIC Transformer, it was not capable of executing in parallel like the rest of the stages. That still might be the case... However, nothing prevents you from splitting the data flow upstream based on a field value and sending it to multiple BASIC Transformers... In other words, implement your own parallelism and get the improvement that you are looking for.
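The splitting idea can be sketched outside DataStage. The Python below only illustrates routing rows by a field value so that the same key always lands in the same stream; the function name `split_by_key`, the `cust_id` field, and the choice of CRC32 as the hash are assumptions for the example, not anything DataStage provides.

```python
import zlib


def split_by_key(rows, key, n_streams):
    """Route each row to one of n_streams lists based on a hash of row[key].

    CRC32 is used because it is stable across runs, unlike Python's
    built-in hash() for strings.
    """
    streams = [[] for _ in range(n_streams)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        streams[digest % n_streams].append(row)
    return streams


# Hypothetical input rows; in a job these would come from the sequential file.
rows = [{"cust_id": i % 7, "text": f"row {i}"} for i in range(20)]
streams = split_by_key(rows, "cust_id", 4)

# Each stream would then feed its own copy of the transformer.
for i, stream in enumerate(streams):
    print(f"stream {i}: {len(stream)} rows")
```

How evenly the rows spread depends on the field you pick: a field with few distinct values, or one dominant value, will leave some streams nearly empty.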

Regards
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In version 11, the Change() function is available in the parallel Transformer stage. So is the Ereplace() function.

So you may be able to replace your BASIC Transformer stage with a parallel Transformer stage. I guess it depends on the complexity of the server routine; but I would further guess that you can probably use stage variables and looping to achieve most things.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Hi Ray,

These jobs were born in earlier versions, and so was the server routine. The routine contains nothing but cascaded calls to Change() to replace a list of 200 substrings. We can break it down into stage variables in a parallel Transformer, but this would need to be replicated everywhere the server routine is called.
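For anyone unfamiliar with the shape of such a routine, this is roughly what cascaded replacement looks like. It is a Python sketch, assuming each Change(string, old, new) call behaves like a plain replace-all; the pairs shown are made-up placeholders, not the real list of 200.

```python
# Hypothetical (old, new) pairs standing in for the real list of ~200.
REPLACEMENTS = [
    ("&amp;", "&"),
    ("&lt;", "<"),
    ("&gt;", ">"),
]


def change_all(text, replacements):
    """Apply each replacement in order, feeding the result into the next.

    Because the calls are cascaded, order matters: a later pair sees the
    output of every earlier pair, not the original text.
    """
    for old, new in replacements:
        text = text.replace(old, new)
    return text


print(change_all("a &lt; b &amp;&amp; b &gt; c", REPLACEMENTS))
```

Each row makes one full pass over the string per pair, so the cost per row grows with both the string length and the number of pairs, which is why the stage is CPU heavy.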

I would have written it as a parallel routine, but those have a memory leak issue: the memory behind the pointer returned by the function is not released after the row is processed, and the job aborts when the data has millions of records.

I would appreciate anything relevant you can share from your vast experience as to why the job would not use the available CPU to improve performance when run on more nodes. Do you think this is restricted at some level?

There are no limits on the user ID, though.

Thanks
- Zulfi
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted I would assume it's simply the fact that you have a BASIC Transformer in the job and that it has no parallel capability, thus creating a choke point. From what little I recall, it is restricted to running on one specific node - head? conductor? - don't recall which but I'm sure someone knows for certain.

I imagine the score, if you dumped it, would confirm this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

You can write routines in other languages that can use multi-threading if you really need the performance, or that can be called from a parallel Transformer, if splitting out multiple copies of the BASIC Transformer to make it parallel by hand is not sufficient.

It comes down to "what do you need done" and "how fast do you need it, really".

If you can't do what you need with DataStage's built-in tools, and BASIC isn't fast enough... this should be a pretty rare issue, but when it hits, DataStage gives you hooks to solve the problem.
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

chulett wrote: As noted I would assume it's simply the fact that you have a BASIC Transformer in the job and that it has no parallel capability, thus creating a choke point. From what little I recall, it is restricted to running on one specific node - head? conductor?
It's an SMP server with just one physical node, so all logical nodes run on the same box. The BASIC Transformer is running in parallel, as the job monitor shows multiple instances of this stage.
- Zulfi
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

UCDI wrote: you can make routines in other languages that can use multi-threading if you really need the performance
The question here is not about improving performance; it is about why the job is not utilizing the additional free resources when more nodes are made available to it.
- Zulfi
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah... the SMP notes would have been good to know up top. As to your last question, isn't that all up to the operating system, not DataStage? And aren't you worrying about the "additional free resources" so that performance improves? :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers