Does the number of Transformer stages affect performance?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm


Post by kaps »

First, I read somewhere that using too many Transformer stages in parallel jobs affects performance because each one has to be compiled into C++, but I later remember reading that this no longer affects performance in newer versions. Can someone tell me whether that's true, and why it is no longer a problem? Basically, we can use either a Filter stage or a Transformer stage to filter rows; does the Transformer stage affect performance in this case?

Thanks
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Well, I would think that what you are doing within the transformers is really going to drive the answer to your question.


More stuff in a job will always lend itself to affecting performance vs less stuff in a job.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

However, as a general statement: while transformers were a 'performance issue' in the beginning, that hasn't been true for quite some time. So simply using transformers in a job should no longer be a concern, but it's best to use them only when actually needed. If the work can be done by something native to the framework, like a Filter or Modify stage, use that instead.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Two issues are being confused here. One is the performance of ONE Transformer stage, which was a problem in earlier versions but isn't any more. The other is the performance of TOO MANY stages (of whatever type). Every stage (ignoring operator combination) will generate an additional process; too many processes will overload your server. How many is too many depends on what they are doing; you can monitor their resource consumption with various tools, including Monitor view in Director, the Performance Analyzer in Designer, and/or the DataStage Operations Console.
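To put a rough number on "every stage generates an additional process", here is a minimal Python sketch. It assumes the usual parallel-engine layout as we understand it: one conductor process, one section leader per node, and one player process per stage per partition, where operator combination collapses several stages into a single player. It ignores sequential-mode stages and anything else that changes the picture, so treat it as a back-of-the-envelope estimate, not a formula from the product documentation. The function name is hypothetical.

```python
def estimate_processes(process_groups: int, nodes: int) -> int:
    """Rough count of OS processes for one parallel job run (assumption-based).

    process_groups: stages remaining after operator combination; each
                    combined group runs as one player per partition.
    nodes:          number of partitions/nodes in the configuration file.
    """
    conductor = 1
    section_leaders = nodes            # assumed: one per processing node
    players = process_groups * nodes   # assumed: one per group per partition
    return conductor + section_leaders + players


# Eight uncombined stages on a 4-node configuration...
print(estimate_processes(8, nodes=4))  # 37
# ...versus the same logic consolidated into three process groups.
print(estimate_processes(3, nodes=4))  # 17
```

The point of the sketch is only that process count scales with stages times partitions, which is why "too many" depends on both the job design and the configuration file.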
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not confused... which is why I specifically noted I was making a general statement. :wink:
-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Conflated, then.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That's it, conflated. Definitely... conflated. Yah.
-craig
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Right now I'm seeing lots of poorly designed jobs with multiple connected transformers:

transformer -> transformer -> transformer

Usually this is because the developer had a poor understanding of what could be done in a single transformer. Much of this is related to them not understanding the execution order that occurs in a transformer (which got a bit more complex with transformer looping).

So far I haven't encountered a single case that couldn't be consolidated into one transformer. Not only does it simplify the job, it also improves performance. As Ray said, most of the performance improvement comes from the fact that you are reducing resource usage (processes, memory) and eliminating several in-memory transfers of data.
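The per-row execution order is what makes the consolidation possible. As we understand it, a Transformer evaluates its stage variables top to bottom for each input row (later ones may use earlier ones), then each output link's constraint, then that link's column derivations; transformer looping adds a loop condition and loop variables after the stage variables, which this sketch does not model. Below is a Python analogy, not DataStage code: the three "transformers", the column names and the rules are all hypothetical. Each list comprehension in the chained version stands in for an extra stage handing rows to the next.

```python
def xfm_trim(row):
    return {**row, "name": row["name"].strip()}

def xfm_upper(row):
    return {**row, "name": row["name"].upper()}

def xfm_flag(row):
    return {**row, "is_big": row["amount"] > 100}

def chained(rows):
    """transformer -> transformer -> transformer (constraint in the last one)."""
    step1 = [xfm_trim(r) for r in rows]
    step2 = [xfm_upper(r) for r in step1]
    return [xfm_flag(r) for r in step2 if r["amount"] >= 0]

def single_transformer(rows):
    """Same logic in one pass, following the assumed evaluation order."""
    out = []
    for r in rows:
        # 1. Stage variables, evaluated top to bottom.
        sv_trimmed = r["name"].strip()
        sv_upper = sv_trimmed.upper()      # uses the previous stage variable
        sv_is_big = r["amount"] > 100
        # 2. Output link constraint: drop negative amounts.
        if r["amount"] < 0:
            continue
        # 3. Column derivations for the output link.
        out.append({"name": sv_upper, "amount": r["amount"], "is_big": sv_is_big})
    return out
```

Both functions produce the same rows; the single-pass version simply skips the intermediate copies, which is the in-memory transfer cost described above.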

Worst one I ever encountered was a job with a mess of about eight transformer stages connected with copy stages and funnels. The whole job was so bad I just trashed it and re-wrote it. Only required one transformer in the end and it ran orders of magnitude faster.

Note: conflation++

:P
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Thanks for all valuable replies.

Ray, can you tell me how the problem with a single Transformer in a job was resolved in newer versions?
Also, my original question asked about comparing the Filter stage and the Transformer stage. If I use a Transformer stage just to filter records, is it going to affect performance, as it's not native to DataStage?
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Waaaay back (like in 6.x), Transformer Stages weren't as efficient in generating C++ code directly, or at all [memory is fading over time ; ) ]. That was a big part of it. ...at that time, using a Modify was probably the best way to go. That is OLD history. Great examples above about why you don't want a "ton" of transformers, but the pure idea of "using a Transformer" being a problem is long gone. The overhead is mostly just lots of stages, etc. as noted in the excellent points already made.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

kaps wrote: If I use transformer stage to just filter records, Is it going to affect the performance as it's not native to DataStage ?
On the other hand, the transform operator is a directly compiled component, whereas the filter operator is more like interpreted (not strictly correct, as it uses a pre-built object for its actual work).

May I suggest that you build two jobs to compare the performance, and make sure that you use a statistically significant volume of data?
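One way to decide whether a measured difference between the two jobs is real or just run-to-run noise is to time several runs of each and compare them with Welch's t statistic. The Python sketch below is our own suggestion, not something from the thread; it needs at least two runs per job, and the function names are hypothetical. Feed it your own elapsed times in seconds.

```python
import math
from statistics import mean, stdev

def welch_t(times_a, times_b):
    """Welch's t statistic for two lists of elapsed times (>= 2 runs each)."""
    ma, mb = mean(times_a), mean(times_b)
    se = math.sqrt(stdev(times_a) ** 2 / len(times_a)
                   + stdev(times_b) ** 2 / len(times_b))
    if se == 0:
        # No spread in either sample: only an exact tie counts as "no difference".
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se

def clearly_not_significant(t):
    """|t| < 1.96 is never significant at the 5% level (two-sided),
    because the t critical value is at least 1.96 for any sample size."""
    return abs(t) < 1.96
```

For example, `welch_t(transformer_secs, filter_secs)` with your measured lists: a small |t| says the two stages are indistinguishable at that volume, however large the row count.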
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Not to confound the matter, but the Transformer stage has been native to DataStage since day 1; just not native to the Orchestrate operators. Confuted?
Choose a job you love, and you will never have to work a day in your life. - Confucius
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Confuted indeed.

Parallel jobs use the transform operator - the parallel Transformer stage is merely a convenient (= GUI) way of setting it up.
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Ray

I have tested the performance as you suggested and did not find much difference between the two stages. Up to 10 million records with 4 small columns, the time taken was the same for both; at 100 million records, the job with the Transformer stage actually finished 2 seconds earlier than the job with the Filter stage.

Basically, the job design is Row Generator -> Transformer (or Filter) -> Sequential File.
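Before comparing timings it is worth confirming that both jobs write exactly the same rows, otherwise the comparison is meaningless. The Python sketch below mirrors the Row Generator -> Filter/Transformer -> Sequential File design purely as an analogy: every function is a hypothetical stand-in, not a DataStage API, and the filter condition is made up. Its timings say nothing about DataStage itself; only the "same output first, then time it" pattern carries over.

```python
import csv
import io
import random
import time

def row_generator(n, seed=42):
    """Stand-in for a Row Generator: n rows of 4 small columns."""
    rng = random.Random(seed)
    for i in range(n):
        yield (i, rng.randint(0, 99), "A" if i % 2 else "B", round(rng.random(), 3))

def filter_stage(rows, where):
    """Stand-in for a Filter stage with a single output link."""
    return (r for r in rows if where(r))

def transformer_stage(rows, constraint):
    """Stand-in for a Transformer whose only logic is an output constraint."""
    for r in rows:
        if constraint(r):
            yield r

def sequential_file(rows):
    """Stand-in for a Sequential File target: rows rendered as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def run_job(stage, n, predicate):
    start = time.perf_counter()
    output = sequential_file(stage(row_generator(n), predicate))
    return output, time.perf_counter() - start

def keep(r):
    return r[1] < 50  # hypothetical filter condition

out_f, t_f = run_job(filter_stage, 100_000, keep)
out_t, t_t = run_job(transformer_stage, 100_000, keep)
assert out_f == out_t  # identical output before the timings mean anything
print(f"filter {t_f:.3f}s, transformer {t_t:.3f}s")
```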

So, can we conclude that using a Filter stage instead of a Transformer stage does not improve performance, or, put the other way, that using a Transformer stage does not adversely affect performance?

Thanks
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Read the Redbook; it already mentions that using a Transformer is suggested instead of Filter and Switch stages where it can be used.

Ray and others already mentioned that they are not much of an overhead nowadays. The discussion went from there to using too many stages in general, and then to the history of DataStage.

So I do not understand the point you are trying to make here. Am I missing anything?
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink: