Number of transformer stages affects performance ?
First, I read somewhere that using too many Transformer stages in parallel jobs affects performance because each one has to be compiled into C++, but I later remember reading that this does not affect performance in newer versions. Can someone tell me if that's true? Why is it not a problem now? Basically, we can use either a Filter stage or a Transformer stage to filter rows, but does the Transformer stage affect performance in this case?
Thanks
However, as a general statement, while Transformers were a 'performance issue' in the beginning, that hasn't been true for quite some time. So simply using Transformers in a job should no longer be a concern, but it is best to use them only when actually needed - if the work can be done by something native to the framework, like a Filter or Modify stage, use those instead.
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Two issues are being confused here. One is the performance of ONE Transformer stage, which was a problem in earlier versions but isn't any more. The other is the performance of TOO MANY stages (of whatever type). Every stage (ignoring operator combination) will generate an additional process; too many processes will overload your server. How many is too many depends on what they are doing; you can monitor their resource consumption with various tools, including Monitor view in Director, the Performance Analyzer in Designer, and/or the DataStage Operations Console.
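To put a rough number on "too many processes", here is a minimal sketch. It assumes the usual parallel-engine layout of one conductor, one section leader per logical node in the configuration file, and one player per stage per node, with no operator combination. The function name and the stage/node counts are made up for illustration, and real jobs will differ (sequential stages run on a single node, and combined operators share a process).

```python
def estimate_processes(stages: int, nodes: int) -> int:
    """Rough upper bound on OS processes for one parallel job run.

    Assumption (not taken from the thread): 1 conductor, 1 section
    leader per logical node, and 1 player per stage per node, with
    operator combination ignored.
    """
    conductor = 1
    section_leaders = nodes
    players = stages * nodes
    return conductor + section_leaders + players


# Hypothetical job sizes on a 4-node configuration.
for stage_count in (3, 8, 20):
    print(stage_count, "stages ->", estimate_processes(stage_count, nodes=4), "processes")
```

The count grows with stages multiplied by nodes, which is why a design that is fine on a 2-node configuration can strain the server on an 8-node one.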
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Right now I'm seeing lots of poorly designed jobs with multiple connected transformers:
transformer -> transformer -> transformer
Usually this is because the developer had a poor understanding of what could be done in a single transformer. Much of this is related to them not understanding the execution order that occurs in a transformer (which got a bit more complex with transformer looping).
So far I haven't encountered a single case that couldn't be consolidated into one transformer. Not only does it simplify the job, it also improves performance. As Ray said, most of the performance improvement comes from the fact that you are reducing resource usage (processes, memory) and eliminating several in-memory transfers of data.
Worst one I ever encountered was a job with a mess of about eight transformer stages connected with copy stages and funnels. The whole job was so bad I just trashed it and re-wrote it. Only required one transformer in the end and it ran orders of magnitude faster.
Note: conflation++
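The consolidation idea can be sketched outside DataStage. This is only a Python analogy, not Transformer code: the three chained functions stand in for transformer -> transformer -> transformer, each handing a full intermediate data set to the next, while the single-pass version derives a value first (roughly like a stage variable), then applies the constraint, then writes the output columns. The column names `a`, `b`, `total` and `flag` are invented for the example.

```python
from typing import Dict, List

Row = Dict[str, int]


def chained(rows: List[Row]) -> List[Row]:
    # Three separate passes; each one builds an intermediate list
    # that is handed to the next step.
    step1 = [dict(r, total=r["a"] + r["b"]) for r in rows]
    step2 = [r for r in step1 if r["total"] > 10]
    step3 = [dict(r, flag=1 if r["total"] % 2 == 0 else 0) for r in step2]
    return step3


def consolidated(rows: List[Row]) -> List[Row]:
    # One pass: derive, then constrain, then build the output row.
    out: List[Row] = []
    for r in rows:
        total = r["a"] + r["b"]
        if total > 10:
            out.append(dict(r, total=total, flag=1 if total % 2 == 0 else 0))
    return out


sample = [{"a": i, "b": i * 2} for i in range(10)]
print(chained(sample) == consolidated(sample))
```

Both versions produce the same rows; the single pass just avoids the intermediate hand-offs.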
Thanks for all the valuable replies.
Ray, can you tell me how the problem with a single Transformer in a job was resolved in newer versions?
Also, my original question asked about comparing the Filter stage and the Transformer stage. If I use a Transformer stage just to filter records, is it going to affect performance, as it's not native to DataStage?
Waaaay back (like in 6.x), Transformer Stages weren't as efficient in generating C++ code directly, or at all [memory is fading over time ; ) ]. That was a big part of it. ...at that time, using a Modify was probably the best way to go. That is OLD history. Great examples above about why you don't want a "ton" of transformers, but the pure idea of "using a Transformer" being a problem is long gone. The overhead is mostly just lots of stages, etc. as noted in the excellent points already made.
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
kaps wrote: If I use transformer stage to just filter records, Is it going to affect the performance as it's not native to DataStage ?
On the other hand, the transform operator is a directly compiled component, whereas the filter operator is more like interpreted (not strictly correct, as it uses a pre-built object for its actual work).
May I suggest that you build two jobs to compare the performance, and make sure that you use a statistically significant volume of data?
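One way to judge such a comparison is to run each job several times and look at the spread as well as the averages. The sketch below uses Welch's t statistic from the Python standard library; the elapsed times are made-up placeholders, not measurements, and `welch_t` is a helper written for this example.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two samples of elapsed times.

    Negative means sample `a` was faster on average. Needs at least
    two runs per sample and some variation between runs.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Placeholder elapsed times in seconds for five runs of each job.
transformer_secs = [212.0, 209.5, 214.1, 210.8, 211.6]
filter_secs = [213.2, 211.0, 215.4, 210.1, 212.9]

print("mean Transformer:", mean(transformer_secs))
print("mean Filter:     ", mean(filter_secs))
print("Welch t:         ", welch_t(transformer_secs, filter_secs))
```

A t value close to zero suggests the gap is within run-to-run noise; a single pair of runs cannot show that either way.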
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray
I have tested the performance as you suggested and did not find much difference between the two stages. Up to 10 million records with 4 small columns, the time taken is the same for both, and when I increased it to 100 million records, the job with the Transformer stage actually finished 2 secs earlier than the job with the Filter stage.
Basically, the job design is Row Generator to Transformer (or Filter) to Sequential File.
So, can we conclude that using a Filter stage instead of a Transformer stage does not improve performance, or that using a Transformer stage does not adversely affect performance?
Thanks
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
Read the Redbook; it already mentions that using the Transformer is suggested instead of Filter and Switch where it can be used.
Ray and others already mentioned that Transformers are not much of an overhead nowadays. The discussion went from there to using too many stages in general and then to the history of DataStage.
So I do not understand the point you are trying to make here. Am I missing anything?
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.