Difference/Advantage of using Transform instead of Routine

willpeng
Participant
Posts: 18
Joined: Wed Apr 07, 2004 9:24 pm
Location: Middletown, NJ

Difference/Advantage of using Transform instead of Routine

Post by willpeng »

Can anyone enlighten me on why I would want to use a DS transform instead of a DS routine, since I am calling the routine from the transform as well?

My limited understanding is that using it as just a routine will degrade performance, but can anyone tell me why?

Willy
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Don't really have time for the Full Wurlod, but in a nutshell...

A routine is called by the job when it runs, so the disadvantage is the overhead of the context switching and the passing of the arguments back and forth. This can tend to become significant when dealing with large data volumes. The advantage is any changes made to a routine are automatically picked up by any job that uses them the next time they run.

A transform is a piece of code that is substituted in by the compiler at compile time, so there is no 'overhead' associated with it. The downside is you have to recompile any jobs that use those transforms before they would see the change.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There's no context switching when calling a routine; the only overhead is determining its location (via the Catalog, or VOC file) and loading it into memory (this process is sometimes called "link snapping"); after that, the in-memory location is cached.
As part of loading the routine, stack entries must be constructed for normal and error return, as well as for argument passing. Later, these stack entries must be deallocated.
Even so, the comparison is between some overhead and none.

A Transform is a single expression, stored with associated documentation (and references to data elements, if you use these) in the Repository. When a job that uses a Transform is compiled, the Transform's defining expression is copied into the job as in-line code.
Some Transforms call Routines; in this case that advantage is lost. The advantage now is, for example, having something meaningfully named that a developer can use (such as DIGITS) rather than requiring the developer to learn the arcane underlying function.
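The compile-time versus run-time distinction can be sketched as a loose analogy in Python (not DataStage BASIC). Everything here is hypothetical: the `TRANSFORMS` and `ROUTINES` tables, `compile_derivation`, `run_job`, and the `DIGITS`/`KeepDigits` definitions are made up for illustration and do not reflect how the DataStage compiler is actually implemented.

```python
import re

# Hypothetical repository: a Transform is a single expression template,
# a Routine is a callable looked up by name while the job runs.
TRANSFORMS = {"DIGITS": "''.join(ch for ch in {arg} if ch.isdigit())"}
ROUTINES = {"KeepDigits": lambda s: "".join(ch for ch in s if ch.isdigit())}


def compile_derivation(derivation):
    """Copy each Transform's defining expression in-line, once, at compile time."""
    def expand(match):
        name, arg = match.group(1), match.group(2)
        return "(" + TRANSFORMS[name].format(arg=arg) + ")"

    pattern = r"\b(" + "|".join(TRANSFORMS) + r")\((\w+)\)"
    return compile(re.sub(pattern, expand, derivation), "<job>", "eval")


def run_job(code, rows):
    """Evaluate the compiled derivation once per row."""
    call = lambda name, *args: ROUTINES[name](*args)  # run-time lookup
    return [eval(code, {"call": call, "col": row}) for row in rows]


rows = ["a1b2", "x9"]
inline_job = compile_derivation("DIGITS(col)")
routine_job = compile_derivation("call('KeepDigits', col)")

# Change both definitions AFTER the jobs were compiled.
TRANSFORMS["DIGITS"] = "{arg}.upper()"
ROUTINES["KeepDigits"] = lambda s: s.upper()

print(run_job(inline_job, rows))   # ['12', '9']: old expression is baked in
print(run_job(routine_job, rows))  # ['A1B2', 'X9']: new routine is picked up
```

The in-line job keeps its old behaviour until it is recompiled, while the job that calls the routine sees the change on its next run, at the cost of a lookup and call per row.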

There is a limit to what can be done in a single expression, and some things (such as searching a dynamic array) simply cannot be done in expressions, because they require statements. In this case, you are forced into using a Routine, since it's the only way to achieve the task.

Routines that are transform functions are called for every row processed, so should be as lightweight and efficient as possible. Before/after subroutines, on the other hand, are executed only once per job run, so can be quite heavy duty.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote: There's no context switching when calling a routine; the only overhead is determining its location (via the Catalog, or VOC file) and loading it into memory (this process is sometimes called "link snapping"); after that, the in-memory location is cached.
Ah... thanks for the clarification. It was explained to me once that way, and I've been merrily passing it along ever since. :? Silly Wabbit.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Ah the Full Wurlod. What would life be without a Full Wurlod once in a while.
Mamu Kim
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hear Hear! :lol:
-craig

"You can never have too many knives" -- Logan Nine Fingers
willpeng
Participant
Posts: 18
Joined: Wed Apr 07, 2004 9:24 pm
Location: Middletown, NJ

Post by willpeng »

Thanks!!! It helps.

So I guess turning a routine into a transform does not increase performance, huh?

So is there a joke here that I didn't get???
William Peng
DW/ETL Consultant
Middletown, NJ
willpeng
Participant
Posts: 18
Joined: Wed Apr 07, 2004 9:24 pm
Location: Middletown, NJ

Post by willpeng »

What about a Stage Variable? Any advantage in using it instead of Transform or Routine? It looks and smells like a transform specific for the job.
William Peng
DW/ETL Consultant
Middletown, NJ
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Stage Variables are very handy and can be used to simplify things. They are evaluated (in order) before the derivations in your output links, so they can be used to cut down on the amount of work done in the Transformer.

For example, a complex derivation that is used in multiple output links can be put in a Stage Variable, evaluated once and then simply referenced in each output link.
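As a rough illustration of that "evaluate once, reference many times" idea, here is a small Python analogy (not DataStage code). The names `expensive_derivation`, `transform_row`, `sv_full_key`, and the two output links are hypothetical and exist only for this sketch.

```python
def expensive_derivation(row):
    """Stand-in for a complex derivation; counts how often it runs."""
    expensive_derivation.calls += 1
    return row["first"].strip().upper() + "|" + row["last"].strip().upper()


expensive_derivation.calls = 0


def transform_row(row):
    # "Stage variable": evaluated once per row, before any output link.
    sv_full_key = expensive_derivation(row)
    # Each output link derivation only references the stored value.
    link_a = {"key": sv_full_key, "len": len(sv_full_key)}
    link_b = {"key": sv_full_key}
    return link_a, link_b


rows = [{"first": " ann ", "last": "lee"}, {"first": "bo", "last": " ray "}]
outputs = [transform_row(r) for r in rows]
print(expensive_derivation.calls)  # 2: one evaluation per row, not per link
```

Writing the same derivation directly in both links would run it twice per row; holding it in one variable halves that work in this sketch.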

They can also make constraints and other derivations easier to understand when set up as boolean values. Another 'for example': set up one called 'NewRecord' using the derivation needed to determine if a record is new and set its value to TRUE or FALSE. Then simply refer to it later - "If NewRecord Then ... Else ...".

They are also about the only way, when working with repeating groups, to capture previous values and then compare them to current values in a Server job. Well, there is COMMON storage but a Stage Variable is a better answer nowadays.
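The previous-value comparison can be sketched in Python as an analogy (again, not DataStage code). It assumes the input is already sorted by the grouping key; `flag_new_groups`, `sv_prev_key`, `sv_new_record`, and the `cust` column are hypothetical names chosen for the example. The ordering matters: the comparison runs before the saved value is overwritten, which mirrors stage variables being evaluated in order.

```python
def flag_new_groups(rows, key="cust"):
    """Yield (row, new_record) pairs, comparing each key with the previous one."""
    sv_prev_key = None  # persists from row to row, like a stage variable
    for row in rows:
        # Evaluated in order: compare first, then overwrite the saved value.
        sv_new_record = row[key] != sv_prev_key
        sv_prev_key = row[key]
        yield row, sv_new_record


rows = [{"cust": "A"}, {"cust": "A"}, {"cust": "B"}]
flags = [is_new for _, is_new in flag_new_groups(rows)]
print(flags)  # [True, False, True]
```

The boolean can then drive a constraint or derivation in the "If NewRecord Then ... Else ..." style described above.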
-craig

"You can never have too many knives" -- Logan Nine Fingers
alexysflores
Participant
Posts: 18
Joined: Mon Jan 12, 2004 7:20 am
Location: USA

Re: Difference/Advantage of using Transform instead of Routi

Post by alexysflores »

willpeng wrote: Can anyone enlighten me on why I would want to use a DS transform instead of a DS routine, since I am calling the routine from the transform as well?

My limited understanding is that using it as just a routine will degrade performance, but can anyone tell me why?

Willy

I would advise you to use neither a DS Transform nor a Routine if you have over a million rows of transactions; performance degrades, because it's still BASIC, and BASIC is too slow.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What's your alternative in server jobs? :?

I challenge you to do anything faster in server jobs than what can be done with BASIC expressions/routines.

Note that I didn't specify "what you can do with BASIC" - I specified "what can be done with BASIC".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
willpeng
Participant
Posts: 18
Joined: Wed Apr 07, 2004 9:24 pm
Location: Middletown, NJ

Post by willpeng »

If not a transform or routine for over a million rows in DS, what else?

Is there some option or way to call a routine or function without it being set up and cleaned up for each row?

I can probably code a routine that actually processes each row within that routine, but that defeats the purpose of the rapid development and GUI in DS.
William Peng
DW/ETL Consultant
Middletown, NJ
jwhyman
Premium Member
Posts: 13
Joined: Fri Apr 09, 2004 2:18 am

Post by jwhyman »

You use a routine for logic that cannot be represented in a transform. A transform is an expression, whereas routines can contain statements. I am surprised that you say that BASIC is so slow; some of the functionality beneath the statements is very complex. Yes, it is slower than perhaps taking the time to rewrite your custom logic in C/C++. You can do this in EE (PX). You can actually do it in Server too, using DSCAPI and writing your own stage. If you are replicating existing functionality, it will most likely be slower.

A million rows is not large, and not many people run their jobs on toasters these days. Performance is about perception, expectation and need.