Column generators or transformers?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

midmaxds
Premium Member
Posts: 71
Joined: Mon Oct 26, 2015 11:44 am

Column generators or transformers?

Post by midmaxds »

Hi,

We are planning to capture the rejection reason from a filter using a Column Generator.
If some records fail a filter condition, those records will be passed through a Column Generator and a corresponding reason will be added in a new column, so that we can identify why each record failed.
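The design described above can be sketched in plain Python (illustrative only, not DataStage syntax; the record shapes, condition, and reason string are assumptions for the example): records failing a filter condition are routed to a reject stream, and a hardcoded reason column is appended, as a Column Generator would do.

```python
def split_with_reason(records, condition, reason):
    """Route each record to 'clean' or 'rejects'; rejects get a REJECT_REASON column."""
    clean, rejects = [], []
    for rec in records:
        if condition(rec):
            clean.append(rec)
        else:
            rejected = dict(rec)
            rejected["REJECT_REASON"] = reason  # hardcoded value, like a generated column
            rejects.append(rejected)
    return clean, rejects

rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}]
clean, rejects = split_with_reason(rows, lambda r: r["amount"] >= 0, "NEGATIVE_AMOUNT")
```

Each filter in the job would call this with its own condition and reason string, giving one tagged reject stream per rule.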

Can we use a Transformer instead of a Column Generator? I believe both serve the same purpose, but which is better from a performance point of view?
We will be dealing with 7 million records at most. Kindly let me know your thoughts.
Midhun
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Transformers are capable and efficient stages, and this type of processing can be done in a Transformer stage.

Are you sure you want to use a column generator or did you mean a row generator?
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Depending on what you are doing, you can actually set up a rules stage and (unfortunately, due to tool limitations) join a couple of times to produce the original row concatenated with the rule results, which tell you in additional columns which rule failed. This seems to be very efficient for more complicated sets of rules (where the data can fail for a wide variety of reasons and conditions).
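The join described above can be sketched as follows (a hedged illustration: the key name, column names, and result shapes are assumptions, not the actual rules-stage output): rule results are joined back onto the original rows by key, so each row carries extra columns saying which rule it failed.

```python
def join_rule_results(rows, rule_results, key="id"):
    """Left-join rule-result columns onto the original rows by key."""
    by_key = {res[key]: res for res in rule_results}
    out = []
    for row in rows:
        merged = dict(row)
        res = by_key.get(row[key], {})
        # copy every rule-result column except the join key itself
        merged.update({k: v for k, v in res.items() if k != key})
        out.append(merged)
    return out

rows = [{"id": 1, "val": 2}]
results = [{"id": 1, "null_check": "FAIL"}]
tagged = join_rule_results(rows, results)
```

In the job itself this would be a Join stage (or two) rather than Python, but the data movement is the same.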

If the rule is simple, it's going to be hard to beat a Transformer constraint that appends a column. The column is redundant, though: if a record meets the failure condition (which might be the "not" of the other constraint in your Transformer), you already know why it was rejected and may not actually NEED a column for the "why".
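The redundancy point above can be illustrated with a minimal sketch (plain Python, names assumed for the example): when each reject link corresponds to exactly one failed constraint, the routing itself tells you "why", so no explicit reason column is needed.

```python
def route(records, constraints):
    """constraints: ordered {link_name: predicate}; the first failing predicate wins.
    The link name a record lands on IS its reject reason."""
    streams = {name: [] for name in constraints}
    streams["clean"] = []
    for rec in records:
        for name, passes in constraints.items():
            if not passes(rec):
                streams[name].append(rec)  # reason implicit in the stream name
                break
        else:
            streams["clean"].append(rec)
    return streams

streams = route(
    [{"id": 1, "qty": 0}, {"id": 2, "qty": 3}],
    {"zero_qty": lambda r: r["qty"] > 0},
)
```

This mirrors a Transformer with one output link per constraint: downstream, the link a record arrived on identifies the failed rule.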
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You should also research use of the Exceptions stage and the Data Quality console - you may be able to save yourself a lot of work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
midmaxds
Premium Member
Posts: 71
Joined: Mon Oct 26, 2015 11:44 am

Post by midmaxds »

@ArndW -- Thanks for the response. Yes, we use a Column Generator, in which we hardcode the reject reason into the newly added column.
We used transformers in previous projects where the data volume was lower, but it looks like they degrade performance at high volumes. I could be wrong, though.

@UCDI - Thank you. Does this work with lookup failures, filters, etc.? Or only if we use a data rule definition (i.e., a rule published in Information Analyzer)?

@ray.wurlod - Thank you. Yes, we are using them as well.
All the rejections are put through a Funnel --> written to a work table --> and finally sent to the Exceptions stage.

Basically, we are capturing all sorts of rejections and pushing them to the Exceptions stage so they are available in the console.
Midhun
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

My suggestion is indeed just for IA type rules or rule sets. While you could force-fit this to work on lookup fails (at least some types of lookup fails) it would not be efficient for that task.

Also, transformers are pretty efficient, depending on what you ask them to do and how you ask them to do it. Appending a column and poking a value in there is pretty quick. I've had simple transformers that push well over 150k rows/sec, which is roughly identical to the rate the data comes in from the database. I've had others that barely push 1/10 of that due to complex logic or poor approaches. It's not the Transformer stage in and of itself (there are rumors that older versions of DataStage had slower transformers; I am talking about 11.x here). That said, other, simpler stages that can do the same work may be "technically" more efficient, but unable to reap much gain, because once your stage matches the speed of the input stream, you can't go any faster.

What you are asking sounds simple. There should be a very efficient way to do it.
midmaxds
Premium Member
Posts: 71
Joined: Mon Oct 26, 2015 11:44 am

Post by midmaxds »

Thank you for all the details. We will try the Transformer approach and also compare it with the Column Generator to see how much performance gain can be achieved.
Midhun