Hi,
We are planning to handle the rejection reason from a Filter stage using a Column Generator.
If some records fail a filter condition, those records will be passed through a Column Generator and a corresponding reason will be added in a new column, so that we can identify why each record failed.
Can we use a Transformer instead of a Column Generator? I believe both serve the same purpose, but which is better from a performance point of view?
We will be dealing with at most 7 million records. Kindly let me know your thoughts.
Column generators or transformers?
Depending on what you are doing, you can actually set up a rules stage and (unfortunately due to the tool limitations) you can join a couple of times to produce the original row concatenated with the rule results which tell you which rule failed in additional columns. This seems to be very efficient for more complicated sets of rules (if the data can fail for a wide variety of reasons and conditions).
If the rule is simple, it's going to be hard to beat a Transformer constraint that appends a column. The column is redundant, though: if a record meets the failure condition (the reject link being the "not" of the other constraint in your transformer), you already know why it was rejected and may not actually NEED a column for the "why".
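The transformer logic described above boils down to routing each record to a clean or reject stream and tagging rejects with a reason. A minimal Python sketch of that idea (illustrative only; names like `split_with_reason` are hypothetical, and a real DataStage Transformer expresses this as link constraints, not code):

```python
# Hypothetical sketch of a transformer-style split: records that fail
# the filter condition go to a reject stream with a reason column
# appended; records that pass flow through unchanged.

def split_with_reason(records, condition, reason):
    """Route each record to 'clean' or 'rejects'; tag rejects with why."""
    clean, rejects = [], []
    for rec in records:
        if condition(rec):
            clean.append(rec)
        else:
            # Equivalent of the Column Generator / derivation step:
            # append a hard-coded reject-reason column.
            rejects.append({**rec, "reject_reason": reason})
    return clean, rejects

rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}]
clean, rejects = split_with_reason(
    rows, lambda r: r["amount"] >= 0, "amount must be non-negative")
```

As UCDI notes, if every record on the reject stream carries the same single reason, the column adds nothing the link itself doesn't already tell you.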
@ArndW -- Thanks for the response. Yes, we use a Column Generator, in which we hard-code the reject reason into the newly added column.
We used transformers in previous projects, where the data volume was lower, but it looks like they degrade performance at huge volumes. I could be wrong, though.
@UCDI - Thank you. Does this work with lookup failures, filters, etc.? Or only if we use a data rule definition (i.e., a rule published in Information Analyzer)?
@ray.wurlod - Thank you. Yes, we are using them as well.
All the rejections go through a Funnel --> from the Funnel to a work table --> and finally to the Exception stage.
Basically, we are capturing all sorts of rejections and pushing them to the Exception stage so they are available in the console.
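The funnel-to-exceptions pattern described above can be sketched outside DataStage as combining rejects from several checks into a single exceptions stream. This is a hypothetical Python sketch (the function and check names are assumptions, not DataStage APIs), which additionally accumulates every failed reason per record rather than just the first:

```python
# Hypothetical sketch of the funnel pattern: rejects from several
# checks are combined into one exceptions stream (the "funnel") that
# would then be written to a work table / Exception stage.

def run_checks(records, checks):
    """checks: list of (predicate, reason) pairs.
    Returns (clean, exceptions); exceptions carry all failed reasons."""
    clean, exceptions = [], []
    for rec in records:
        failed = [reason for pred, reason in checks if not pred(rec)]
        if failed:
            exceptions.append({**rec, "reject_reason": "; ".join(failed)})
        else:
            clean.append(rec)
    return clean, exceptions

checks = [
    (lambda r: r.get("id") is not None, "missing id"),
    (lambda r: r.get("amount", 0) >= 0, "negative amount"),
]
clean, exceptions = run_checks(
    [{"id": 1, "amount": 10}, {"id": None, "amount": -2}], checks)
```

In the DataStage job each check would be its own reject link feeding the Funnel, but the end state is the same: one consolidated exceptions table with a reason column per rejected row.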
Midhun
My suggestion is indeed just for IA-type rules or rule sets. While you could force-fit it to work on lookup failures (at least some types of lookup failure), it would not be efficient for that task.
Also, transformers are pretty efficient, depending on what you ask them to do and how you ask them to do it. Appending a column and poking a value into it is quick. I've had simple transformers that push well over 150k rows/sec, which is roughly identical to the rate the data comes in from the database. I've had others that barely manage a tenth of that due to complex logic or poor approaches. It's not the Transformer stage in and of itself (there are rumors that older versions of DataStage had slower transformers; I am talking about 11.x here). That said, other, simpler stages that can do the same work may be "technically" more efficient, but unable to reap much gain, because once your stage matches the speed of the input stream you can't go any faster.
What you are asking sounds simple. There should be a very efficient way to do it.
