Generate numeric duplicate number column

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jlitherl
Participant
Posts: 6
Joined: Wed Jun 21, 2006 2:15 am
Location: London

Generate numeric duplicate number column

Post by jlitherl »

I've imported data from a flat file into a dataset with a compound key based upon 5 columns, which can contain duplicates.

I've added a column which I plan to use to uniquely identify each of the duplicates, with values 1,2,3...

Can anyone tell me how to generate the numeric value in column 6 within DataStage, please?

Thanks

John
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You can do this with stage variables. Use hash partition to ensure that similar keys end up in one partition. Use the stage variables to compare present row with previous row and a increment another stage variable which will be used as your counter. Reset it back if there is a change.
You need to take into account the partition number and number of partitions. If you dont want to go through all that. Run in sequential mode.
Sort your incoming data on your composite key.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you use a Sort stage, it can generate a key change column. If this is 1, you have a new key; if this is 0 you have a duplicate.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jlitherl
Participant
Posts: 6
Joined: Wed Jun 21, 2006 2:15 am
Location: London

Post by jlitherl »

DSguru2B wrote:You can do this with stage variables. Use hash partition to ensure that similar keys end up in one partition. Use the stage variables to compare present row with previous row and a increment another stage variable which will be used as your counter. Reset it back if there is a change.
You need to take into account the partition number and number of partitions. If you dont want to go through all that. Run in sequential mode.
Sort your incoming data on your composite key.


Can I do this with a PX transformer stage? Someone suggested using the function RowProcGetPreviousValue() but I can only see this in the Server job transformer. How can I do the equivalent in PX?

Thanks for your help.

John
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

With stage variables. Search this forum for more details on how to compare present row with previous using stage variables.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Sort stage is far easier. It generates the "key is changed" value for you. All you have to do is to provide a column into which it can put this value. Any size of integer will do; the value is 0 or 1.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jlitherl
Participant
Posts: 6
Joined: Wed Jun 21, 2006 2:15 am
Location: London

Post by jlitherl »

Alas, Ray, my company is too tight to pay for a premium membership to see most of your posting :(

But the stage variables alternative was straightforward enough. I put a zero in the extra column by default and increment this value each time the other key columns are repeated. Easy!

Thank you both very much for your help.

Regards

JohnL
Post Reply