Page 1 of 1

Generate numeric duplicate number column

Posted: Fri Feb 09, 2007 10:02 am
by jlitherl
I've imported data from a flat file into a dataset with a compound key based upon 5 columns, which can contain duplicates.

I've added a column which I plan to use to uniquely identify each of the duplicates, with values 1,2,3...

Can anyone tell me how to generate the numeric value in column 6 within DataStage, please?

Thanks

John

Posted: Fri Feb 09, 2007 10:09 am
by DSguru2B
You can do this with stage variables. Use hash partition to ensure that similar keys end up in one partition. Use the stage variables to compare present row with previous row and a increment another stage variable which will be used as your counter. Reset it back if there is a change.
You need to take into account the partition number and number of partitions. If you dont want to go through all that. Run in sequential mode.
Sort your incoming data on your composite key.

Posted: Fri Feb 09, 2007 3:37 pm
by ray.wurlod
If you use a Sort stage, it can generate a key change column. If this is 1, you have a new key; if this is 0 you have a duplicate.

Posted: Tue Feb 13, 2007 12:00 pm
by jlitherl
DSguru2B wrote:You can do this with stage variables. Use hash partition to ensure that similar keys end up in one partition. Use the stage variables to compare present row with previous row and a increment another stage variable which will be used as your counter. Reset it back if there is a change.
You need to take into account the partition number and number of partitions. If you dont want to go through all that. Run in sequential mode.
Sort your incoming data on your composite key.


Can I do this with a PX transformer stage? Someone suggested using the function RowProcGetPreviousValue() but I can only see this in the Server job transformer. How can I do the equivalent in PX?

Thanks for your help.

John

Posted: Tue Feb 13, 2007 12:03 pm
by DSguru2B
With stage variables. Search this forum for more details on how to compare present row with previous using stage variables.

Posted: Tue Feb 13, 2007 2:50 pm
by ray.wurlod
The Sort stage is far easier. It generates the "key is changed" value for you. All you have to do is to provide a column into which it can put this value. Any size of integer will do; the value is 0 or 1.

Posted: Thu Feb 15, 2007 11:51 am
by jlitherl
Alas, Ray, my company is too tight to pay for a premium membership to see most of your posting :(

But the stage variables alternative was straightforward enough. I put a zero in the extra column by default and increment this value each time the other key columns are repeated. Easy!

Thank you both very much for your help.

Regards

JohnL