I've imported data from a flat file into a dataset with a compound key based upon 5 columns, which can contain duplicates.
I've added a column which I plan to use to uniquely identify each of the duplicates, with values 1,2,3...
Can anyone tell me how to generate the numeric value in column 6 within DataStage, please?
Thanks
John
Generate numeric duplicate number column
Moderators: chulett, rschirm, roy
You can do this with stage variables. Use hash partition to ensure that similar keys end up in one partition. Use the stage variables to compare present row with previous row and a increment another stage variable which will be used as your counter. Reset it back if there is a change.
You need to take into account the partition number and number of partitions. If you dont want to go through all that. Run in sequential mode.
Sort your incoming data on your composite key.
You need to take into account the partition number and number of partitions. If you dont want to go through all that. Run in sequential mode.
Sort your incoming data on your composite key.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
ray.wurlod
- Participant
- Posts: 54595
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
DSguru2B wrote:You can do this with stage variables. Use hash partition to ensure that similar keys end up in one partition. Use the stage variables to compare present row with previous row and a increment another stage variable which will be used as your counter. Reset it back if there is a change.
You need to take into account the partition number and number of partitions. If you dont want to go through all that. Run in sequential mode.
Sort your incoming data on your composite key.
Can I do this with a PX transformer stage? Someone suggested using the function RowProcGetPreviousValue() but I can only see this in the Server job transformer. How can I do the equivalent in PX?
Thanks for your help.
John
-
ray.wurlod
- Participant
- Posts: 54595
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The Sort stage is far easier. It generates the "key is changed" value for you. All you have to do is to provide a column into which it can put this value. Any size of integer will do; the value is 0 or 1.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Alas, Ray, my company is too tight to pay for a premium membership to see most of your posting
But the stage variables alternative was straightforward enough. I put a zero in the extra column by default and increment this value each time the other key columns are repeated. Easy!
Thank you both very much for your help.
Regards
JohnL
But the stage variables alternative was straightforward enough. I put a zero in the extra column by default and increment this value each time the other key columns are repeated. Easy!
Thank you both very much for your help.
Regards
JohnL
