Issue with picking the right record - sort output

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vimali balakrishnan
Participant
Posts: 60
Joined: Mon Dec 27, 2004 3:38 am

Issue with picking the right record - sort output

Post by vimali balakrishnan »

Hi,

I have the following requirement to be implemented in the job:

There are 4 - 5 fields in a file, out of this there are two fields based on which the record can be treated as a duplicate. For example,
Scenario 1:
F1,F2,F3,F4,F5
100,200,300,Y,-
100,200,300,Y,99
100,200,300,N,81
100,200,300,Y,81.

In this example, the record with a non blank value for field 5(F5) and with a value 'Y' for field 4(F4) needs to be passed to the target. In the example above, the ones with non-blank values for Y have to be passed to the target.
But there are few more scenarios that need to be handled in addition to this.

for example,
Scenario 2:
F1,F2,F3,F4,F5
100,200,300,Y,-
100,200,300,N,- - In this case the record with 'Y' needs to be passed.

Scenario 3:
F1,F2,F3,F4,F5
100,200,300,N,-
100,200,300,N,81
100,200,300,N,91 - In this case the ones with non-blank value needs to be passed.

other than this, if there is a distinct record then it has to be passed as is.

I have come up with the following approach,
I have two sort stages: In the first sort stage i have all fields except F4,F5.
sorted in ascending and then enabled the keychange to True.

In the second sort stage I have sorted based on the Keychange value.

With the help of this key change value i identify the duplicates. The unique record i pass it to a seperate link.
For the duplicate records i have a remove duplicate stage, where in based on all fields (F1 to F4) i'm retaining the last record so that the 'Y' record is picked if there are Y and N records. But there is a hickup here, for the scenarios 1,3 that i have mentioned above, this logic will result in only one record whereas 2 Y/N records with non-blank needs to be passed.

Can anyone suggest any alternate solution which would work for all scenarios?

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I think you're reading too much into this. A Filter stage or a Transformer stage constraint would do the job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply