Hi,
I have the following requirement to be implemented in the job:
There are 4 - 5 fields in a file, out of this there are two fields based on which the record can be treated as a duplicate. For example,
Scenario 1:
F1,F2,F3,F4,F5
100,200,300,Y,-
100,200,300,Y,99
100,200,300,N,81
100,200,300,Y,81.
In this example, the record with a non blank value for field 5(F5) and with a value 'Y' for field 4(F4) needs to be passed to the target. In the example above, the ones with non-blank values for Y have to be passed to the target.
But there are few more scenarios that need to be handled in addition to this.
for example,
Scenario 2:
F1,F2,F3,F4,F5
100,200,300,Y,-
100,200,300,N,- - In this case the record with 'Y' needs to be passed.
Scenario 3:
F1,F2,F3,F4,F5
100,200,300,N,-
100,200,300,N,81
100,200,300,N,91 - In this case the ones with non-blank value needs to be passed.
other than this, if there is a distinct record then it has to be passed as is.
I have come up with the following approach,
I have two sort stages: In the first sort stage i have all fields except F4,F5.
sorted in ascending and then enabled the keychange to True.
In the second sort stage I have sorted based on the Keychange value.
With the help of this key change value i identify the duplicates. The unique record i pass it to a seperate link.
For the duplicate records i have a remove duplicate stage, where in based on all fields (F1 to F4) i'm retaining the last record so that the 'Y' record is picked if there are Y and N records. But there is a hickup here, for the scenarios 1,3 that i have mentioned above, this logic will result in only one record whereas 2 Y/N records with non-blank needs to be passed.
Can anyone suggest any alternate solution which would work for all scenarios?
Thanks.
Issue with picking the right record - sort output
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 60
- Joined: Mon Dec 27, 2004 3:38 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: