Sort with remove Dups

edward_m
Charter Member
Posts: 257
Joined: Fri Jun 24, 2005 9:34 am
Location: Philadelphia,PA

Post by edward_m »

To get the latest row for each key, I am using a Sort stage followed by a Remove Duplicates stage. It works for some records but not all, and I am still seeing duplicates.

Input--->Sort--->remove Dups--->Output

Here is the data:

COL1  COL2        COL3        COL4        COL5
1     02/01/2015  02/28/2015  02/18/2015  01
1     03/01/2015  12/31/9999  02/18/2015  02
2     01/01/2015  02/28/2015  02/18/2015  01
3     01/01/2015  01/30/2015  02/18/2015  01
3     02/01/2015  12/31/9999  02/18/2015  02
3     03/01/2015  12/31/2015  02/18/2015  03

Expected output:

COL1  COL2        COL3        COL4        COL5
1     03/01/2015  12/31/9999  02/18/2015  02
2     01/01/2015  02/28/2015  02/18/2015  01
3     03/01/2015  12/31/2015  02/18/2015  03

In the Sort stage, partitioning is set to Auto.
Sorting Keys
key=COL1
Sort Key Mode=sort
Sort Order=Ascending
key=COL4
Sort Key Mode=sort
Sort Order=Ascending
key=COL2
Sort Key Mode=sort
Sort Order=Descending
key=COL4
Sort Key Mode=sort
Sort Order=Descending

In the Remove Duplicates stage, partitioning is set to Auto.

Keys that define duplicates
Key=COL1
Duplicates to retain=First

I am not sure where I went wrong. Please point me in the right direction.

Thanks in advance.
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

What is the "SQL Type" of COL1 through COL5?
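
One possible reason the column type matters (an assumption; the post does not say why it asks): if the date columns are character fields in MM/DD/YYYY layout, they sort as text rather than as dates. A minimal Python sketch of the difference, using a hypothetical pair of values that is not in the posted data:

```python
from datetime import datetime

# Hypothetical values in the MM/DD/YYYY layout used in the thread.
values = ["12/31/2014", "01/01/2015"]

# Compared as text, the month digits decide first, so 12/31/2014 looks "latest".
latest_as_text = max(values)

# Compared as dates, the year decides first, which is what "latest row" needs.
latest_as_date = max(values, key=lambda v: datetime.strptime(v, "%m/%d/%Y"))

print(latest_as_text)  # 12/31/2014
print(latest_as_date)  # 01/01/2015
```

The posted sample rows all fall in 2015, so this effect would not show up there; it would only matter for data that spans several years.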
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Seems to me that you want Duplicates to Retain=Last.
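
Which row survives depends on the pairing of sort direction and the retain setting. The sketch below is plain Python, not DataStage; it runs on a single partition, uses the sample rows from the first post, and assumes COL2 is the column that defines "latest". The helper names are hypothetical.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the original post: (COL1, COL2, COL3, COL4, COL5).
rows = [
    (1, "02/01/2015", "02/28/2015", "02/18/2015", "01"),
    (1, "03/01/2015", "12/31/9999", "02/18/2015", "02"),
    (2, "01/01/2015", "02/28/2015", "02/18/2015", "01"),
    (3, "01/01/2015", "01/30/2015", "02/18/2015", "01"),
    (3, "02/01/2015", "12/31/9999", "02/18/2015", "02"),
    (3, "03/01/2015", "12/31/2015", "02/18/2015", "03"),
]

def as_date(text):
    return datetime.strptime(text, "%m/%d/%Y")

def sort_and_dedup(data, col2_descending, retain):
    # Sort on COL2 first, then on COL1; the second sort is stable,
    # so the COL2 order is preserved inside each COL1 group.
    ordered = sorted(data, key=lambda r: as_date(r[1]), reverse=col2_descending)
    ordered.sort(key=lambda r: r[0])
    kept = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        kept.append(group[0] if retain == "first" else group[-1])
    return kept

# Both pairings keep the latest COL2 per COL1.
desc_first = sort_and_dedup(rows, col2_descending=True, retain="first")
asc_last = sort_and_dedup(rows, col2_descending=False, retain="last")

# Mixing them (descending + last) keeps the earliest row instead.
desc_last = sort_and_dedup(rows, col2_descending=True, retain="last")

for row in desc_first:
    print(row)
```

On this sample, descending with First and ascending with Last both reproduce the expected output, so the setting to change depends on the sort direction actually in effect.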
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
edward_m
Charter Member
Posts: 257
Joined: Fri Jun 24, 2005 9:34 am
Location: Philadelphia,PA

Post by edward_m »

It is working for some records but not all. Do I need to change anything in the partitioning properties, since the job runs with a 4-node config file?

Thanks.
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

You have to make sure that you partition and sort on COL1, and only sort (without partitioning) on the remaining columns.
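
To illustrate why the partitioning key matters on a 4-node run, here is a rough Python simulation, not DataStage code. It assumes, purely for illustration, that rows get spread round-robin when they are not partitioned on COL1; what Auto actually resolves to is not stated in the thread. Only COL1 and COL2 are kept, with COL2 rewritten in ISO form so that text order equals date order, and all function names are hypothetical.

```python
from itertools import groupby

NODES = 4  # the thread mentions a 4-node configuration file

# (COL1, COL2) pairs from the sample data, COL2 rewritten as YYYY-MM-DD.
rows = [
    (1, "2015-02-01"), (1, "2015-03-01"),
    (2, "2015-01-01"),
    (3, "2015-01-01"), (3, "2015-02-01"), (3, "2015-03-01"),
]

def round_robin(data, nodes):
    # Assumed stand-in for a partitioning that ignores COL1.
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(data):
        parts[i % nodes].append(row)
    return parts

def by_col1(data, nodes):
    # Stand-in for hash partitioning on COL1: equal keys share a partition.
    parts = [[] for _ in range(nodes)]
    for row in data:
        parts[row[0] % nodes].append(row)
    return parts

def sort_and_keep_last(part):
    # Sort within the partition, then keep the last row per COL1.
    ordered = sorted(part)
    return [list(g)[-1] for _, g in groupby(ordered, key=lambda r: r[0])]

def run(partitioner):
    out = []
    for part in partitioner(rows, NODES):
        out.extend(sort_and_keep_last(part))
    return sorted(out)

print(len(run(round_robin)))  # every row survives: duplicates remain
print(run(by_col1))           # one row per COL1, the latest COL2
```

Each node only removes duplicates among the rows it holds, so rows with the same COL1 must land on the same node before the sort and the duplicate removal run.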
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Also, use only Input--->Remove Dups--->Output.
Set the sorting and partitioning on the input link of the Remove Duplicates stage. There is no need for a separate Sort stage for your requirement.