Remove Duplicate Stage

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
reddygs
Participant
Posts: 1
Joined: Wed Feb 06, 2008 12:56 am

Remove Duplicate Stage

Post by reddygs »

Hi,


In my data source (Oracle) there are both duplicate and unique values. I want to populate the duplicate values into one target and the unique values into another target. Can this be done in PX using any stage, or is it possible using the Remove Duplicates stage?
Can anyone help me solve this problem?
Regards
reddygs
AmeyJoshi14
Participant
Posts: 334
Joined: Fri Dec 01, 2006 5:17 am
Location: Texas

Post by AmeyJoshi14 »

Hello!

WELCOME!!

:idea: You can achieve this by using Aggregator + Filter stages. :wink:

source ---> Aggregator (key columns + Aggregation Type = Count Rows) ---> Filter (two output links)
                                                                             |---> (count = 1) ---> target: unique values
                                                                             |---> (count > 1) ---> target: duplicate values
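
Outside DataStage, the same count-and-split logic looks roughly like this; a minimal Python sketch with made-up sample keys, not actual job code:

Code:

from collections import Counter

# Keys from the source, standing in for the Oracle key column.
keys = [1, 2, 3, 2, 4, 5, 1, 6]

# Aggregator: one output row per key group, with a row count.
key_counts = Counter(keys)

# Filter link 1: keys whose count = 1 (unique values).
unique_keys = [k for k, n in key_counts.items() if n == 1]
# Filter link 2: keys whose count > 1 (duplicate values).
dup_keys = [k for k, n in key_counts.items() if n > 1]

print(unique_keys)  # [3, 4, 5, 6]
print(dup_keys)     # [1, 2]

Note that the aggregator output carries only the grouped keys and their counts, not the full rows.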
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Do you want 1 record per key set to go to one dataset and the remaining dups to another dataset? Or should a record be sent only if its key set is not duplicated anywhere, with all key sets that have dups rejected to the other dataset?

Ex. The keys are 1, 2, 3, 2, 4, 5, 1, 6. Do you want 1, 2, 3, 4, 5, 6 to go one way and the extra 1 and 2 to go to the dups dataset, or do you want 3, 4, 5, 6 to go one way and all occurrences of 1 and 2 to go the other way?
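
For concreteness, here are the two splits on that key list; a small Python sketch, not DataStage code:

Code:

from collections import Counter

keys = [1, 2, 3, 2, 4, 5, 1, 6]
counts = Counter(keys)

# Interpretation 1: keep the first occurrence of every key,
# route the extra occurrences to the dups dataset.
seen = set()
one_per_key, extras = [], []
for k in keys:
    if k in seen:
        extras.append(k)
    else:
        seen.add(k)
        one_per_key.append(k)
print(one_per_key, extras)  # [1, 2, 3, 4, 5, 6] [2, 1]

# Interpretation 2: keep only keys that never repeat,
# route every occurrence of a repeated key the other way.
never_dup = [k for k in keys if counts[k] == 1]
all_dups = [k for k in keys if counts[k] > 1]
print(never_dup, all_dups)  # [3, 4, 5, 6] [1, 2, 2, 1]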

Brad.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

With the aggregator, won't that reduce your result set? It seems like you want to have 2 streams of data - 1 with your main data and another with just keys. The key stream goes through the aggregator and filter like AmeyJoshi14 mentions, but then you need to join your 2 streams back together with the data stream:

Code:

                 (data stream - whatever transformation needs to happen...) -- (to joins below)...
file -- copy --<
                 modify (keep keys only) -> aggregator -> filter -> (uniq) -- Inner join to data -> output1
                                                                \
                                                                 -> (dups) -- Inner join to data -> output2
Sorry, the diagram is crude. Hopefully it makes sense... Please correct me if I am wrong about the aggregator; I don't use it much.
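
The same split-and-rejoin flow as a minimal Python sketch, under the assumption that rows arrive as (key, payload) pairs; again illustrative only, not actual job code:

Code:

from collections import Counter

# Source rows: (key, payload) pairs standing in for the file.
rows = [(1, "a"), (2, "b"), (3, "c"), (2, "d"), (4, "e"), (1, "f")]

# Copy stage: one stream keeps the full data, the other keeps keys only.
data_stream = rows
key_stream = [k for k, _ in rows]

# Aggregator + filter on the key stream.
counts = Counter(key_stream)
uniq_keys = {k for k, n in counts.items() if n == 1}
dup_keys = {k for k, n in counts.items() if n > 1}

# Inner joins back to the data stream recover the full rows.
output1 = [r for r in data_stream if r[0] in uniq_keys]
output2 = [r for r in data_stream if r[0] in dup_keys]

print(output1)  # [(3, 'c'), (4, 'e')]
print(output2)  # [(1, 'a'), (2, 'b'), (2, 'd'), (1, 'f')]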

Brad.
Post Reply