
Remove Duplicate Stage

Posted: Wed Feb 06, 2008 1:37 am
by reddygs
Hi,

My data source (Oracle) contains both duplicate and unique values. I want to load the duplicate values into one target and the unique values into another. Can this be done in PX with any stage, and is it possible with the Remove Duplicates stage?
Can anyone help me solve this problem?
Regards,
reddygs

Posted: Wed Feb 06, 2008 2:02 am
by AmeyJoshi14
Hello!

WELCOME!!

You can achieve this with the Aggregator and Filter stages.

source ----> aggregator (group on Key Columns, Aggregation Type = Count Rows) ----> filter (two output links)
    |---> (count = 1) ---> (target) Unique Values
    |---> (count > 1) ---> (target) Duplicate Values
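
As a rough illustration of what the aggregator + filter flow computes, here is a minimal Python sketch. It is not DataStage code; the sample rows and the names `rows`, `unique_keys`, and `duplicate_keys` are made up for the example. Note that counting per key leaves one record per key, so the non-key columns are gone after this step.

```python
from collections import Counter

# Hypothetical sample rows as (key, payload); not data from this thread.
rows = [(1, "a"), (2, "b"), (3, "c"), (2, "d"), (1, "e")]

# Aggregator step: group on the key column and count rows per key.
# The result has one entry per key, so the payload column is dropped.
counts = Counter(key for key, _ in rows)

# Filter step: two output links, split on the row count.
unique_keys = sorted(k for k, n in counts.items() if n == 1)
duplicate_keys = sorted(k for k, n in counts.items() if n > 1)

print(unique_keys)
print(duplicate_keys)
```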

Posted: Thu Feb 07, 2008 4:15 pm
by bcarlson
Do you want one record per key set to go to the database and the remaining dups to go to another dataset? Or do you want to send a record only if its key set is not duplicated anywhere, with all key sets that have dups rejected to the other dataset?

Ex. Keys are 1, 2, 3, 2, 4, 5, 1, 6. Do you want 1, 2, 3, 4, 5, 6 to go one way and the extra 1, 2 to go to the dups dataset, or do you want 3, 4, 5, 6 to go one way and all occurrences of 1 and 2 to go the other way?
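
To make the two readings concrete, here is a small Python sketch that applies both to the key list from the example. The function names `split_first_occurrence` and `split_never_duplicated` are hypothetical labels for the two interpretations, not DataStage terms.

```python
from collections import Counter

keys = [1, 2, 3, 2, 4, 5, 1, 6]

def split_first_occurrence(keys):
    """Reading 1: keep the first record per key, send later repeats to dups."""
    seen = set()
    first, extras = [], []
    for k in keys:
        if k in seen:
            extras.append(k)
        else:
            seen.add(k)
            first.append(k)
    return first, extras

def split_never_duplicated(keys):
    """Reading 2: keep only keys that occur once, send every occurrence of the rest to dups."""
    counts = Counter(keys)
    singles = [k for k in keys if counts[k] == 1]
    repeated = [k for k in keys if counts[k] > 1]
    return singles, repeated

print(split_first_occurrence(keys))
print(split_never_duplicated(keys))
```

The first reading is roughly what a plain de-duplication step gives you; the second needs a count per key before any record can be routed.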

Brad.

Posted: Thu Feb 07, 2008 4:23 pm
by bcarlson
With the aggregator, won't that reduce your result set to one row per key? Seems like you want to have 2 streams of data: one with your main data and another with just the keys. The key stream goes through the aggregator and filter like AmeyJoshi14 mentions. But then you need to join each filter output back to the data stream:


                 (datastream - whatever transformation needs to happen...) -- (to joins below)...
file -- copy --<
                 modify (keep keys only) -> aggregator -> filter -> (uniq) -- Inner join to data -> output1
                                                                   \
                                                                    (dups) -- Inner join to data -> output2
Sorry, the diagram is crude. Hopefully it makes sense... Please correct me if I am wrong about the aggregator, I don't use it much.
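
For what it's worth, the logic of the diagram can be sketched in Python as below. This only illustrates the copy / count / join-back idea and is not DataStage code; `split_rows_by_key_count`, `key_of`, and the sample rows are assumptions made up for the example.

```python
from collections import Counter

def split_rows_by_key_count(rows, key_of):
    """Split full rows into (uniq, dups) based on how often each key occurs."""
    rows = list(rows)  # "copy": the rows are read twice below

    # Key stream: keep keys only, then count rows per key (modify + aggregator).
    counts = Counter(key_of(r) for r in rows)

    # Filter on the count, then join back to the data stream on the key,
    # so each output carries complete rows rather than bare keys.
    uniq = [r for r in rows if counts[key_of(r)] == 1]
    dups = [r for r in rows if counts[key_of(r)] > 1]
    return uniq, dups

# Hypothetical sample rows as (key, payload).
sample = [(1, "a"), (2, "b"), (3, "c"), (2, "d"), (1, "e")]
output1, output2 = split_rows_by_key_count(sample, key_of=lambda r: r[0])
print(output1)
print(output2)
```

Here `output2` holds every occurrence of a duplicated key. If only the extra occurrences should be rejected, the split has to track first occurrences instead of counts.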

Brad.