duplicate and non duplicate data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

agpt
Participant
Posts: 151
Joined: Sun May 16, 2010 12:53 am

duplicate and non duplicate data

Post by agpt »

Hi,
A sequential file has 8 records with one column. The values in that column, shown here separated by spaces, are:
1 1 2 2 3 4 5 6
In a parallel job, after reading the sequential file, 2 more sequential files should be created: one with the duplicate records and the other with the records that have no duplicates.
File 1 records, separated by spaces: 1 1 2 2
File 2 records, separated by spaces: 3 4 5 6
How can I do it?
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: duplicate and non duplicate data

Post by gateleys »

1. Sort values in the input column
2. Pass the output of step 1 into a Transformer in which you create 2 stage variables with the following derivations:

Code: Select all

svDuplicates = If RowProcCompareWithPreviousValue(InputLink.Column) Then @TRUE Else @FALSE

svNonDuplicates = If RowProcCompareWithPreviousValue(InputLink.Column) Then @FALSE Else @TRUE
3. Use 2 output links, one to pass duplicate rows and the other for non-duplicates, with the following constraints.
For duplicates:

Code: Select all

svDuplicates
For non-duplicates:

Code: Select all

svNonDuplicates
I hope it works in a parallel job, and it should if you use a BASIC Transformer.
gateleys
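
For readers who want to check the routing outside DataStage, here is a minimal Python sketch of the sort-then-compare idea. It is not DataStage code. It assumes the stage-variable comparison flags a row only when it equals the row immediately before it; the function names `split_by_previous` and `split_by_group` are made up for this illustration. Under that assumption, the first row of each repeated value is not flagged, so the second function routes whole groups of equal values instead, which is what the original question asks for.

```python
from itertools import groupby


def split_by_previous(values):
    """Route each sorted row by comparing it with the previous row only."""
    duplicates, non_duplicates = [], []
    previous = object()  # sentinel that is equal to no real value
    for value in sorted(values):
        if value == previous:
            duplicates.append(value)
        else:
            non_duplicates.append(value)
        previous = value
    return duplicates, non_duplicates


def split_by_group(values):
    """Route every row of a repeated value to the duplicates output."""
    duplicates, non_duplicates = [], []
    for _, group in groupby(sorted(values)):
        rows = list(group)
        if len(rows) > 1:
            duplicates.extend(rows)
        else:
            non_duplicates.extend(rows)
    return duplicates, non_duplicates


records = [1, 1, 2, 2, 3, 4, 5, 6]
print(split_by_previous(records))  # ([1, 2], [1, 2, 3, 4, 5, 6])
print(split_by_group(records))     # ([1, 1, 2, 2], [3, 4, 5, 6])
```

Only the second result matches the File 1 and File 2 contents given in the question.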
anbu
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Re: duplicate and non duplicate data

Post by anbu »

Do you have 8 records or 8 columns in a row?
You are the creator of your destiny - Swami Vivekananda
agpt
Participant
Posts: 151
Joined: Sun May 16, 2010 12:53 am

Re: duplicate and non duplicate data

Post by agpt »

anbu wrote:
Do you have 8 records or 8 columns in a row?
8 records
agpt
Participant
Posts: 151
Joined: Sun May 16, 2010 12:53 am

Post by agpt »

I went through the other posts in the forum and found the solution: use Copy, Aggregator, and Filter stages, then join back to the original rows to get the duplicates out.
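
The post does not say how those stages are wired together, so the following Python sketch is only one plausible reading of the data flow, not the actual job design. It assumes the Aggregator counts rows per key, the Filter splits keys by that count, and the join brings the original rows back; the function name `split_with_counts` is hypothetical.

```python
from collections import Counter


def split_with_counts(values):
    """Mimic a copy -> aggregate -> filter -> join-back flow in plain Python."""
    rows = list(values)      # Copy: keep a branch holding every input row
    counts = Counter(rows)   # Aggregator: number of rows per key

    # Filter: keys that occur more than once
    duplicate_keys = {key for key, n in counts.items() if n > 1}

    # Join back: match each original row against the filtered keys
    duplicates = [row for row in rows if row in duplicate_keys]
    non_duplicates = [row for row in rows if row not in duplicate_keys]
    return duplicates, non_duplicates


duplicates, non_duplicates = split_with_counts([1, 1, 2, 2, 3, 4, 5, 6])
print(duplicates)      # [1, 1, 2, 2]
print(non_duplicates)  # [3, 4, 5, 6]
```

Because the count is taken per key before any routing, every row of a repeated value lands in the duplicates output, and the input does not need to be sorted first.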

Thanks to all of you!!!!