duplicate and non duplicate data

agpt · Post by **agpt** » Fri Sep 10, 2010 1:34 pm

Hi,
A sequential file has 8 records with one column. The values in the column, separated by spaces, are:
1 1 2 2 3 4 5 6
In a parallel job, after reading the sequential file, two more sequential files should be created: one with the duplicate records and the other with the records that have no duplicates.
File 1 records separated by space: 1 1 2 2
File 2 records separated by space: 3 4 5 6
How can I do it?

gateleys · Post by **gateleys** » Fri Sep 10, 2010 2:03 pm

1. Sort values in the input column
2. Pass the output of step 1 into a Transformer in which you create 2 stage variables with the following derivations:

Code:

svDuplicates = If RowProcCompareWithPreviousValue(InputLink.Column) Then @TRUE Else @FALSE

svNonDuplicates = Not(svDuplicates)

3. Use 2 output links, one to pass duplicate rows and the other for non-duplicates, using these constraints.
For duplicates:

Code:

svDuplicates

For non-duplicates:

Code:

svNonDuplicates

I hope it works in a parallel job... and it should, if you use a BASIC Transformer.
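
For anyone who wants to check the row logic outside DataStage, here is a rough Python sketch of the sort-then-compare idea. It is not DataStage code, and the function names are made up for illustration. The first function mimics comparing each row only with the previous row; on the sample data that flags just the second copy of each value, so the first copy ends up in the non-duplicates output. The second function is one possible adjustment, assuming you are able to look at the next row as well as the previous one.

```python
def split_by_previous(values):
    """Mimic sort + compare-with-previous-row (hypothetical helper)."""
    repeats, others = [], []
    previous = object()  # sentinel that is equal to no real value
    for value in sorted(values):
        if value == previous:
            repeats.append(value)   # same as the row before it
        else:
            others.append(value)    # first time this value is seen
        previous = value
    return repeats, others


def split_by_neighbours(values):
    """Flag a row as duplicate if it matches the previous OR the next row."""
    ordered = sorted(values)
    duplicates, uniques = [], []
    for i, value in enumerate(ordered):
        same_as_prev = i > 0 and ordered[i - 1] == value
        same_as_next = i + 1 < len(ordered) and ordered[i + 1] == value
        if same_as_prev or same_as_next:
            duplicates.append(value)
        else:
            uniques.append(value)
    return duplicates, uniques


rows = [1, 1, 2, 2, 3, 4, 5, 6]
print(split_by_previous(rows))    # ([1, 2], [1, 2, 3, 4, 5, 6])
print(split_by_neighbours(rows))  # ([1, 1, 2, 2], [3, 4, 5, 6])
```

Only the second result matches the two files asked for in the original post, so the previous-row comparison alone may need something extra to catch the first row of each duplicate group.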

anbu · Post by **anbu** » Fri Sep 10, 2010 2:09 pm

Do you have 8 records or 8 columns in a row?

agpt · Post by **agpt** » Sat Sep 11, 2010 12:28 am

anbu wrote:
Do you have 8 records or 8 columns in a row?

8 records

agpt · Post by **agpt** » Sun Sep 12, 2010 2:41 am

I went through the other posts in the forum and found a solution: use Copy, Aggregator and Filter stages, then join back to the original data to get the duplicates out.

Thanks to all of you!!!!
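
For later readers, the count-and-join-back idea can be sketched in Python roughly as below. This is only an illustration of the logic, not the actual job design: the mapping of each line to a stage is my assumption, and `split_by_count` is a made-up name.

```python
from collections import Counter


def split_by_count(values):
    """Split rows into (duplicates, uniques) by counting each key."""
    # Aggregator (assumed role): count the rows for each key value.
    counts = Counter(values)
    # Filter (assumed role): keep only the keys that occur more than once.
    duplicate_keys = {key for key, n in counts.items() if n > 1}
    # Join back (assumed role): match the original rows against those keys.
    duplicates = [v for v in values if v in duplicate_keys]
    uniques = [v for v in values if v not in duplicate_keys]
    return duplicates, uniques


print(split_by_count([1, 1, 2, 2, 3, 4, 5, 6]))
# ([1, 1, 2, 2], [3, 4, 5, 6])
```

Because the split is driven by a per-key count rather than by neighbouring rows, this sketch does not depend on the input being sorted.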

DSXchange
