Scenario: Identify duplicated in a column

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
nagadastagirireddy
Participant
Posts: 15
Joined: Fri Jan 22, 2010 4:35 am
Location: India

Scenario: Identify duplicated in a column

Post by nagadastagirireddy »

Hi,
I have a scenario in my project, in a column there are two values separated by comma as shown below.

Input
-----------
AUS,IND
AUS,PAK
IND,AUS
IND,PAK
PAK,AUS
PAK,IND

If you observe in the above data, there a combinations of "AUS,IND" and "IND,AUS". In this case I have to consider them as duplicates and load any one of them in table.
My output should look as below.

Expected Result
--------------------
AUS,IND
AUS,PAK
IND,PAK
PAK,AUS

Please help me with this scenario.

Many Thanks in advance.
Giri
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Transformer: split using Field and put into alphabetical order, eg: IND,AUS -> AUS,IND
Sort with key change
Filter by key change= 1
nagadastagirireddy
Participant
Posts: 15
Joined: Fri Jan 22, 2010 4:35 am
Location: India

Post by nagadastagirireddy »

Hi,
If I split and sort the data then it will look like below.

After applying split and sort
-------------------------------
AUS,AUS
AUS,AUS
IND,IND
IND,IND
PAK,PAK
PAK,PAK

This might not give expected result.
Giri
yugee
Participant
Posts: 34
Joined: Fri Feb 04, 2011 5:54 pm

Post by yugee »

Simple, but works as required.

Transformer:

COL1 --> is your input with two fields separated by ','

Declare Below Stage Variables:

svIndex - Index(DSLink1.COL1,',',1) This will find the position of ','
svLength - Len(DSLink1.COL1) this is to find the length of your column
svField1 - Left(DSLink1.COL1,svIndex-1) Left portion of your input column
svField2 - Right(DSLink1.COL1,svLength-svIndex) Right portion of your input


Declare an output column:
MERGEKEY -

If Compare(svField1,svField2) > 0 Then
svField1:svField2
Else
svField2:svField1

Finally Have a Sort Stage - to remove duplicates on MERGEKEY

Let me know if it solves your requirement.
nagadastagirireddy
Participant
Posts: 15
Joined: Fri Jan 22, 2010 4:35 am
Location: India

Post by nagadastagirireddy »

Thank a lot Yugee, that works perfect.

But one small check, we can directly apply field function to seperate the string, instead why are you using Index and field?
Giri
yugee
Participant
Posts: 34
Joined: Fri Feb 04, 2011 5:54 pm

Post by yugee »

Good to know...

No specific reason for using Index - I was testing this function as part of some other issue...you can use any other functions to split the field into two...
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

Read the input twice. Number the rows @INROWNUM.
Join COL1 to COL2 and COL2 to COL1 in lookup stage.
So there will be 4 cols col1,col2,col3,col4.
AUS,IND,IND,AUS,1,3
IND,AUS,AUS,IND,3,1
Then add Num1+Num2
AUS,IND,IND,AUS,4
IND,AUS,AUS,IND,4
Then do remove duplicate on Num.
Post Reply