searching a value in input dataset

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sourabhverma
Participant
Posts: 74
Joined: Thu Jan 05, 2006 2:07 am

searching a value in input dataset

Post by sourabhverma »

Hi All,

I have a input dataset which is having 3 columns col1,col2,col3.
Values are like
Col1 Col2 Col3
------- ----- -----
12345 ABC India
12567 XYZ US
23672 DEF UK

Also, there is one oracle table which is having only one column & having values like :
Col1
-----
23
99
36

My requirement is : i need to comapre the col1 value of oracle table with the col1 of input dataset & if it matches with col1 of datset i need to apply some transformations. Eg: 23 matches with 12345 , 36 matches with 23672.

Any help is appreciated.
Thanks,
Sourabh Verma
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

I guess range lookup might help you. i have never used it but you can do a range lookup in V8.
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
OddJob
Participant
Posts: 163
Joined: Tue Feb 28, 2006 5:00 am
Location: Sheffield, UK

Post by OddJob »

Is Oracle.Col1 always going to match against characters 2 and 3 of dataset.col1, as your example suggests??

If this is the case, then use a transformer to create a new column based on characters 2 and 3, and perform an inner join between this new column and Oracle.Col1.

If the above is not the case, create a new column to hold a two digit value, and create records for each combination of the two characters from dataset.Col1 e.g.

Col1 Col2 Col3
------- ----- -----
12345 ABC India
12567 XYZ US
23672 DEF UK

Becomes

Col1 Col2 Col3 Col4
------- ----- ----- ----
12345 ABC India 12
12345 ABC India 23
12345 ABC India 34
12345 ABC India 45
12567 XYZ US 12
12567 XYZ US 25
12567 XYZ US 56
12567 XYZ US 67
23672 DEF UK 23
23672 DEF UK 36
23672 DEF UK 67
23672 DEF UK 72

You can use an External Filter command stage to perform an awk command to produce this type of transformation, unless there are always going to be 4 combinations, in which case you can use a transformer.
Now perform an inner join between new column Col4 and Oracle.Col1.
sourabhverma
Participant
Posts: 74
Joined: Thu Jan 05, 2006 2:07 am

Post by sourabhverma »

Thanks for the reply...

Sorry i did not provide more info but the problem is like, the input dataset is having 5-6 millions of records & field value can be of any length from 3 to 10. So in that case there will be many more combinations in the result.
Thanks,
Sourabh Verma
OddJob
Participant
Posts: 163
Joined: Tue Feb 28, 2006 5:00 am
Location: Sheffield, UK

Post by OddJob »

Then in that case you may need to consider creating many records for each combination, presumably between 2 and 9 records per input record.

Only keep the keys plus the new 'combination field', and join back to the rest of the data after you've joined to get your records to process.

If I presume dataset.Col1 is the only key, then...

The awk command could be something like....
awk '{for(nCombi=1; nCombi<length; nCombi++)print $0,substr($0,nCombi,2);}'

Add this to an external filter stage, specify input as a single column varchar(10) and output columns of varchar(10),char(2), format as space separated data.
OddJob
Participant
Posts: 163
Joined: Tue Feb 28, 2006 5:00 am
Location: Sheffield, UK

Post by OddJob »

Slight modification, after trying it...

Seems the external filter stage doesn't want to increase the number of output columns, therefore you'll need to have one output col big enough to hold max data e.g. '1234567890 12' i.e. varchar(13).

Then use a column import stage to split this single col into two cols.
Post Reply