Lookup character strings

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Mar 09, 2009 7:56 am

I think earlier I did not understand your requirement if full.

If you need to identify and split into different sets, maybe you can create a temporary pattern file copy as

Code: Select all

sed -e 's/^/\^/' yourPatternFile > tempPatternFile
sed -e 's/^/ /' -e 's/$/ /' yourPatternFile >> tempPatternFile
sed -e 's/$/ /' yourPatternFile >> tempPatternFile

And then do

Code: Select all

egrep -f tempPatternFile yourSourceDataFile

Alternatively you can split the incoming source data rows by replacing all "blank" with "blank + newline + blank" and also add @INROWNUM & trim(blank-de-blank) column to it.

Thus the line

This is a test line

will become

1 Thisb This
1 bisb is
1 bab a
1 btestb test
1 bline line

where b is blank character

You can then do you pattern matching using lookup with the third column.

This way, you get the line number which relates to the row, the original data as it appears in the row and the trimmed data to match. This will give you all analysis.

just4geeks · Post by **just4geeks** » Mon Mar 09, 2009 7:58 am

s_boyapati wrote:In DataStage lookup stage........
Another way is after lookup,.......

Thanks for your suggestions. But how will the lookup stage pull out these records in the first place? For example, if the field on the input stream is
'adfa ad are ab adr' and the lookup string is 'ab', the lookup stage will not pull it out since it doesn't match perfectly. Further, I cannot split the field on the input stream into sub-fields as the number of white spaces vary between records.

s_boyapati · Post by **s_boyapati** » Mon Mar 09, 2009 8:01 am

just4geeks wrote:I have a fixed width flat file containing a field with character data type, such as,
Code: Select all
ab dfa adfa
adie ab dsf
dfe adf abe
I have a lookup flat file which contains just one character field, such as,
Code: Select all
ab
do
so
I need to pull out those records which contain strings from lookup file. And I need to be able to specify if it can be present at the start, in the middle, or at the end. For example, consider the string, 'ab'. If I need to pull out records where ab is present at the start, I now do
Code: Select all
grep '^ab' <file>
. If ab is to be present in the middle, I now do
Code: Select all
grep ' ab ' <file>
. These grep commands have worked in the past because the lookup file was small. Now it was grown to over 18000 records.

How do I implement this in DataStage? The existing lookup stage only looks for exact matches of the lookup field value. But that will not get my expected results.

Any ideas will be appreciated.

In DataStage lookup stage you can do equijoin only, Use EXEC in After job routine, that should call shell script with file name as parameter to shell script. In Shell script use AWK or PERL skills to use first field ( Search key) to be searched in second fileld(Input record). Use PERL instead of AWK, if you like and comfortability.

Another way (Probably I am not sure about, depends on reqs ), use transformer where use Substring and index functions to find where the matching field located in string. You must use staging variables to get accurate results.

Hope this helps...

DSXchange

Lookup character strings

Re: Lookup character strings