Page 1 of 2

extracting start and end position using index function

Posted: Tue May 05, 2009 8:06 am
by kirankota79
I have a sample data like this

"EMP_BASE_RATE=8.81","EMP_BIRTH_DATE=08/03/1987 00:00","EMP_FIRSTNAME=MANUEL","EMP_LASTNAME=BUENO","EMP_FULLNAME=BUENO, MANUEL","EMP_PAYGRP_NAME=PHASE1","EMP_SIN=removed","EMP_FLAG3=N","EMP_FLAG11=0","EMP_FLAG14=R","EMP_VAL1=H2SA002","EMP_VAL2=","EMP_VAL3=1982","EMP_VAL4=.25","EMP_VAL5=SA","EMP_VAL6=CA","EMP_VAL7=4175755","EMP_VAL8=IF0HO1","EMP_VAL17=CA","EMP_VAL20=G00S16"


From this i want to extract 3 values EMP_FIRSTNAME=MANUEL, EMP_LASTNAME=BUENO and EMP_FULLNAME=BUENO, MANUEL. I am able to get the start position for each of these using index function not able to get the last position. Can any suggest me please?

Posted: Tue May 05, 2009 8:12 am
by chulett
Use the Field function and tell it you have a comma delimited source.

Posted: Tue May 05, 2009 8:31 am
by kirankota79
The position of these name fields in the string is not same for each record they keep on varying like the following sample 3 records.

"EMP_FIRSTNAME=LATHEEF SYED","EMP_LASTNAME=HUSSAINY","EMP_FULLNAME=HUSSAINY, LATHEEF SYED"

"EMP_FIRSTNAME=CHISTINE","EMP_FULLNAME=BRENNECKE, CHISTINE"

"EMP_BASE_RATE=8.81","EMP_BIRTH_DATE=08/03/1987 00:00","EMP_FIRSTNAME=MANUEL","EMP_LASTNAME=BUENO","EMP_FULLNAME=BUENO, MANUEL","EMP_PAYGRP_NAME=PHASE1","EMP_SIN=removed","EMP_FLAG3=N","EMP_FLAG11=0","EMP_FLAG14=R","EMP_VAL1=H2SA002","EMP_VAL2=","EMP_VAL3=1982","EMP_VAL4=.25","EMP_VAL5=SA","EMP_VAL6=CA","EMP_VAL7=4175755","EMP_VAL8=IF0HO1","EMP_VAL17=CA","EMP_VAL20=G00S16"


In that case , how i can use the filed function.

Thanks

Posted: Tue May 05, 2009 9:34 am
by nagarjuna
Hi ,
As of now i can think of a shell script using awk to get those fields and run that script from execute activity stage .

kirankota79 wrote:The position of these name fields in the string is not same for each record they keep on varying like the following sample 3 records.

"EMP_FIRSTNAME=LATHEEF SYED","EMP_LASTNAME=HUSSAINY","EMP_FULLNAME=HUSSAINY, LATHEEF SYED"

"EMP_FIRSTNAME=CHISTINE","EMP_FULLNAME=BRENNECKE, CHISTINE"

"EMP_BASE_RATE=8.81","EMP_BIRTH_DATE=08/03/1987 00:00","EMP_FIRSTNAME=MANUEL","EMP_LASTNAME=BUENO","EMP_FULLNAME=BUENO, MANUEL","EMP_PAYGRP_NAME=PHASE1","EMP_SIN=removed","EMP_FLAG3=N","EMP_FLAG11=0","EMP_FLAG14=R","EMP_VAL1=H2SA002","EMP_VAL2=","EMP_VAL3=1982","EMP_VAL4=.25","EMP_VAL5=SA","EMP_VAL6=CA","EMP_VAL7=4175755","EMP_VAL8=IF0HO1","EMP_VAL17=CA","EMP_VAL20=G00S16"


In that case , how i can use the filed function.

Thanks

Posted: Tue May 05, 2009 9:50 am
by kirankota79
Nagarjuna, Thanks for reply.

So there is no way that we can do this from DS functions?

And one more thing other than the position is, there is a comma with in the fullname part. so using comma as delimiter in field function will remove the second part in FULL NAME.

Posted: Tue May 05, 2009 10:17 am
by sjfearnside
Since the comma in the full name is inside the quotes and would be part of that string, would it still be considered the same as a non-quoted comma, i.e. field delimiter? :?

"EMP_FULLNAME=HUSSAINY, LATHEEF SYED"

Posted: Tue May 05, 2009 10:21 am
by Sainath.Srinivasan
There are many ways to do it.

If you have located the position using Index, it must be easy to do a substring to extract the data.

Try it. If you hit any hurdles, give a shout with information of what you tried.

Posted: Tue May 05, 2009 10:46 am
by kirankota79
Hello Sai.....I already used the index for finding the start position for three names index(DSLink12.NAME,"EMP_FI",1), index(DSLink12.NAME,"EMP_LA",1), index(DSLink12.NAME,"EMP_FU",1). But the problem is with the end position. How can i get the index for the end position. Any help is great! Thanks

Posted: Tue May 05, 2009 10:47 am
by kirankota79
sjfearnside wrote:Since the comma in the full name is inside the quotes and would be part of that string, would it still be considered the same as a non-quoted comma, i.e. field delimiter? :?

"EMP_FULLNAME=HUSSAINY, LATHEEF SYED"
I already tried to use double code as field delimiter, but Datastage is not accepting it. Thanks

Posted: Tue May 05, 2009 11:23 am
by kirankota79
kirankota79 wrote:Hello Sai.....I already used the index for finding the start position for three names index(DSLink12.NAME,"EMP_FI",1), index(DSLink12.NAME,"EMP_LA",1), index(DSLink12.NAME,"EMP_FU",1). But the problem is with the end position. How can i get the index for the end position. Any help is great! Thanks
I am looking for a code that finds the index of the " (souble quote) after EMP_FI, EMP_LA and EMP_FU so that we get the end positions for 3.

Posted: Tue May 05, 2009 12:35 pm
by Editor
It has been brought to my attention of the use of real records with potentially real SS numbers. Please in the future refrain from actually posting real numbers.

THanks in advance.

Dennis James
Editor

Posted: Tue May 05, 2009 5:29 pm
by Kryt0n
kirankota79 wrote: I am looking for a code that finds the index of the " (souble quote) after EMP_FI, EMP_LA and EMP_FU so that we get the end positions for 3.
Take the substring created from your first index and do an index on this to look for the double quote

Posted: Wed May 06, 2009 7:37 am
by kirankota79
Kryt0n wrote:
kirankota79 wrote:
Take the substring created from your first index and do an index on this to look for the double quote
Can you please be more clear on this? i have first position using index(DSLink12.NAME,"EMP_FI",1), then the whole name is like

"EMP_FIRSTNAME=LATHEEF SYED" .


I didn't get, what u said like " Take the substring created from your first index" I am not getting any substring from first index. Its just a position.

Posted: Wed May 06, 2009 5:11 pm
by Kryt0n
So make a substring out of it... play around with substrings and the index you have, see what you get... I'm sure things will become easier once you think about it

Posted: Wed May 06, 2009 10:23 pm
by chulett
So... maybe it's just me but I'm not really sure what exactly you want out of this. Are you literally trying to find and extract (as one example) this complete string?

EMP_FIRSTNAME=LATHEEF SYED

and that's it, we're done? Or do you have an EMP_FIRSTNAME field you are trying to populate, so first you need to pull that string out and then you need to parse out just the LATHEEF SYED piece as the derivation value for that field? :?

Regardless, as Kryt0n is trying to impress on you, think about what the next steps should be. You found where the target string starts, how do you identify the end of it? Think about searching from that found index position to get the position of the next double-quote and then substring out the data between those two positions. Seems like that should get you pretty close.