Pattern - String Extraction

koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

Pattern - String Extraction

Post by koti9 »

Hello all,

We need to extract the year and quarter from a string.

The source string may contain values like these:

ABCD1998.3EFH.GHAI.678ABCD.EF
1998.3ABCD.EFH.AJG.SS.123SSS...SSS...
ABC.DEF.EF345GH.124SJLAS..1998.3

From each of the three strings above we need to extract 1998 as the year, and also the quarter, which is the digit that follows the year after a dot (3 in these examples).

I have searched the forum and experimented with the Index, Field and Convert functions, but none of them worked.


Thanks & Regards
Koti
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Use the following Unix command in an External Filter stage.

Code:

sed -n 's/.*\([0-9][0-9][0-9][0-9]\.[1-4]\).*/\1/p'
Last edited by rkashyap on Thu Jul 30, 2015 9:48 pm, edited 1 time in total.
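
For anyone who wants to try the expression outside DataStage first, here is a minimal shell sketch that runs it over the three sample strings and then splits the result into year and quarter. The function name extract_year_quarter is only for illustration; the sed expression itself is the one posted above.

```shell
# Hypothetical helper: prints YYYY.Q for one input line, or nothing if no match.
extract_year_quarter() {
  printf '%s\n' "$1" | sed -n 's/.*\([0-9][0-9][0-9][0-9]\.[1-4]\).*/\1/p'
}

for s in 'ABCD1998.3EFH.GHAI.678ABCD.EF' \
         '1998.3ABCD.EFH.AJG.SS.123SSS...SSS...' \
         'ABC.DEF.EF345GH.124SJLAS..1998.3'
do
  yq=$(extract_year_quarter "$s")
  year=${yq%.*}      # text before the dot
  quarter=${yq#*.}   # text after the dot
  printf '%s -> year=%s quarter=%s\n' "$s" "$year" "$quarter"
done
```

If the year and quarter are needed as two separate columns, the split on the dot could just as well be done downstream in a Transformer.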
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... you need to find the first occurrence of four numeric digits and then the single digit that follows immediately behind that after a 'dot'? Two separate fields or one? And is that always the case or if the four don't have a dot right there after them do you keep looking deeper in the string? Give up? Want to make sure we understand the full glory of your requirements.

Why don't you show us what you tried and what didn't work about it. Show us your thought process, how you approached this.

Oh, and what is your source? Don't want to make any assumptions there. :wink:
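
On the "first occurrence" question: the sed expression posted earlier starts with a greedy .* so when a string holds more than one YYYY.Q candidate it returns the last one, not the first. Here is a rough sketch of the difference, using a made-up input with two candidates. It assumes grep -o is available (GNU and BSD grep both have it, but it is not in POSIX), and both function names are only illustrative.

```shell
pattern='[0-9][0-9][0-9][0-9]\.[1-4]'

# Greedy leading .* means the LAST candidate on the line is captured.
last_year_quarter() {
  printf '%s\n' "$1" | sed -n "s/.*\($pattern\).*/\1/p"
}

# grep -o prints every match on its own line; head keeps the FIRST one.
first_year_quarter() {
  printf '%s\n' "$1" | grep -o "$pattern" | head -n 1
}

sample='2001.2X1998.3'   # made-up value, not from the source data
last_year_quarter "$sample"
first_year_quarter "$sample"
```

With the made-up sample, the first call prints 1998.3 and the second prints 2001.2. Which one is right depends on the answers to the questions above.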
-craig

"You can never have too many knives" -- Logan Nine Fingers
koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

Post by koti9 »

Thanks Kashyap, you are awesome. I was able to do this with stage variables, though:

StageVar1 - Converted all '.' characters and digits to a tilde (~) using the Convert function.
StageVar2 - Used the Index function to get the position of 6 consecutive tildes in StageVar1, then used that position to pull the substring from the actual field.

I made sure that tildes (~) were not present in the actual data.
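
The same idea can be tried quickly outside the Transformer. The sketch below is only an approximation of the two stage variables, written in shell with awk, whose gsub, index and substr stand in for Convert, Index and the substring step. The function name is invented for illustration.

```shell
# Hypothetical stand-in for the two stage variables described above.
extract_via_mask() {
  printf '%s\n' "$1" | awk '{
    masked = $0
    gsub(/[0-9.]/, "~", masked)        # StageVar1: digits and dots become ~
    pos = index(masked, "~~~~~~")      # StageVar2: first run of 6 tildes
    if (pos > 0) print substr($0, pos, 6)
  }'
}

extract_via_mask 'ABCD1998.3EFH.GHAI.678ABCD.EF'
extract_via_mask '1998.3ABCD.EFH.AJG.SS.123SSS...SSS...'
```

One thing to watch in this sketch: because dots and digits share the same placeholder, extra dots or digits directly in front of the year lengthen the run. For the third sample string, ABC.DEF.EF345GH.124SJLAS..1998.3, the first run of six tildes starts at the two dots, so the function returns ..1998 rather than 1998.3. Mapping digits and dots to two different placeholder characters might be one way around that.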

Thanks Craig, your question helped me raise a question with source.

Thanks & Regards
Koti
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

You are welcome.

Just an FYI for anyone using the External Filter stage: in previous versions, the stage had an issue processing special characters (see the technote). If the issue still occurs, move the arguments into a file, either as described in the technote or by using sed's "-f" option.
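
A minimal sketch of the "-f" variant: the sed expression is written to a script file once, and the filter command then refers to the file, so no special characters appear on the command line. The file location and function name below are placeholders, not anything prescribed by the technote.

```shell
# Placeholder path; in a real job this would be a fixed, readable location.
script_file="${TMPDIR:-/tmp}/extract_year_quarter.sed"

# Write the sed expression into the file (printf %s leaves backslashes alone).
printf '%s\n' 's/.*\([0-9][0-9][0-9][0-9]\.[1-4]\).*/\1/p' > "$script_file"

# The command the stage would run then needs no quoting at all:
#   sed -n -f /path/to/extract_year_quarter.sed
extract_from_script_file() {
  printf '%s\n' "$1" | sed -n -f "$script_file"
}

extract_from_script_file 'ABC.DEF.EF345GH.124SJLAS..1998.3'
```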
BalajiL
Premium Member
Posts: 4
Joined: Sat Jun 28, 2014 10:13 pm
Location: Chennai

Post by BalajiL »

Hi Koti,

I need your help here.

How did you convert all the numbers to a specific value? When I use the Convert function it replaces only one character. Could you please share the code you used for your Convert call?
Warm Regards,
Balaji
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sounds like you need to re-read the documentation on the Convert function. :wink:

If you need more help than that, start your own post and let us know what problem you are trying to solve. Thanks.
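
For readers with the same question: Convert works on character lists, pairing each character in the "from" list with the character at the same position in the "to" list, so replacing eleven different characters needs eleven replacement characters. The Unix tr command follows the same positional idea, which makes it a convenient way to illustrate it. This is a shell analogy, not DataStage syntax, and the function name is made up.

```shell
# Shell analogy only: each character in the first list maps to the
# character at the same position in the second list (11 characters each).
mask_digits_and_dots() {
  printf '%s\n' "$1" | tr '0123456789.' '~~~~~~~~~~~'
}

mask_digits_and_dots 'ABCD1998.3EFH'
```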
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Just for future reference: the BASIC Transformer's MATCHFIELD function could also have been used for pattern matching and string extraction.