Extracting first occurance of number from String

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dhiren
Premium Member
Premium Member
Posts: 43
Joined: Thu Aug 11, 2005 12:24 am

Extracting first occurance of number from String

Post by dhiren »

Hi All,
We have a requirement to extract the first occurance of number from the Strings.
The example of Strings are
COROMN13, LIMSTO1, ORAOMN5PUY67

For eg, in the following Strings below , i need the following Output:

COROMN13 --> Extract number 13
LIMSTO1 --> Extract number 1
ORAOMN5PUY67 ---> Extract number 5

How do i incorporate this? Is there some function in transformer where i can extract the first occurance of numbers in String?

Please guide with syntax.

Thanks in advance,
Regards,
Dhiren
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
You can go over the rouines, I don't recall seeing one that does this.

You'll need to loop on the string 1 character at a time and test for numeric and build your number if it's more the 1 character long.
you'll need Len(), Loop/For, Substring operator [start position,length]

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi dhiren,

Do you know the maximum size of the numeric value that is embedded in the string?. Can two numbers occur next to each other? If you dont know how big the number can be, it is not possible to extract it.

For Example

COROMN13198POIUT1234

If you dont know the size, what would you expect to be extracted in the above string, 1 or 13 or 13198
Thanks,
Naveen
dhiren
Premium Member
Premium Member
Posts: 43
Joined: Thu Aug 11, 2005 12:24 am

Post by dhiren »

Thanks for the replies so far.
The maximum length that can occur is unknown.

In this case(COROMN13198POIUT1234) it will be 13198.

Now if i make the Code to answer most likeliest of possibilities, then the numeric string will be last one or two characters.

eg COROMN13 --> extract 13
POIUT1 --> extract 1

Which one (function ) in this case then will be more useful?
Rightmost character extraction using (Right) or usage of Field() etc?

Regards,
Dhiren
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi dhiren,

Your requirement needs to be more specific, for us to be able to help you more.

Can we assume that the number you want to extract is always at the end of the string?
Now if i make the Code to answer most likeliest of possibilities, then the numeric string will be last one or two characters.
If the character is last 1 or 2 characters, then you can use the right function to always get the last two characters and then use the Oconv function to extract numbers only.

The syntax will be something like this

Code: Select all

Oconv(Right(InLink.InCol,2),"MCN")
Thanks,
Naveen
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Oconv() is not available in parallel jobs, except in BASIC Transformer stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
htrisakti3
Charter Member
Charter Member
Posts: 36
Joined: Thu Jun 10, 2004 11:22 pm

Post by htrisakti3 »

create this into a function & call it using your string as argument.. mystring is used for illustration:

Code: Select all

mystring = "ABCD4XXYYZ22Z33"
temp1 = Convert("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "                          ", mystring) 
* temp1 is now:"      4     22 33"

temp2=TRIM(temp1)
*temp2 value: "4 22 33" 

temp3=Field(temp2," ",1)
*temp2 value: "4" -- get 1st occurence

Ans = temp3
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Does that work in parallel jobs?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
aartlett
Charter Member
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Seeing as it's PX couldn't a C function be written to handle it? or even a lex function converted to C, compiled and used in the job?

Functionally the same as the basic code but more efficient.



<<Hey, this makes post 50 ... yippee>>> nearly at 500 points too. :)
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Ray, the code should work using transformer and Stage Variables as far as I can see (Nice Solution :!: );
You might want to add more things like lower case letters or any other variation that might also be inside the original string which you want to ignore (I wonder will the performance be satisfactory :roll: ).

I for one think the original post was quite clear on the requirements for those who asked for more info.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you're going to allow BASIC Transformer stage, then the ideal function is

Code: Select all

MatchField(InLink.TheString, "0A0N0X0N0X", 2)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
cmmurari
Participant
Posts: 34
Joined: Sun Jan 02, 2005 9:55 am
Location: Singapore

Post by cmmurari »

Mr.Ray , Thanks for your Nice Solution :!:
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

I've tested:

Code: Select all

$ more rtest*.txt
::::::::::::::
rtest.txt
::::::::::::::
ABG3asfgfz1234dfafahgFGSGS
ABGZ567gfz1234dfafahgFGSGS
ABGZasfgfz1234dfafah546SGS
ABGZasfgfz1234dfafahgFGSGS
::::::::::::::
rtest2.txt
::::::::::::::
"3"
"567"
"1234"
"1234"
$
rtest.txt is the source file rtest2.txt is the output file
px transformer with:
sv1 has:

Code: Select all

Convert("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz","                                                   ", DSLink2.c1)
sv2 has:

Code: Select all

Trim(sv1)
output column has:

Code: Select all

Field(sv2," ",1)
Enjoy,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
cmmurari
Participant
Posts: 34
Joined: Sun Jan 02, 2005 9:55 am
Location: Singapore

Post by cmmurari »

Yes I have tested.
Convert , Trim, and Field BASIC functions working fine in Parallel Transformer Stage.
Field(Trim(Convert("ABCDEFGHIJKLMNOPQRSTUVWXYZ", " ", "ABCDEF83743F"))," ",1).

Thanks htrisakti3
dhiren
Premium Member
Premium Member
Posts: 43
Joined: Thu Aug 11, 2005 12:24 am

Post by dhiren »

Thanks all those who replied.

This thing worked without the usage of Routine. All i needed is just to use three Stage Variables
Post Reply