Breaking Up Mixed Data and Classifying

Infosphere's Quality Product

Moderators: chulett, rschirm

Post Reply
emeri1md
Participant
Posts: 33
Joined: Tue Jun 17, 2008 10:42 am

Breaking Up Mixed Data and Classifying

Post by emeri1md »

I'm taking mixed data, such as 3KG, and breaking them up. It's easy as long as I know for certain that the trailing characters will always be the measurement unit. However, I also get other data in the same format. This is giving me bad data in my output.

I need to take a pattern like this:

* >

and make sure that the characters belong to the measurement unit classification. I know I can use CONVERT in many ways, but none that I can think of seem like they will work. If I could get the number removed from the >, then I could easily use CONVERT, but nothing (according to the reference) seems to remove a portion of the token, just copy.

Is there a way to check a user variable for the classification of the word? I'm fairly certain there isn't.

Thanks.
JRodriguez
Premium Member
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City
Contact:

Post by JRodriguez »

Check below pattern actions to lookup the trailing alphabetic against a lookup table using convert

copy [1](-c) Temp1 ; Will extract all trailing alphabetic characters from operand 1
copy[1](n) Temp2, Will extract all leading numeric characters form operand 1

CONVERT Temp1 @CHECKMESUNIT.TBL TEMP ; if Temp1 on CHECKMESUNIT table it will return "MesUnit" string

[Temp1 = "MesUnit"]
COPY Temp1 {MesUnit}
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
emeri1md
Participant
Posts: 33
Joined: Tue Jun 17, 2008 10:42 am

Post by emeri1md »

That would work, except I also need to use the abbreviated form of the word. Saying that, and thinking about your solution, a messy fix would be to have two tables, one to see if the value of the characters is a unit of measure, and a second one to actually change the value to the abbreviated version.

Do you see any alternatives?

Thanks,

Matt
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Hi Matt,

Take a look at CONVERT_S (p31 in the PAL reference).
Sounds like it might be what you want.

< ; whatever pattern
CONVERT_S [1] @units.tbl TKN U ^ ;where U is some type for units

^ | U ;the CONVERT_S permanently split it into 2 tokens.
COPY [1] {Amount}
COPY [2] {Unit}
RETYPE [1] 0
RETYPE [2] 0

The TBL has the normal and stan version of the unit and an optional Comparison Threshold, like a CLS file without the Field Type column.
emeri1md
Participant
Posts: 33
Joined: Tue Jun 17, 2008 10:42 am

Post by emeri1md »

Thanks for bringing that up, Stuart, but the patterns < or > do not always match up to measurement. It can be another value as well. I need to be able to check and see if it is a unit before committing to it. The same for other types of < and >.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's why Stuart recommended a lookup against units.tbl as part of his solution.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
emeri1md
Participant
Posts: 33
Joined: Tue Jun 17, 2008 10:42 am

Post by emeri1md »

So you're saying that it will only change the type if it finds it in the table... Sorry, I must have missed that. Thanks for the help. I'll try it out and then mark this as resolved.
Post Reply