junk character removal


altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »

Hi

I am trying to remove all junk characters from every field of the source data. To achieve this, I am removing characters 0-8, 10-31 and 127-255 using the Convert function:


Convert(Char(0):Char(1):Char(2).........etc,"",InputField)
However, spaces are also being removed from some of the fields. I checked those fields for junk characters but did not find any, using

echo "Field Value (Copied and Pasted)" | cat -v

and

echo "Field Value (Copied and Pasted)" | od -c

Neither showed any unusual characters. Basically, my code is removing all the spaces between the characters.
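A side note on the check above: copying and pasting a value into a terminal can alter bytes along the way, so it may be more reliable to inspect character codes taken directly from the data. Below is a minimal Python sketch of that idea, assuming the field value is available as a string. The function name `find_suspect_chars` and the sample value are hypothetical and do not come from the original job.

```python
def find_suspect_chars(value: str) -> list[tuple[int, int]]:
    """Return (position, code) for every character outside printable ASCII 32-126."""
    return [(i, ord(ch)) for i, ch in enumerate(value) if not 32 <= ord(ch) <= 126]


# Hypothetical sample: a no-break space (code 160) that looks like a normal space.
sample = "Test1\u00a0Test2"
print(find_suspect_chars(sample))         # [(5, 160)]
print(find_suspect_chars("Test1 Test2"))  # [] because a normal space is code 32
```

An empty result for a field that still loses its spaces would suggest the problem lies in the removal list rather than in the data.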
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First off, there are no "junk" characters... but I'll save that lecture for others.

We'd probably need to see your complete derivation to be able to help, something without the "etc" in it. And I'm also curious if you are doing any other derivations on the strings post-convert or if the convert is literally all you are doing.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What do you get if you don't apply the Convert() function, since you assert that there are no "junk" characters present in the data?

Perhaps a better term would be non-alphanumeric.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »


Convert(Char(0):Char(1):Char(2):Char(3):Char(4):Char(5):Char(6):Char(7):Char(8):Char(9):Char(10):Char(11):Char(12):Char(13):Char(14):Char(15):Char(16):Char(17):Char(18):Char(19):Char(20):Char(21):Char(22):Char(23):Char(24):Char(25):Char(26):Char(27):Char(28):Char(29):Char(30):Char(31):Char(127):Char(128):Char(129):Char(130):Char(131):Char(132):Char(133):Char(134)........Char(255),"",TrimLeadingTrailing(NullToEmpty(InputField)))
This removes the space between two values. For example, for "Test1 Test2" I get "Test1Test2" as the output.
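For comparison, here is a rough Python sketch of what the derivation is intended to do: treat null as empty, trim leading and trailing spaces, then delete codes 0-31 and 127-255 while leaving code 32 (space) alone. It is only a reference model for the expected result, not DataStage code, and `strip_unwanted` is a hypothetical name.

```python
# Codes listed in the posted derivation: 0-31 and 127-255. Space (32) is not in the list.
REMOVE = {code: None for code in list(range(0, 32)) + list(range(127, 256))}


def strip_unwanted(value):
    """Null-to-empty, trim surrounding spaces, then delete the listed codes."""
    text = value if value is not None else ""
    return text.strip(" ").translate(REMOVE)


print(strip_unwanted("Test1 Test2"))       # Test1 Test2
print(strip_unwanted(" Te\x01st\x7f "))    # Test
```

Under this model the inner space survives, which is the behaviour the derivation was expected to show.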
altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »

While debugging I found that the issue occurs when I try to remove the extended ASCII characters, i.e. values 128 through 255.

Do we have to use a different function for such values?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You may need a second argument on the Char() function to specify these characters correctly. It asserts whether or not the most significant bit is on.
For example:


Char(164, @TRUE)
altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »

Hi Ray,

I am using DataStage 8.1, and it looks like there is no second argument to the Char() function.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Did you actually try it? It's not a documented argument.
altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »

Hi Ray,

I did try it, but the derivation field showed up as not valid.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That shows only that the expression editor parser doesn't like it. Does it compile and work properly?
altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »

Hi Ray,

I am unable to compile it in 8.1 either.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

OK, maybe it only works in BASIC-based components, such as the BASIC Transformer stage.
altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »

Hi Ray,

Is there any way to remove those extended ASCII characters, since Char(164, @TRUE) cannot be used in 8.1?
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

It may not be widely used, but the "allow 8 bits" flag is present in the parallel Transformer as well. @TRUE may not be the correct value for it.

The Parallel Job Developer Guide states:

Char: Generates an ASCII character from its numeric code value. You can optionally specify the allow8bits argument to convert 8-bit ASCII values.

Try using "TRUE" or 1 for that argument, as I can't find any example. I can confirm that the optional argument exists for parallel jobs in 8.5 through 9.1. You can check the Parallel Job Developer Guide for 8.1 to see whether it is there as well.

Here is the link for char function in 9.1.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »

I am not able to find a similar function in 8.1.

Do you think I can construct something using OR_BITS?
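For reference, the arithmetic behind the "most significant bit" remark and the OR_BITS idea can be sketched in Python: every code from 128 to 255 is a 7-bit code with the high bit (128) switched on. This only illustrates the bit manipulation. Whether a bitwise function in DataStage 8.1 can be combined with Char() to produce these characters is not confirmed anywhere in this thread, and the names below are hypothetical.

```python
HIGH_BIT = 0x80  # most significant bit of an 8-bit value (decimal 128)


def has_high_bit(code: int) -> bool:
    """True for codes 128-255, i.e. the 'extended' range discussed above."""
    return bool(code & HIGH_BIT)


def with_high_bit(code7: int) -> int:
    """Switch the high bit on using a bitwise OR."""
    return code7 | HIGH_BIT


print(has_high_bit(164))   # True
print(has_high_bit(32))    # False
print(with_high_bit(36))   # 164
```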