Converting special characters

asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

You are reading data that has characters that aren't supported in the default character set. Unless you have NLS enabled for the system and the project, this isn't easily fixable.

Scratch that... This isn't easily fixable regardless!

There are several hurdles to overcome:

- Determine whether NLS is enabled for your system. If it isn't, you have to re-install DataStage to add it, which is NOT trivial. Adding NLS to a non-NLS system can also have unintended side effects, as existing jobs may start reporting "I have bad characters in my data".

- Determine what character set the incoming data source is using, and see if that character set is supported on your target. If so, great. Just modify the job to use the correct NLS character set (it's an option on the various stages) and the data will flow through.

- If the character set isn't supported on the target system, then you have three options:
1) Let the job replace the characters with a "?" as it is probably doing right now.
2) Scan each of the incoming text fields that contain "bad" characters and strip out the "bad" character (not recommended).
3) Scan each of the incoming text fields and substitute a supported character for each "bad" character. (Recommended)

Scanning every character in a field is quite time consuming and can really slow down a job.
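
To make option 3 concrete, here is a minimal sketch in Python rather than in a DataStage Transformer. The substitution table, the function name and the choice of ASCII as the target character set are all assumptions for illustration; a real job would use whatever mapping fits its data. Anything the table does not cover still falls back to "?", as in option 1.

```python
# Hypothetical substitution table: unsupported character -> supported stand-in.
# The entries are examples only; build the real table from your own data.
SUBSTITUTIONS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u00e9": "e",   # e with acute accent
})


def substitute_unsupported(text, target_encoding="ascii"):
    """Replace known 'bad' characters, then fall back to '?' for the rest."""
    mapped = text.translate(SUBSTITUTIONS)
    # Characters that still cannot be encoded become '?' (option 1).
    return mapped.encode(target_encoding, errors="replace").decode(target_encoding)


if __name__ == "__main__":
    print(substitute_unsupported("caf\u00e9 \u2013 \u201cok\u201d"))
```

A single table lookup per character is still a scan of every character, so the performance warning above applies to this approach as well.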

Note - I also recommend looking at the "Globalization Guide" for DataStage, which covers implementing and using NLS. This is the latest version.
Last edited by asorrell on Mon Aug 19, 2013 8:21 am, edited 1 time in total.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
nitingupta
Participant
Posts: 22
Joined: Fri Jul 26, 2013 9:43 am
Location: PUNE

Post by nitingupta »

Hi,

I FTPed the file to UNIX in normal mode, and when I tried reading it on UNIX itself I saw the same transformation, so it looks like the file is being converted during the FTP transfer. Can anyone suggest whether this can be handled while FTPing?
NITIN GUPTA
srinivas.nettalam
Participant
Posts: 134
Joined: Tue Jun 15, 2010 2:10 am
Location: Bangalore

Post by srinivas.nettalam »

Are you ftping in binary or ascii mode?
N.Srinivas
India.
arunkumarmm
Participant
Posts: 246
Joined: Mon Jun 30, 2008 3:22 am
Location: New York
Contact:

Post by arunkumarmm »

What is the NLS map you were using for your original file? I guess this can be handled by using a proper NLS setting.
Arun
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

srinivas.nettalam wrote: Are you ftping in binary or ascii mode?
And the correct answer would be binary, as you want to bring the file over intact, byte for byte, without any kind of 'conversion' being done. Then you can move on to the NLS character set...
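
For anyone scripting the transfer, a binary-mode download might look like the sketch below. It uses Python's standard `ftplib`; the function name and all of its parameters are placeholders, not anything from the original job. With an interactive command-line `ftp` client, the equivalent step is issuing the `binary` command before `get`.

```python
from ftplib import FTP


def fetch_binary(host, user, password, remote_name, local_path):
    """Download remote_name in binary (image) mode, byte for byte."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        # Open the local file in binary mode too, so Python does not
        # translate line endings or decode the bytes on the way in.
        with open(local_path, "wb") as out:
            # retrbinary switches the transfer to binary mode before RETR.
            ftp.retrbinary(f"RETR {remote_name}", out.write)
```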
-craig

"You can never have too many knives" -- Logan Nine Fingers
nitingupta
Participant
Posts: 22
Joined: Fri Jul 26, 2013 9:43 am
Location: PUNE

Post by nitingupta »

Hi,
I also FTPed the file in binary mode, but I am still getting some transformations, like the one below:
SE.SED640.NAPID!~~---> SE.SED640.NAPID]~~.X~~

In ASCII mode I was also getting transformations. With binary mode far fewer transformations happen, but there are still some...
NITIN GUPTA
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How are you viewing the file? What characterset is being used for it?
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Can you check byte counts on both source and target after a binary transfer?
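
One way to run that check is sketched below: compute the size and a checksum on each side and compare the results. The helper name is made up for illustration, and it assumes Python is available on both machines; `wc -c` plus any checksum utility would do the same job from the shell.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=65536):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large extract files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

If both the size and the digest match on source and target, the transfer did not alter the file, and any difference you see is coming from how the file is being displayed.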

By definition, a binary FTP transfer copies the file exactly, with no transformations.
Where ASCII mode may use special control characters to format data, binary mode transmits the raw bytes of the file being transferred. In this way, the file is transferred in its exact original form.
If you view the binary FTP file and the characters it contains are unsupported under your current NLS character set, then they will be "translated" when you view it. Depending on what UNIX you have, I think the "locale" command can show what character set it uses.
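
A small illustration of that "translation on viewing" effect, using Python only as a neutral tool: the bytes never change, but what you see depends on the character set used to decode them. The sample character and the three encodings are arbitrary choices, not the ones from the file in this thread.

```python
import locale

# The two bytes that UTF-8 uses for "e with acute accent".
raw = "\u00e9".encode("utf-8")

as_utf8 = raw.decode("utf-8")                      # one character, as intended
as_latin1 = raw.decode("latin-1")                  # two unrelated characters
as_ascii = raw.decode("ascii", errors="replace")   # two replacement markers

if __name__ == "__main__":
    print(len(raw), repr(as_utf8), repr(as_latin1), repr(as_ascii))
    # The encoding this session would assume by default, which is
    # derived from the same settings the "locale" command reports.
    print(locale.getpreferredencoding(False))
```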
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020