WE8ISO8859P1 vs AL32UTF8 vs UTF-8

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
karthi_gana
Premium Member
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

WE8ISO8859P1 vs AL32UTF8 vs UTF-8

Post by karthi_gana »

All,

We are using 18 databases for our project. Datastage job will load the data into ORACLE. The NLS setting in datastage side is UTF-8.

Background:

In the datastage job, we are using ODBC stage to load the data into ORACLE. we used NVARCHAR in 8 version. We recently migrated datastage from 8 to 8.5. The major change which we did for this migration was " changed NVARCHAR to VARCHAR in all th ODBC stage". This exerciese is done for more than 2000 jobs.

Now, we faced some issue in PROD related to this datatype in one database. so we changed VARCHAR to NVARCHAR only for that particulat database and particular job.

I am new to this project. I am in the position to find the root cause. what is happening actually in between ORACLE and DATASTAGE.

The first step of analysis is to understand the CHARSET & NCHARSET defied in ORACLE side. I did that and noticed the below difference in that particular database.

SELECT * FROM sys.props$ WHERE name = 'NLS_CHARACTERSET' ;
NLS_CHARACTERSET AL32UTF8

SELECT *
FROM NLS_SESSION_PARAMETERS;


NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE

SELECT * FROM NLS_DATABASE_PARAMETERS;
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_LANGUAGE ENGLISH
NLS_TERRITORY UNITED KINGDOM
NLS_CURRENCY #
NLS_ISO_CURRENCY UNITED KINGDOM
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET AL32UTF8
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE ENGLISH
NLS_SORT BINARY
NLS_TIME_FORMAT HH24.MI.SSXFF
NLS_TIMESTAMP_FORMAT DD-MON-RR HH24.MI.SSXFF
NLS_TIME_TZ_FORMAT HH24.MI.SSXFF TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH24.MI.SSXFF TZR
NLS_DUAL_CURRENCY ?
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_RDBMS_VERSION 10.2.0.4.0

Other Databases:

SELECT * FROM sys.props$ WHERE name = 'NLS_CHARACTERSET' ;

NLS_CHARACTERSET WE8ISO8859P1

SELECT *
FROM NLS_SESSION_PARAMETERS;


NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE

SELECT * FROM NLS_DATABASE_PARAMETERS;

NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET WE8ISO8859P1
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 10.2.0.4.0

I am sure this is causing because of some wrong CHARACTERSET defined either in DB or ETL side. I need some experts help to go further and start my second step analysis.
Karthik
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: WE8ISO8859P1 vs AL32UTF8 vs UTF-8

Post by kwwilliams »

1. What problems are you having with the data?

2. Why was the change from NVARCHAR to VARCHAR made across 2,000 jobs? You should be importing all of the table metadata to ensure that you get the data types correct according to Oracle's database dictionary.

3. Apart from the NVARCHAR to VARCHAR change, what other changes were made to the jobs, such as connector migration?

4. Which ODBC stage are you using?

5. Have you contacted your support provider for their assistance?

My advice, start small and test one change at a time. Find one job to work to use as a test subject. On the ODBC target stage change the NLS charset to the corresponding charset of the target database. Test the job, if that doesn't fix the problem, correct the metadata by importing into Information Server and then place the correct metadata on the link.

If that doesn't work, isolate the problem data at the source and identify it to understand whzt kinds of characters are causing the issues.
karthi_gana
Premium Member
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Re: WE8ISO8859P1 vs AL32UTF8 vs UTF-8

Post by karthi_gana »

I am usign ODBC Connector.

I have seen some info on the below URL.

viewtopic.php?p=359928
That it already set to Project default ( UTF-8 ) for both 'NCHAR/NVARCHAR2' and 'Other types'
How to know UTF-8 are mapped to NCHAR/NVARCHAR ?
Karthik
Post Reply