Viewing UTF-8 data in Datastage on Unix Platform

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


datastage_user
Participant
Posts: 2
Joined: Wed Jun 21, 2006 7:02 am


Post by datastage_user »

Hello All,

I have a peculiar problem when viewing text data in DataStage. When I view a file saved in Unicode format with the Unicode NLS map in DataStage, I encounter no problems. However, when I save the file in UTF-8 format and try to view it in DataStage with the UTF-8 NLS map, I get the following error: nls_read_delimited() - row 1, too many columns in record. How did changing to UTF-8 create this difference?

Thanks,

Vikram
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

UTF-8 is a term that covers a multitude of different encodings of Unicode. DataStage actually uses an idiosyncratic encoding that should strictly be called UV-UTF8; it preserves dynamic array delimiter characters as single byte characters, and therefore must map genuine Char(248) through Char(255) into the Unicode private use area.

Without knowing exactly what your format is, it is difficult to comment further.

But let me re-state that there are many different encodings that call themselves UTF8. Visit the Unicode Consortium website to begin your search for more information.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.