You say EBCDIC, I say ASCII, let's call the whole thing off

FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Recent inquiries about how DataStage handles characters, their hexadecimal code values, and mappings between character sets prompt me to ask a possibly dumb question... but as I always tell my children, there's no such thing as a dumb question, though there are plenty of dumb answers out there. :P

It goes like this in my basic job design.

First Stage (usually FTP): reads data from z/OS host; Format tab uses default COBOL attributes including EBCDIC.

Last Stage (usually Sequential File): writes data to the local DS server running Unix (previously Solaris, now Red Hat Enterprise Linux); Format tab includes ASCII.

Question: on the intervening links, which character set is the data in while DataStage processes it? If it changes from EBCDIC to ASCII before the Last Stage, where does that happen?

Seriously, my ignorance of these details makes me feel inadequate when trying to answer questions about EBCDIC. Maybe it's just my natural paranoia...

EDIT: In case it's needed for thoughtful responses, the rest of the basic design always has a Transformer to map the source data to the layout required by the destination application. Sometimes other things like filters and joins are used, but not always. There's the occasional lookup for some jobs, but they are in the minority.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

The parallel framework processes only data sets, no matter what the external format is.

The links between stages are virtual data sets.

The internal data format is UTF-16.

Import and export operators are used to perform the conversions.
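
As a rough illustration of what that import/export conversion amounts to, here is a minimal sketch in plain Python. It is not DataStage code and does not use any DataStage operator; the `cp037` EBCDIC code page and the function names `import_record` and `export_record` are assumptions made up for the example.

```python
# Conceptual analogy only: plain Python, not DataStage or its operators.
# Assumption: the host data uses EBCDIC code page 037 (US/Canada).
EBCDIC_CODEC = "cp037"


def import_record(raw: bytes) -> str:
    """Decode host bytes into an internal Unicode string (the 'import' step)."""
    return raw.decode(EBCDIC_CODEC)


def export_record(text: str) -> bytes:
    """Encode the internal string as ASCII bytes (the 'export' step)."""
    return text.encode("ascii")


# "HELLO" as it would arrive from the host in EBCDIC cp037.
host_bytes = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])

internal = import_record(host_bytes)   # character set no longer matters here
ascii_bytes = export_record(internal)  # bytes written to the target file

print(internal)           # HELLO
print(ascii_bytes.hex())  # 48454c4c4f
```

The point of the sketch is only that the character-set conversion happens at the edges, while everything in the middle works on one internal representation.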

If subsequent parallel jobs need to process the same data as the first parallel job in your example, which produces a sequential file, it is more efficient to use the Data Set stage between the multiple parallel jobs, because you would avoid the export and import overhead.
Last edited by qt_ky on Fri Dec 19, 2014 10:52 am, edited 1 time in total.
Choose a job you love, and you will never have to work a day in your life. - Confucius
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Thanks, Eric. Just to be sure I understand, please confirm or correct:

In my design/example, the output links from the FTP stage, and every link between there and the input links to the Sequential File stage, process the data using UTF-16.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Yes, that is correct. I learned about it in an IBM training class one time. I edited my post above to add that last paragraph too.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Again, thank you.

Our files are destined for use by another application. I'd very much like to use datasets, but they are not an option.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

You're welcome. It is documented here:

http://www-01.ibm.com/support/knowledge ... _Sets.html

Also, now that I have read a bit more, I wonder whether the UTF-16 statement applies only to ustring data types (Unicode extended property), whereas string data types are 8-bit ASCII.

Perhaps UTF-16 encompasses and accommodates 8-bit ASCII. We need someone more expert to clarify...

Anyhow, what I relayed above is what I was taught in training.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If NLS is enabled, parallel jobs use UTF-16 internally.

Unicode, and therefore UTF-16, shares code points 0 through 127 with ASCII. It also shares code points 0 through 255 with ISO 8859-1 (Latin-1), one common form of "extended", or 8-bit, ASCII; other 8-bit code pages differ above 127.
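
For anyone who wants to check the overlap for themselves, here is a small sketch in plain Python (nothing DataStage-specific). It assumes that "extended ASCII" means ISO 8859-1 (Latin-1); the helper name `utf16_code_unit` is made up for the example.

```python
def utf16_code_unit(ch: str) -> int:
    """Return the single 16-bit UTF-16 code unit for a character below U+0100."""
    return int.from_bytes(ch.encode("utf-16-be"), "big")


# Code points 0-127: the ASCII byte value equals the UTF-16 code unit.
ascii_matches = all(
    utf16_code_unit(chr(cp)) == chr(cp).encode("ascii")[0]
    for cp in range(128)
)

# Code points 0-255: the Latin-1 byte value equals the UTF-16 code unit.
latin1_matches = all(
    utf16_code_unit(chr(cp)) == chr(cp).encode("latin-1")[0]
    for cp in range(256)
)

print(ascii_matches, latin1_matches)  # True True
print("A".encode("utf-16-be"))        # b'\x00A' (two bytes per character)
```

So the values agree, but the storage does not: each of those characters occupies two bytes in UTF-16 rather than one.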

The story is different for server jobs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.