Recent inquiries about how DataStage handles characters, their hexadecimal code values, and mapping between character sets prompt me to ask a possibly dumb question... but as I always tell my children, there's no such thing as a dumb question, though there are plenty of dumb answers out there.
It goes like this in my basic job design.
First Stage (usually FTP): reads data from z/OS host; Format tab uses default COBOL attributes including EBCDIC.
Last Stage (usually Sequential File): writes data to local DS server running Unix (previously Solaris, now RHEL); Format tab includes ASCII.
Question: on the intervening links, which character set is DataStage using when it performs the coded instructions? If the data changes from EBCDIC to ASCII before the Last Stage, where does that happen?
Seriously, my ignorance of these details makes me feel inadequate when trying to answer questions about EBCDIC. Maybe it's just my natural paranoia...
EDIT: In case it's needed for thoughtful responses, the rest of the basic design always has a Transformer to map the source data to the layout required by the destination application. Sometimes other things like filters and joins are used, but not always. There's the occasional lookup for some jobs, but they are in the minority.
You say EBCDIC, I say ASCII, let's call the whole thing off
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
The parallel framework processes only data sets, no matter what the external format is.
The links between stages are virtual data sets.
The internal data format is UTF-16.
Import and export operators are used to perform the conversions.
If subsequent parallel jobs need to process the same data as the first parallel job in your example, which produces a sequential file, it is more efficient to use the Data Set stage between the multiple parallel jobs, because you would avoid the export and import overhead.
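To make the import/export idea concrete, here is a minimal Python sketch of the same pattern outside DataStage: bytes are decoded from the external character set into an internal Unicode form on the way in, and encoded to the target character set on the way out. It is only an analogy, not how the parallel engine is implemented. The code page `cp037` and the function names `import_record` and `export_record` are assumptions for illustration; your host may use a different EBCDIC code page.

```python
# Analogy only: decode on "import", work in Unicode, encode on "export".
# Assumption: the host uses EBCDIC code page 037; yours may differ.
EBCDIC_CODEC = "cp037"


def import_record(raw: bytes) -> str:
    """Hypothetical 'import' step: external EBCDIC bytes -> internal Unicode text."""
    return raw.decode(EBCDIC_CODEC)


def export_record(text: str) -> bytes:
    """Hypothetical 'export' step: internal Unicode text -> external ASCII bytes."""
    return text.encode("ascii")


# Simulate a character field as it would arrive from the host.
host_bytes = "HELLO 123".encode(EBCDIC_CODEC)

# Between import and export the text is character-set neutral; UTF-16 is
# just one way of laying those characters out in memory.
internal = import_record(host_bytes)
internal_utf16 = internal.encode("utf-16-le")

out_bytes = export_record(internal)

print(host_bytes.hex())      # EBCDIC byte values
print(internal_utf16.hex())  # 16-bit code units
print(out_bytes.hex())       # ASCII byte values
```

In this sketch the EBCDIC-to-ASCII change is never a single step: it is the combination of the decode at the first stage and the encode at the last stage, with everything in between working on the internal form.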
Last edited by qt_ky on Fri Dec 19, 2014 10:52 am, edited 1 time in total.
Choose a job you love, and you will never have to work a day in your life. - Confucius
Thanks, Eric. Just to be sure I understand, please confirm or correct:
In my design/example, the output links from the FTP stage, and every link between there and the input links to the Sequential File stage, process the data using UTF-16.
Franklin Evans
Again, thank you.
Our files are destined for use by another application. I'd very much like to use datasets, but they are not an option.
Franklin Evans
You're welcome. It is documented here:
http://www-01.ibm.com/support/knowledge ... _Sets.html
Also, now that I have read a bit more, I wonder whether the UTF-16 statement applies only to ustring data types (the Unicode extended property), whereas string data types are 8-bit ASCII.
Perhaps UTF-16 encompasses and accommodates 8-bit ASCII. Need someone more expert to clarify...
Anyhow, what I relayed above is what I was taught in training.
If NLS is enabled, parallel jobs use UTF-16 internally.
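Unicode, and therefore UTF-16, shares code points 0 through 127 with ASCII. It also shares code points 0 through 255 with ISO 8859-1 (Latin-1), one common "extended", or 8-bit, ASCII; other 8-bit code pages differ above 127. In UTF-16 each of these characters is stored as one 16-bit code unit.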
The story is different for server jobs.
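The code point overlap can be checked with a short Python sketch. It uses only the standard codecs; the helper name `utf16_units` is made up for illustration, and Windows-1252 is picked simply as one example of an 8-bit code page that does not line up with Unicode above 127.

```python
def utf16_units(ch: str) -> list[int]:
    """Return the 16-bit UTF-16 code units for a string (hypothetical helper)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Every 7-bit ASCII value maps to the same numeric value as a UTF-16 code unit.
ascii_match = all(utf16_units(chr(b)) == [b] for b in range(128))

# The same holds for all 256 byte values when the 8-bit set is ISO 8859-1.
latin1_match = all(
    utf16_units(bytes([b]).decode("latin-1")) == [b] for b in range(256)
)

# It does not hold for every "extended ASCII": byte 0x80 in Windows-1252
# is the euro sign, which is U+20AC in Unicode.
euro_units = utf16_units(b"\x80".decode("cp1252"))

print(ascii_match, latin1_match, [hex(u) for u in euro_units])
```

So plain ASCII text keeps the same numeric values in UTF-16, just widened to 16 bits, while bytes above 127 only carry over unchanged if the source code page happens to be Latin-1.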
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.