Problem reading hexadecimal characters in CFF

Post by **sg33** » Mon Aug 29, 2016 7:40 am

Hi -

We copied a mainframe file over to our Unix server (using binary mode). When we read the file in DataStage, some columns show up as special characters.
When we asked our source about it, they said these columns are defined as hex on their side.

For example, they see the field's value as X'30', whereas we see odd special characters that, when copied into UltraEdit, appear as [SUB], [STX] or [BS].
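A minimal Python sketch of the symptom, assuming Python's `cp037` codec stands in for the mainframe's EBCDIC code page (the real code page may differ). Decoding the byte X'30' as EBCDIC text yields a non-printing control character rather than anything visible, while reading the same byte as a raw value keeps the X'30' the source reports. The function names are hypothetical.

```python
# Sketch: one byte seen two ways. cp037 is an assumed stand-in for the
# source's EBCDIC code page.

def as_text(raw: bytes) -> str:
    """Decode raw EBCDIC bytes as characters, as a PIC X column would."""
    return raw.decode("cp037")

def as_hex(raw: bytes) -> str:
    """Show the raw byte values the way the source reports them."""
    return "X'" + raw.hex().upper() + "'"

field = bytes([0x30])             # the value the source reports
print(as_hex(field))              # X'30'
print(repr(as_text(field)))       # a non-printing control character
print(as_text(bytes([0xF0])))     # 0  (EBCDIC digit zero is X'F0')
```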

We're not sure whether there is a property we can set when reading the file to fix this, or whether something went wrong when copying the file over to Unix.

Other information:
1) We are reading the file via Complex flat file stage.
2) The properties set are:
- Big-Endian as the byte order
- EBCDIC as the character set
- Binary as the data format
- "Allow all zeroes" set
3) All other columns, except the few that the source considers hex, are read fine.
4) This field is read as PIC X(1), per the copybook the source sent us.
5) We tried changing the field definition to PIC S9(1) COMP-3, but it gave an import error.

Please advise how we can fix this, and thanks in advance!

Post by **FranklinE** » Mon Aug 29, 2016 8:56 am

See my reply to nimcurry on a similar request: viewtopic.php?t=156559

Please verify the copy. Be sure that the EBCDIC encoding is being preserved. As I suggested to nimcurry, try using PIC 9(1) COMP (unsigned binary integer) instead of COMP-3.

Good luck.

Post by **sg33** » Mon Aug 29, 2016 9:23 am

Thanks for your response Franklin.

I tried reading this attribute as Binary, but it threw an error on the very next attribute in the copybook. That next attribute was read fine, with correct decimal values, under the original copybook, where the problematic column was defined as PIC X(1).

That next column is defined as Decimal(11,2).
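One possible explanation for the error moving to the next attribute, assuming the stage sizes binary fields the way IBM COBOL does: a COMP field with 1 to 4 digits occupies a 2-byte halfword, so redefining a 1-byte PIC X(1) as PIC 9(1) COMP makes every following field start one byte late. The `comp_length` helper is a hypothetical sketch of that storage rule, not DataStage code.

```python
# Hypothetical helper: bytes occupied by a COBOL COMP (binary) field,
# assuming the usual IBM rule: 1-4 digits -> 2 bytes, 5-9 -> 4, 10-18 -> 8.

def comp_length(digits: int) -> int:
    if not 1 <= digits <= 18:
        raise ValueError("binary fields hold 1 to 18 digits")
    if digits <= 4:
        return 2
    if digits <= 9:
        return 4
    return 8

# PIC X(1) is 1 byte, but PIC 9(1) COMP is 2 bytes, so every later field
# is read starting one byte further along than it really sits in the file.
pic_x_len = 1
shift = comp_length(1) - pic_x_len
print(f"following fields are read {shift} byte(s) late")
```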

Post by **FranklinE** » Mon Aug 29, 2016 9:38 am

You're welcome.

It looks like you need to analyze the file together with the developer of the program that creates it. A full analysis should include comparing the source file with the copied destination file, to make sure the copy process doesn't insert additional control characters (delimiter, end of record, etc.).
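A sketch of that comparison in Python. The reference copy is an assumption: it could be a second binary-mode transfer or bytes captured on the mainframe, and the file names you pass in are up to you. If the transfer inserted record delimiters, the first mismatch points at where they begin.

```python
from pathlib import Path
from typing import Optional

def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if the data is identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))  # one file is a prefix of the other
    return None

def compare_files(src: str, dst: str) -> str:
    """Summarize how two files on disk differ (paths are hypothetical)."""
    a = Path(src).read_bytes()
    b = Path(dst).read_bytes()
    off = first_difference(a, b)
    if off is None:
        return "identical"
    return f"sizes {len(a)} vs {len(b)}, first difference at offset {off}"
```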

It's darn odd to see a field border violation like that. That's why I suggest looking for control characters. For example, are there any VarChar fields involved in the record? The field-length prefix is sometimes a problem.
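To illustrate the length-prefix point, here is a sketch assuming a common COBOL layout for a variable-length field: a 2-byte big-endian binary length (PIC S9(4) COMP) followed by that many bytes of EBCDIC text (cp037 assumed). The layout and the `read_varchar` name are assumptions; the actual copybook decides. If a copy step has inserted or dropped bytes, the prefix is read from the wrong place and gives a nonsense length, which the check reports.

```python
import struct
from typing import Tuple

def read_varchar(record: bytes, offset: int) -> Tuple[str, int]:
    """Return the decoded text and the offset just past the field."""
    # 2-byte big-endian signed length prefix (assumed layout)
    (length,) = struct.unpack_from(">h", record, offset)
    start = offset + 2
    end = start + length
    if length < 0 or end > len(record):
        raise ValueError(f"implausible length prefix {length} at offset {offset}")
    return record[start:end].decode("cp037"), end
```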

Anyway, reviewing the entire record layout is good practice. Sometimes trying to fix one problem point can uncover other problem points.

Shorter answer: welcome to the chaotic world of EBCDIC to ASCII. Bring survival gear.

Post by **sg33** » Mon Aug 29, 2016 1:23 pm

I tried reading the file with the Sequential File stage, defining the same properties as in the CFF stage: EBCDIC as the character set and binary as the data format.

When I view the data in Designer, the special characters appear as {0c}, {d1}, etc.

But when I run the job, it aborts with a short read error, even though the job ran fine with the same metadata in the CFF stage. I am changing some properties of the Sequential File stage to see whether it can read the whole file.
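A short read can mean the file length is not a whole number of records at the record length the stage expects. A sketch of that arithmetic, where the record length is a hypothetical value you would take from the dataset's LRECL or from the sum of the copybook field lengths:

```python
def check_fixed_length(file_size: int, record_length: int) -> str:
    """Report how a file of fixed-length records divides up."""
    if record_length <= 0:
        raise ValueError("record length must be positive")
    full, leftover = divmod(file_size, record_length)
    if leftover == 0:
        return f"{full} complete records"
    # leftover bytes mean the final read would come up short
    return f"{full} complete records plus {leftover} leftover bytes"
```

For example, `check_fixed_length(os.path.getsize(path), lrecl)` with your own `path` and `lrecl` values.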

Post by **FranklinE** » Mon Aug 29, 2016 1:38 pm

Try viewing the file with something other than DataStage. I use SlickEdit, which can display every special character present.

Ideally, view the data on the source platform. In z/OS we use the ISPF editor, and it's easy to toggle to and from a hexadecimal display of every byte (HEX ON / HEX OFF).
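If the data can't be viewed on the mainframe, a rough substitute is a vertical hex display in the style of ISPF's hex view: the character line, then the high half-byte (zone) and low half-byte of each byte stacked underneath. This sketch assumes cp037 for the character line and shows non-printable characters as '.'; it is not a reproduction of the ISPF display.

```python
from typing import Tuple

def vertical_hex(raw: bytes) -> Tuple[str, str, str]:
    """Character line plus high and low half-byte lines, one column per byte."""
    text = "".join(c if c.isprintable() else "." for c in raw.decode("cp037"))
    zones = "".join(f"{b >> 4:X}" for b in raw)
    digits = "".join(f"{b & 0x0F:X}" for b in raw)
    return text, zones, digits

for line in vertical_hex(b"\xC1\xF1\x30\xD1"):
    print(line)
# A1.J
# CF3D
# 1101
```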

Other than in a true binary field, x0C appears only in the right-most byte of a packed decimal (COMP-3) field, where it means a final digit of 0 with an explicitly positive sign. The packed decimal final half-byte is always reserved for the sign: C for positive, D for negative and F for unsigned. So with a final digit of 0, the right-most byte is x0C, x0D or x0F respectively.
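A sketch of packed decimal decoding that follows the rule above. It accepts only the preferred sign codes C, D and F (hardware also tolerates A through F, which this sketch deliberately rejects) and ignores the implied decimal scale. Note that X'30' read as PIC S9(1) COMP-3 has a sign half-byte of 0, which this check rejects; that may be why the COMP-3 attempt gave an import error.

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal field to an integer (scale not applied)."""
    if not raw:
        raise ValueError("empty field")
    nibbles = []
    for b in raw:
        nibbles.extend((b >> 4, b & 0x0F))
    *digits, sign = nibbles          # last half-byte is the sign
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid digit in {raw.hex().upper()}")
    if sign not in (0xC, 0xD, 0xF):
        raise ValueError(f"invalid sign {sign:X} in {raw.hex().upper()}")
    value = int("".join(str(d) for d in digits))
    return -value if sign == 0xD else value
```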

For a signed display numeric with a trailing sign, the final byte has C, D or F in its first half-byte. So xD1 as the right-most byte of a display numeric is a negative value whose final digit is 1.

EDIT: Neglected to mention that the EBCDIC hex values for the numeric digits are F0 through F9. A signed numeric replaces the F in its final byte with C or D.
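A sketch of decoding such a display (zoned decimal) field with a trailing embedded sign, under the rules just described: every byte except the last must be F0 to F9, and the last byte's zone is C, D or F. The function name is hypothetical.

```python
def unpack_zoned(raw: bytes) -> int:
    """Decode an EBCDIC zoned-decimal field with a trailing embedded sign."""
    if not raw:
        raise ValueError("empty field")
    digits = []
    for i, b in enumerate(raw):
        zone, digit = b >> 4, b & 0x0F
        last = i == len(raw) - 1
        allowed = (0xC, 0xD, 0xF) if last else (0xF,)
        if zone not in allowed or digit > 9:
            raise ValueError(f"byte {b:02X} at position {i} is not zoned decimal")
        digits.append(str(digit))
    value = int("".join(digits))
    # zone D in the final byte marks a negative value
    return -value if raw[-1] >> 4 == 0xD else value
```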

Post by **sg33** » Tue Sep 06, 2016 10:47 am

We were finally able to fix this. Here are the steps we took:
1) Read the columns containing hex values with the Extended property set to "Unicode".
2) Used the Seq and SeqAt functions to convert the values from EBCDIC to ASCII.
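The idea behind those steps is to take the numeric value of each byte instead of letting a code-page translation turn it into a character. This is a Python analog in that spirit, not DataStage code and not a reproduction of how Seq or SeqAt behave; the function names are hypothetical.

```python
from typing import List

def byte_codes(raw: bytes) -> List[int]:
    """Numeric value of each byte, position by position."""
    return list(raw)

def to_hex_string(raw: bytes) -> str:
    """Render the bytes as the hex digits the mainframe side reports."""
    return "".join(f"{b:02X}" for b in raw)
```

With the field from the original question, `to_hex_string(b"\x30")` gives `"30"`, matching the X'30' the source sees.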

The file was downloaded as binary from the mainframe server.

Cheers!