Having issues reading XML files with line breaks

Posted: Tue Jul 26, 2016 9:58 pm
by cosec
Hi All,

I am trying to read the contents of an XML file into a text file, but I encounter an error when there is a line break in the XML file.

Job Design
XML Source File -> XML Input Stage -> Transformer -> Text File

The job works fine when there is no line break within the XML fields.

However, when one of the fields contains a line break, I get the following error (I have indicated that the column can contain terminators):
XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 226): Invalid character (Unicode: 0x0)

XML Source File structure example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><TBL_A><FIELD1>513</FIELD1><FIELD2>AAAA
01/BBBB</FIELD2></TBL_A>

Any suggestions on how I could avoid the error without removing the line breaks?

I would greatly appreciate your advice.

Posted: Wed Jul 27, 2016 6:43 am
by chulett
Can you confirm for us what the first stage in your job is? Is it a Sequential File stage? If so, I suggest you replace it with a Folder stage (ideally just passing in the filename and letting the XML Input stage do the reading) and see if the problem persists.
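
To illustrate why the reading stage matters, here is a minimal Python sketch, not DataStage code. It assumes that a line-oriented reader hands the parser only the text up to the first line break, which is a guess at what is happening in the job rather than something confirmed in this thread. The function names `parse_whole` and `parse_first_record` are hypothetical; the sample document is the one from the original post.

```python
import xml.etree.ElementTree as ET

# Sample document from the original post; FIELD2 contains a line break.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<TBL_A><FIELD1>513</FIELD1><FIELD2>AAAA\n01/BBBB</FIELD2></TBL_A>"
)


def parse_whole(document: str) -> dict:
    """Parse the complete document, as a whole-file reader would supply it."""
    root = ET.fromstring(document.encode("utf-8"))
    return {child.tag: child.text for child in root}


def parse_first_record(document: str) -> dict:
    """Parse only the first line, as an assumed line-oriented reader would."""
    first_line = document.split("\n", 1)[0]
    return parse_whole(first_line)


if __name__ == "__main__":
    # The whole document is well-formed, line break included.
    print(parse_whole(SAMPLE))
    try:
        parse_first_record(SAMPLE)
    except ET.ParseError as err:
        # The truncated first line is not a complete XML document.
        print("first line alone fails to parse:", err)
```

Read as a whole, the document parses and FIELD2 keeps its line break; cut at the line break, it is no longer well-formed XML.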

Posted: Wed Jul 27, 2016 7:08 am
by eostic
You can process the entire content, or just the name of the file. The Folder Stage has a "built-in" Table Definition; notice how it contains the filename and the "record", which is the actual content of the whole file.

Then, in the xmlInput Stage, you indicate whether your column contains the content itself or just a URL.

Either way, the CRLFs are ignored, by design, as xml requires.

Ernie
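
As a rough illustration of how a general-purpose XML parser treats line endings, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It is not the xmlInput Stage, and whether that stage keeps or drops the line break in its output is not shown here. The helper name `field_text` is hypothetical; the element names come from the sample in the original post.

```python
import xml.etree.ElementTree as ET


def field_text(xml_bytes: bytes, tag: str) -> str:
    """Return the text of the first child element named `tag`."""
    root = ET.fromstring(xml_bytes)
    return root.findtext(tag)


# Same content, once with Windows (CRLF) and once with Unix (LF) line endings.
crlf_doc = b"<TBL_A><FIELD2>AAAA\r\n01/BBBB</FIELD2></TBL_A>"
lf_doc = b"<TBL_A><FIELD2>AAAA\n01/BBBB</FIELD2></TBL_A>"

if __name__ == "__main__":
    # repr() makes the line-ending characters visible.
    print(repr(field_text(crlf_doc, "FIELD2")))
    print(repr(field_text(lf_doc, "FIELD2")))
```

In this parser both inputs produce the same value, with the CRLF normalized to a single line feed, so the kind of line ending in the source file does not affect parsing.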

Posted: Wed Jul 27, 2016 10:20 am
by chulett
True dat. No clue what the limit is any more but back in the day when we were processing "large" XML files, it seemed better to just pass the URL to the XML Input stage and let it do all of the work. From what I recall. :wink:

Posted: Wed Jul 27, 2016 10:57 am
by eostic
Yes, true, and it is largely a factor of the downstream Stage. If you are using a Server Job, the Folder Stage can probably "lift" a bigger XML document than the xmlInput Stage can handle. The xmlInput Stage is usually good up to 200 megabytes or so. The Hierarchical Stage can read dramatically larger documents, but that comes at a price: it requires an XSD and has a steeper learning curve. For small documents that are largely transactional, and when you are just reading them, it's almost always better to just use xmlInput.

Ernie

Posted: Wed Jul 27, 2016 12:05 pm
by chulett
That same number is in the back of my mind... 200 to 250MB was our limit on our HPUX system, I do believe. Thankfully, our sources generally were of the mind to flood us with a metric crap-ton of small files. 8)