XML Parser Problem

BradMiller
Premium Member
Posts: 87
Joined: Mon Feb 18, 2008 3:58 pm
Location: Sacramento, CA

Post by BradMiller »

I have a job that extracts data from an XML file/document which contains 58, imports the XML schema from the XML document, and writes to a sequential file. I am importing the metadata from the XML file itself rather than from an XSD. I validated the XML file with the w3schools XML validator, and it reports that the XML is good. But when I run the job I get the following warning messages, and no records are loaded to the target file. All my other XML jobs are working properly. The warning messages are:

"XML_Input_0,0: Warning: secondxml.XML_Input_0: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 2, column: 474): An exception occurred! Type:UnexpectedEOFException, Message:The end of input was not expected"

and

"XML_Input_0,0: Warning: secondxml.XML_Input_0: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 1): Invalid document structure".
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

It sounds like an error in the method you are using to read the document from disk. Are you using the Sequential File stage?

If so, change your job so that it uses the External Source stage and sends a "list" of XML documents to your XML stage.

Go to my blog URL below and click on the table of contents in the upper right. Find the XML section and follow the link concerning "xml sources" for a more detailed explanation.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! https://dsrealtime.wordpress.com/2015/0 ... ere/
BradMiller
Premium Member
Posts: 87
Joined: Mon Feb 18, 2008 3:58 pm
Location: Sacramento, CA

Post by BradMiller »

Yes, we are using the Sequential File stage. I'll look up your blog and read the XML content you posted, but I have one question: why do we need to use the External Source stage instead of the Sequential File stage? I could not understand the problem. I appreciate your response and help.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Because technically an XML file isn't a flat file with rows and columns, it's a stream of data that could very well just be one long "record". Sometimes it can be read like one with success but more often reading it in that fashion just plain ol' horks it up. Best Practice is to avoid the Sequential File stage altogether and use an External Source stage to feed just the filenames to the XML Input stage and let it (the XML Input stage) do the actual reading of the files.

In a Server job, the Folder stage would take the place of the External Source stage.
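
As a rough illustration outside DataStage (plain Python, not anything taken from the product), the "feed just the filenames and let the parser read the file" pattern might look like the sketch below. The function names `parse_documents` and `demo`, and the sample document, are hypothetical and exist only for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_documents(paths):
    """Parse each file by handing the parser a path, not pre-read rows."""
    results = []
    for path in paths:
        # The XML parser opens and reads the file itself, so line breaks
        # inside the document are never treated as record delimiters.
        root = ET.parse(path).getroot()
        results.append((root.tag, len(root)))  # root tag, number of children
    return results


def demo():
    # Hypothetical sample document with CRLFs in the middle.
    xml_text = "<orders>\r\n  <order id='1'/>\r\n  <order id='2'/>\r\n</orders>"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.xml")
        # newline="" keeps the CRLFs exactly as written above.
        with open(path, "w", newline="") as fh:
            fh.write(xml_text)
        return parse_documents([path])


if __name__ == "__main__":
    print(demo())
```

The upstream step only supplies a list of paths; all reading and parsing happens in one place, which is the division of labor described above.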
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

...to add more detail to Craig's response, the problem is usually that a random space or stray CRLF between elements is just "noise" to an XML parser (such whitespace is generally treated as insignificant), but a stray set of blanks, a CRLF, or some other odd character can ENTIRELY change the behavior of the Sequential File stage.

A job could work fine for thousands of documents and then blow up one day because of a CRLF in the middle. That's no fault of the Sequential File stage; it is designed to look for such things. I know there are settings in the stage that you can tweak, but why bother? Just have the XML stage do the actual I/O and parsing; it's designed for that.
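
To make that failure mode concrete, here is a small Python sketch. It is an analogy only and does not reproduce the Sequential File stage's actual behavior: it treats every line of a document as one "record" and tries to parse each record as XML, then parses the same document in one piece. The document and the helper names `parse_per_line` and `parse_whole` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: well-formed, but with CRLFs in the middle.
DOCUMENT = (
    "<?xml version='1.0'?>\r\n"
    "<orders><order id='1'/>\r\n"
    "<order id='2'/></orders>"
)


def parse_per_line(text):
    """Mimic a row-oriented reader: every line becomes its own 'document'."""
    outcomes = []
    for record in text.splitlines():
        try:
            ET.fromstring(record)
            outcomes.append("ok")
        except ET.ParseError:
            # Each fragment is incomplete XML on its own.
            outcomes.append("failed")
    return outcomes


def parse_whole(text):
    """Give the parser the complete document in one piece."""
    root = ET.fromstring(text)
    return [child.get("id") for child in root]


if __name__ == "__main__":
    print(parse_per_line(DOCUMENT))
    print(parse_whole(DOCUMENT))
```

Every per-line fragment fails to parse because none of them is a complete document, while the whole-document parse finds both `order` elements. That is in the same spirit as the "end of input was not expected" and "Invalid document structure" warnings in the original post.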

Ernie
Ernie Ostic

BradMiller
Premium Member
Posts: 87
Joined: Mon Feb 18, 2008 3:58 pm
Location: Sacramento, CA

Post by BradMiller »

Thank you, it's very clear.