XML Input stage

Post questions here related to DataStage Server Edition in areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

andyrids
Participant
Posts: 1
Joined: Thu Apr 13, 2006 8:36 am
Location: London

XML Input stage

Post by andyrids »

I'm having a problem getting the XML Input stage to work. My input source is a sequential file (the actual XML doc), read one line at a time as a variable-length string, i.e. with the delimiter set to "000". Each line is sent to the XML Input stage with 'Column content' set to "XML document". The output columns for the XML Input stage are loaded from a table definition that I created with the XML Meta Data Importer from my XML doc's DTD.

The problem is with the input: I don't know how to get the XML Input stage to understand the lines of input sent from the sequential file source, so the XML parsing fails.

All of the warning messages refer to line 1, e.g.:

"Equity_Index..XML_Input_22: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 23): Invalid document structure"

"Equity_Index..XML_Input_22: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 98): Invalid document structure"

etc.
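A minimal Python sketch of why every message points at line 1, assuming the Sequential File stage passes one physical line per row and XML Input treats each row as a complete document. Python's standard-library parser stands in for the Xalan-based parser named in the messages, so wording and column numbers will differ; the sample document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (element names invented for illustration).
SAMPLE = """<?xml version="1.0"?>
<indices>
  <index>
    <name>INDEX_A</name>
  </index>
  <index>
    <name>INDEX_B</name>
  </index>
</indices>"""


def parse_each_line(text):
    """Parse every physical line as if it were a complete XML document.

    Mimics one line per row being handed to a stage whose column content
    is "XML document". Returns (line_text, error_position) pairs, where
    error_position is None if that single line happens to be well-formed.
    """
    results = []
    for line in text.splitlines():
        try:
            ET.fromstring(line)
            results.append((line, None))
        except ET.ParseError as err:
            # err.position is (line, column) inside the one-line "document",
            # so the reported line number is always 1.
            results.append((line, err.position))
    return results


def parse_whole(text):
    """Parse the complete file content as a single document."""
    return ET.fromstring(text)


if __name__ == "__main__":
    for line, position in parse_each_line(SAMPLE):
        print(position, line.strip())
    root = parse_whole(SAMPLE)
    print(root.tag, len(root))
```

The whole text parses cleanly, while most individual lines fail and each reports line 1 of its own one-line "document". A line such as `<name>INDEX_A</name>` may even parse on its own, which would silently yield fragments rather than an error.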

Can anyone help me with this?

Thanks
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: XML Input stage

Post by ogmios »

I'm not the expert on XML Input, but if I remember correctly you don't read an XML input file with a Sequential File stage :D. You use a Folder stage to point to the XML file and connect that to XML Input.

"There is one crucial design requirement of XML Input: you need to pass it an input link containing a URL, a file path, or an XML document."

If you installed a 7.5 client, you should have the documentation for the XML stages on your PC.
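The Folder stage approach can be sketched in Python too, assuming the Folder stage delivers each file's entire content as one value per row (the quoted requirement also allows passing just a path). The function names are hypothetical stand-ins, not DataStage APIs, and Python's parser substitutes for the stage's own.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET


def folder_rows(directory, pattern="*.xml"):
    """Yield one (file_name, whole_file_content) row per matching file.

    Rough stand-in for a Folder stage: the entire file arrives as a single
    value instead of one row per line.
    """
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, encoding="utf-8") as handle:
            yield os.path.basename(path), handle.read()


def parse_rows(rows):
    """Parse each content value as one complete XML document."""
    return [(name, ET.fromstring(content)) for name, content in rows]


def parse_path(path):
    """Alternative mode: pass only the file path and let the parser open it."""
    return ET.parse(path).getroot()


def demo():
    """Write a hypothetical sample file to a temporary folder, then parse it both ways."""
    sample = ('<?xml version="1.0"?>\n'
              '<indices>\n'
              '  <index><name>INDEX_A</name></index>\n'
              '</indices>\n')
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.xml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(sample)
        # Materialize the results before the temporary folder is deleted.
        rows = [(name, root.tag, len(root)) for name, root in parse_rows(folder_rows(folder))]
        by_path = parse_path(path).tag
    return rows, by_path


if __name__ == "__main__":
    print(demo())
```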

Ogmios
In theory there's no difference between theory and practice. In practice there is.
diamondabhi
Premium Member
Posts: 108
Joined: Sat Feb 05, 2005 6:52 pm
Location: US

Post by diamondabhi »

Andyrids,
Ogmios is right: you should use a Folder stage as the input to the XML Input stage. Also check the stylesheet settings.

Thanks,
Abhi.
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

If you have a lot of XML data to parse and are breaking it down into several streams, you might want to consider going outside of DataStage to process the XML into one or more flat files and dealing with them that way.

I changed a job stream that ran for over 4 hours, processing 45 streams out of 3 largish XML files (a few hundred MB), into a 20-minute stream by using an external XSLT parser. The flat files were then processed.
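The XSLT used isn't posted. As a different, swapped-in technique with the same goal, Python's standard-library streaming parser (`xml.etree.ElementTree.iterparse`) can split a large XML file into one flat file per record type without first building the whole document in memory. This is a minimal sketch; the record tags, column names, and sample data are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def split_to_flat_files(source, layouts, outputs):
    """Stream an XML file and write one CSV row per record element.

    source:  path or binary file object containing the XML.
    layouts: {record_tag: [child_tag, ...]} (hypothetical record layouts);
             each listed child's text becomes one CSV column.
    outputs: {record_tag: text file object} receiving that record type's rows.
    Returns the number of rows written per record tag.
    """
    writers = {tag: csv.writer(outputs[tag], lineterminator="\n") for tag in layouts}
    for tag, columns in layouts.items():
        writers[tag].writerow(columns)  # header row per flat file
    counts = dict.fromkeys(layouts, 0)
    # iterparse fires an "end" event as each element closes, so records are
    # handled one at a time instead of after the whole tree is built.
    for _event, elem in ET.iterparse(source, events=("end",)):
        columns = layouts.get(elem.tag)
        if columns is None:
            continue
        writers[elem.tag].writerow([elem.findtext(col, default="") for col in columns])
        counts[elem.tag] += 1
        # Release the record's children and text; the emptied element stays
        # attached to its parent, so very large files may also need the
        # parent cleared periodically.
        elem.clear()
    return counts


if __name__ == "__main__":
    # Hypothetical sample: two record types interleaved in one file.
    sample = (b"<feed><trade><id>1</id><px>10.5</px></trade>"
              b"<quote><id>2</id><bid>9</bid></quote>"
              b"<trade><id>3</id><px>11</px></trade></feed>")
    trades, quotes = io.StringIO(), io.StringIO()
    print(split_to_flat_files(io.BytesIO(sample),
                              {"trade": ["id", "px"], "quote": ["id", "bid"]},
                              {"trade": trades, "quote": quotes}))
    print(trades.getvalue(), quotes.getvalue(), sep="")
```

Each output file object could just as well be a real file opened with `newline=""`, giving one flat file per stream for the downstream jobs.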

My understanding is that the XML add-on for DS is really intended for trickle-fed real-time data (from MQSeries, etc.), not for humongous bulk files.

Consider looking beyond the sandpit for solutions; sometimes you'll be surprised :)
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hey Andrew, care to pass along the name / site for that external XSLT parser you mentioned? :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by aartlett »

I've used two parsers: Saxon and Xalan/Xerces. Xalan/Xerces is on apache.org; Saxon you'll have to search for.

I think Xalan and Xerces originally came from IBM and were open-sourced under Apache control a few years ago.

There are others available, but I like to use open-source software.

You'll need to read up on XSLT to create the scripts. I might be able to find a copy of the XSLT I used to create CSV files, but I don't have it on hand at the moment.
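The stylesheet itself isn't posted, so here is only a rough Python analogue of what an XSLT-to-CSV script does (not Andrew's script): `record_path` plays the part of an `xsl:for-each` select, and each column entry the part of an `xsl:value-of` select. ElementTree supports just a subset of XPath, and the column layout and sample data are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column spec: (CSV header, path relative to each record, attribute or None).
# Each entry plays the role of an xsl:value-of select in an XSLT-to-CSV stylesheet.
COLUMNS = [
    ("index", ".", "name"),         # attribute on the record element itself
    ("date", "close/date", None),   # text of a nested child element
    ("level", "close/level", None),
]


def xml_to_csv(xml_text, record_path, columns):
    """Return CSV text: a header row plus one row per element matched by record_path."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    # The csv module quotes values containing commas or quotes; a hand-written
    # XSLT text-output template would have to do that escaping itself.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header for header, _path, _attr in columns])
    for record in root.iterfind(record_path):  # roughly xsl:for-each
        row = []
        for _header, path, attribute in columns:
            node = record.find(path)
            if node is None:
                row.append("")  # a missing element becomes an empty field
            elif attribute:
                row.append(node.get(attribute, ""))
            else:
                row.append((node.text or "").strip())
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = ('<indices>'
              '<index name="INDEX A"><close><date>2006-01-01</date><level>100.0</level></close></index>'
              '<index name="Index B, Total Return"><close><date>2006-01-01</date><level>200.5</level></close></index>'
              '</indices>')
    print(xml_to_csv(sample, "index", COLUMNS), end="")
```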
Andrew

Post by chulett »

Thanks, I'll have to check them out. Interesting; my understanding is that the XML Output stage is based on Xerces, and perhaps the other XML stages are as well.
-craig

Post by aartlett »

I think it is a Xerces/Saxon implementation.

It just doesn't seem to handle the throughput as well as a more "native" one can. Sometimes we must look beyond DS for our solutions.
Andrew

Post by chulett »

Oh, agreed: it can be slow as mud at times when generating 'large' amounts of XML. That's why I was curious which external parsers you've used.
-craig