URGENT : XML File Size Limit

DEB_CHOW
Participant
Posts: 2
Joined: Tue Nov 23, 2010 2:38 pm

Post by DEB_CHOW »

Hi,

We have a 500 MB XML input file to process. Can we process it using DS 8.0?

What is the XML file size limit for DS 8.0?

Can we resolve this by upgrading to DS 8.1?

What is the XML file size limit for DS 8.1?

Appreciate your prompt response.

Thanks,
Deb
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi Deb. The methodology in 8.5 changes how DataStage reads XML, and any size issues are effectively eliminated. But in 8.1 and lower, the document needs to be broken up externally first. I've seen some tooling that can do it, namely xmlMax, but I don't think it's available on Windows. I've also seen sites use their own Java code to break up documents outside of DS before the job gets them.
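
For anyone going the home-grown route, here is a minimal sketch of what such a splitter could look like, using the StAX API that ships with the JDK. It is not taken from any particular site's code. The class name `XmlSplitter`, the `chunk-N.xml` file naming and the record element name are all made up for illustration, and it assumes the repeating records are direct children of the root element; anything else directly under the root (comments, whitespace, other elements) is dropped.

```java
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one large XML file into smaller well-formed files.
final class XmlSplitter {
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    // Copies every <recordName> child of the root into chunk files holding at most
    // recordsPerChunk records each; each chunk repeats the original root element.
    static List<Path> split(Path input, String recordName, int recordsPerChunk, Path outDir)
            throws IOException, XMLStreamException {
        if (recordsPerChunk <= 0) {
            throw new IllegalArgumentException("recordsPerChunk must be positive");
        }
        List<Path> chunks = new ArrayList<>();
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        OutputStream out = null;
        try (InputStream in = Files.newInputStream(input)) {
            // The event reader pulls the document piece by piece instead of loading it whole.
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            StartElement root = null;
            XMLEventWriter writer = null;
            int level = 0;              // element nesting depth: root = 1, records = 2
            boolean copying = false;    // true while inside a record
            int recordsInChunk = 0;
            while (reader.hasNext()) {
                XMLEvent ev = reader.nextEvent();
                if (ev.isStartElement()) {
                    level++;
                    StartElement se = ev.asStartElement();
                    if (level == 1) {
                        root = se;      // remembered so every chunk can reuse it
                        continue;
                    }
                    if (level == 2 && se.getName().getLocalPart().equals(recordName)) {
                        if (writer == null) {   // start a new chunk file
                            Path chunk = outDir.resolve("chunk-" + (chunks.size() + 1) + ".xml");
                            out = Files.newOutputStream(chunk);
                            writer = outFactory.createXMLEventWriter(out, "UTF-8");
                            writer.add(EVENTS.createStartDocument("UTF-8", "1.0"));
                            writer.add(root);
                            chunks.add(chunk);
                        }
                        copying = true;
                    }
                }
                if (copying) {
                    writer.add(ev);
                }
                if (ev.isEndElement()) {
                    if (copying && level == 2) {    // a record just ended
                        copying = false;
                        if (++recordsInChunk == recordsPerChunk) {
                            finish(writer, out, root.getName());
                            writer = null;
                            recordsInChunk = 0;
                        }
                    }
                    level--;
                }
            }
            if (writer != null) {       // last, partially filled chunk
                finish(writer, out, root.getName());
            }
        } finally {
            if (out != null) {
                out.close();            // harmless if already closed
            }
        }
        return chunks;
    }

    // Closes the root element and the document, then releases the file.
    private static void finish(XMLEventWriter writer, OutputStream out, QName rootName)
            throws IOException, XMLStreamException {
        writer.add(EVENTS.createEndElement(
                rootName.getPrefix(), rootName.getNamespaceURI(), rootName.getLocalPart()));
        writer.add(EVENTS.createEndDocument());
        writer.close();
        out.close();
    }

    // Usage: java XmlSplitter <input.xml> <recordElement> <recordsPerChunk> <outDir>
    public static void main(String[] args) throws Exception {
        split(Paths.get(args[0]), args[1], Integer.parseInt(args[2]), Paths.get(args[3]))
                .forEach(System.out::println);
    }
}
```

Because only one record's worth of events is in flight at a time, memory use should not grow with the size of the input file. Pick the records-per-chunk value so that each output file stays comfortably small for the pre-8.5 job that will read it.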

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
DEB_CHOW
Participant
Posts: 2
Joined: Tue Nov 23, 2010 2:38 pm

Post by DEB_CHOW »

Thanks for the reply Ernie.

I am pretty sure DataStage 8.0 / 8.1 is able to connect to XML sources (without an external interface), but there is some limitation on the file size it can process. I am trying to find out what the file size limits are.

How is the methodology in 8.5 different from 8.0 / 8.1?


Thanks,
Deb
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In 8.0/8.1 it uses the Xalan parser, which requires the entire XML document to be loaded into memory, so you're limited by the amount of available memory. In 8.5 it uses a really clever streaming approach (I'm not certain which parser, probably a purpose-built one). And the new XML Transformer stage is truly a work of art!
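
To illustrate the general idea of streaming (this says nothing about how 8.5 implements it internally, which is not known here), a rough Java sketch with the JDK's StAX pull parser follows. The class name `StreamingCount` and the sample document are invented for the example. The reader only ever holds the current event, so memory use does not depend on document size the way a load-everything parser's does.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: counts elements without building a document tree.
final class StreamingCount {

    // Walks the stream one event at a time and counts start tags with the given local name.
    static long countElements(InputStream in, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        long count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
        } finally {
            reader.close(); // releases parser resources; does not close the input stream
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<orders><order/><order/><order/></orders>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(new ByteArrayInputStream(xml), "order")); // prints 3
    }
}
```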
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Figure that in pre-8.5 releases you'll start having issues around 200 MB. Your mileage may vary, because it depends on a whole lot of things, such as the number and size of the element names and the length of the values within the document. But that seems to be about average, even though I've seen bigger documents work successfully and smaller ones fail.

Ernie
Ernie Ostic
