Posted: Wed Jun 01, 2011 5:27 am
by ray.wurlod
Get version 8.5

Posted: Wed Jun 01, 2011 6:15 am
by eostic
Indeed, the new XML Stage is far faster. How much faster will depend on a lot of things, like the number of tags, their size, etc.

As for your existing Job, consider breaking it up via EE as well as via functions in the Stage itself. I've had success (again, it depends on the document) improving Jobs in both these areas.

If your sequential file is reading only one single document, there is less that you can do, but often in these situations it is reading hundreds or many thousands of records, with each row containing another xml document. If that's the case, spread the rows across several partitions and allow EE to start up multiple xmlInput Stages, thus spreading the load. Even if each xmlInput Stage process runs at only 7 rows/sec, your overall Job will be more scalable, probably on an almost linear basis. Of course, you may then need to run sequentially downstream, which can create other issues, so do some testing with EE to see if there is a benefit. The xmlInput Stage certainly supports it.
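To make the partitioning idea concrete outside of DataStage, here is a rough Python sketch. It is only an analogy, not anything the Stage or EE actually runs: round_robin, parse_partition and parse_all are made-up names, the <order> documents are invented sample data, and each worker stands in for one xmlInput Stage instance. Threads are used to keep it short, so it shows the data flow rather than a real speedup.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def round_robin(rows, partitions):
    """Deal rows out to `partitions` buckets, one row at a time."""
    buckets = [[] for _ in range(partitions)]
    for i, row in enumerate(rows):
        buckets[i % partitions].append(row)
    return buckets


def parse_partition(rows):
    """Hypothetical stand-in for one xmlInput Stage instance."""
    out = []
    for doc in rows:
        root = ET.fromstring(doc)  # each row holds one whole XML document
        out.append((root.findtext("id"), root.findtext("name")))
    return out


def parse_all(rows, partitions=4):
    buckets = round_robin(rows, partitions)
    # One worker per partition; threads only illustrate the flow here.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = pool.map(parse_partition, buckets)
    # Rows come back grouped by partition, not in input order.
    return [rec for part in results for rec in part]


# Invented sample rows, one small document per row.
rows = [f"<order><id>{i}</id><name>item{i}</name></order>" for i in range(10)]
records = parse_all(rows, partitions=3)
```

Note that the output is grouped by partition rather than in input order, which is the same kind of downstream side effect to test for in a real Job.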

Within the Stage you can consider parsing only certain chunks, and then passing the rest downstream to another xmlInput Stage. This is harder to do, and the benefit will vary depending on the document. But if you have lots of sub-nodes that contain their own elements, separate them. Use a longvarchar column called "myNode" or similar and make its xpath in the Description end with a slash at the node in question. That whole "chunk" will be passed downstream, and the parsing for it can be done in that second xmlInput Stage. This is sort of "manually" parallelizing the xml processing.
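The chunking approach is a two-step hand-off, which might look roughly like this in Python. Again this is just an illustration under assumptions: stage_one, stage_two and the <customer> document are invented, the "myNode" key plays the role of the longvarchar column, and ElementTree still reads the whole document in the first step, so only the shape of the hand-off carries over, not the performance.

```python
import xml.etree.ElementTree as ET

# Invented sample document with a nested <orders> sub-node.
DOC = """<customer>
  <id>42</id>
  <orders>
    <order><sku>A1</sku><qty>2</qty></order>
    <order><sku>B7</sku><qty>1</qty></order>
  </orders>
</customer>"""


def stage_one(doc):
    """Extract top-level fields; pass the <orders> subtree on as raw text."""
    root = ET.fromstring(doc)
    orders = root.find("orders")
    # Serialize the whole sub-node unparsed, like the "myNode" chunk column.
    chunk = ET.tostring(orders, encoding="unicode")
    return {"id": root.findtext("id"), "myNode": chunk}


def stage_two(row):
    """Parse the chunk from upstream into one record per <order>."""
    orders = ET.fromstring(row["myNode"])
    return [
        (row["id"], o.findtext("sku"), int(o.findtext("qty")))
        for o in orders.findall("order")
    ]


row = stage_one(DOC)
records = stage_two(row)
```

The second step only ever sees the chunk, so it can sit on its own link and be partitioned separately from the first.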

No guarantees, but depending on your document, these tips may help in a pre-8.5 environment.

Ernie