Copy XML content from file to oracle without parsing

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
PeteM2
Premium Member
Posts: 44
Joined: Thu Dec 15, 2011 9:17 am
Location: uk

Copy XML content from file to oracle without parsing

Post by PeteM2 »

Hi,
One of our requirements is to read an XML file from a given path on Unix and store the entire content of the file in a single column of a single record in an Oracle table.
This table would comprise two columns - filename (varchar) and XML content (XML type or CLOB).
We do not want to parse the XML content at this phase.
Which stage would be suitable for this purpose?
Currently using DataStage v9.1.
thanks
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Simplest method is to write a Server Job that uses the Folder Stage....and then send out the file name and the file "content" as two columns on the output link. Use the built-in Folder table def for that output link. Write to Oracle as needed.
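Outside DataStage, what the Folder Stage does here can be sketched in a few lines of Python - one row per file, carrying (filename, whole-file content); the directory, pattern, and the Oracle table/column names in the commented insert are assumptions, not anything the Folder Stage itself dictates:

```python
# Sketch of the Folder Stage's output: one (filename, content) row
# per file in a directory. Table/column names in the commented insert
# (XML_FILES, FILENAME, XML_CONTENT) are illustrative assumptions.
from pathlib import Path


def folder_stage_rows(directory, pattern="*.xml"):
    """Yield (filename, whole-file content) pairs, one per matching file."""
    for path in sorted(Path(directory).glob(pattern)):
        yield path.name, path.read_text()


# Each row would then map to a single-record CLOB insert, e.g. with
# python-oracledb / cx_Oracle (connection setup omitted):
#   cursor.execute(
#       "INSERT INTO XML_FILES (FILENAME, XML_CONTENT) VALUES (:1, :2)",
#       (name, content))
```

The point is simply that the whole document travels as one opaque string column - no parsing happens on the way to the CLOB.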

Another method is to use an EE Job...read the filenames from disk using the External Source Stage (with a unix "list" command) and then feed those into the Hierarchical Stage, where you can validate and then "chunk" the incoming xml according to its XSD and send it back out again on an output link as a full xml "string". Write to Oracle as needed.
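For reference, the unix "list" command in the External Source Stage is typically just an `ls`, emitting one fully qualified file name per line as the stage's single output column; the directory here is an assumption:

```shell
# Emit one file name per line; the External Source Stage treats each
# line as one row on its output link. /data/xml_in is an assumed path.
ls -1 /data/xml_in/*.xml
```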

The former is vastly simpler, but is limited to roughly 200-300 MB per whole document.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
PeteM2
Premium Member
Posts: 44
Joined: Thu Dec 15, 2011 9:17 am
Location: uk

Post by PeteM2 »

Thanks Ernie.
The server job with folder stage worked just fine.
Would just like to add one point here - when the default column definition for the Folder Stage is loaded from the built-in definitions, the length of the "Record" column is set to 999999. With that default in place, the CLOB column in the target Oracle table contained a mixture of unreadable characters followed by readable ones.
Removing the length on that column avoids this.
Thanks again.
Post Reply