XML Input stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

suryadev
Premium Member
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

XML Input stage

Post by suryadev »

Hello,

I am trying to parse an XML document using
External source--->XML_input---->Sequentialfile

Gave all the properties in the stages and I see no warnings or errors but the output looks different.
Input is <ENTITY>172</ENTITY>
<ACCOUNT>0000000006196</ACCOUNT>
<TRACK>0</TRACK>
<USER_CD>1101</USER_CD>
Output I get is ENTITY:<ENTITY>172</ENTITY>ACCOUNT:<ACCOUNT>0000000006196</ACCOUNT>TRACK:<TRACK>0</TRACK>USER_CD:<USER_CD>1101</USER_CD>

I was expecting ENTITY:172
ACCOUNT:0000000006196
TRACK:0
USER_CD:1101

can someone help me why am I getting the tags also in the output?

Thank you!
Thanks,
Surya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Describe the XPath expressions and other settings you have used in the XML Input stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
suryadev
Premium Member
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

XML input stage:
Under stage->transformation Settings->selected repetition element required
Input->XMLsource->gave filename and selected URL/Filepath
-->columns--->gave filename with varchar (no) an length 255
output->transformation Settings->selected repetition element required
---> columns -gave all the columns except the top one

Now after trying some options the output looks like
ENTITY
<ENTITY>172</ENTITY>
ACCOUNT
<ACCOUNT>0000000006196</ACCOUNT>
TRACK
<TRACK>0</TRACK>
USER_CD
<USER_CD>1101</USER_CD>
which is the column name is good with entity or account or track or user_code and the records are also under the corresponding column but tags are not removed. Not sure what the reason could be :?
Thanks,
Surya
suryadev
Premium Member
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

There is some issue with the main tag which is the ID. Now after using it as ID I got the right output when pulling the XML from file which has its data loaded from a table.

When reading the same CLOB field from the DB directly, it is throwing an error which says "Xalan fatal error" due to Invalid document structure"
The job design is DBstage--->Transformer---->XMLinput----->Sequentialfile

I am not using the external source stage.....as there is no path for the table...

Please suggest,Thank you
Thanks,
Surya
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

The result you are getting is by design. Sometimes it is useful to retrieve whole tags.

Open your output link and scroll all the way to the right and review the Description property.

The contents are formal "xpath". When it ends in a slash, you get the whole tag. When it ends in /text(), you get the "contents" of the tag.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
suryadev
Premium Member
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Thank you Ernie!

So if the whole tags are retrieved , I can get the output when pulled from the DB. How do I get the whole tags, when importing do I need to select something?
as you said when I see the description it ends in /text(), please suggest how to retrieve the whole tags?
Thanks,
Surya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's where I was heading when I asked you for your xpath expressions. Ernie pre-empted me. But that's OK - the more people answering questions well, the better.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

suryadev wrote:please suggest how to retrieve the whole tags?
eostic wrote:When it ends in a slash, you get the whole tag.
-craig

"You can never have too many knives" -- Logan Nine Fingers
suryadev
Premium Member
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

I removed the text() after / which made the derivation ending with / but still I get the warning and the records are not passing after XML input stage.

Below are the warnings
XML input document parsing failed.
Xalan fatal error (publicId: , systemId: , line: 2, column: 2): Invalid character (Unicode: 0x0)
XML input document parsing failed.
Xalan fatal error (publicId: , systemId: , line: 1, column: 6): There are more end tags than start tags
XML input document parsing failed.
Xalan fatal error (publicId: , systemId: , line: 2, column: 1): Invalid document structure


Thanks again
Thanks,
Surya
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Which problem are we helping to resolve? Take my post and apply it to your working job in the very first post. That will show you how to get whole tags (what your very first post is getting) vs just the "value" of the tags (what your first post says you wanted).

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
suryadev
Premium Member
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Actually when waiting for the solution for the issue, I tried some options and was updating the output each time I made changes.

Now I get the output from the job in first post as wanted but only when I use External source.My requirement is to get the same output when I read the CLOB field information from the table instead of reading from a sequential file.

Please let me know if I need to open a new post for this, I felt its the same parsing XML from file Vs parsing XML from table so continued....
Thanks,
Surya
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

First write a job that sends your clob to a sequential file without the xml stage. Use "none" for delimeter and quote character. Look at the resulting xml. Make sure it is clean and can be opened in your browser as well formed xml.

Once you have that working, send into your xmlInput but use the "content" option instead of the url option.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
suryadev
Premium Member
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Thank you very much, will implement the same with two jobs.

So my assumption is XML cannot be parsed directly from the CLOB field of Table?
Thanks,
Surya
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ernie said no such thing. Out of curiosity, has your premium membership expired?
-craig

"You can never have too many knives" -- Logan Nine Fingers
suryadev
Premium Member
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

The way which Ernie suggested worked for me, loaded the XML CLOB into a file and then parsed it with XML input stage but I have around 15 more transformations to do which should include CLOB column and other columns of the table.

So it needs to be a single job, that is the reason I was asking for parsing technique from the table directly.

Sorry for the confusion, hope you understand!
Thanks,
Surya
Post Reply