XML Parsing Query


SachinCho
Participant
Posts: 45
Joined: Thu Jan 14, 2010 1:23 am
Location: Pune

Post by SachinCho »

Hi All,
I have been trying to parse an XML document using a parallel job with the following design. We are on v9.1.

Row Generator (holds the value of a sequential file with only a single column, which is the XML) ---> XML Input ----> Sequential File

I am trying to parse only a single element (newParty) from the XML; its XSD definition looks like this:

<xs:element name='newParty'>
<xs:complexType>
<xs:attribute name='eventId' use='required'/>
<xs:attribute name='timeShift' use='required'/>
<xs:attribute name='userId' use='required'/>
<xs:attribute name='visibility' use='required'>
<xs:simpleType>
<xs:restriction base='xs:string'>
<xs:enumeration value='ALL'/>
<xs:enumeration value='INT'/>
<xs:enumeration value='VIP'/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:sequence>
<xs:element ref='userInfo' minOccurs='0'/>
<xs:element ref='userData' minOccurs='0'/>
</xs:sequence>
</xs:complexType>
</xs:element>

userData in turn has the following declaration in the same XSD:

<xs:element name='userData'>
<xs:complexType>
<xs:sequence>
<xs:element ref='item' minOccurs='0' maxOccurs='unbounded'/>
</xs:sequence>
</xs:complexType>
</xs:element>


The sample XML is as follows:

<newParty userId="9032738273CDF" eventId="1" timeShift="0" visibility="ALL">
<userInfo personId="" userNick="abc" userType="CLIENT" protocolType="FLEX" timeZoneOffset="720" />
<userData>
<item key="ChatID">Reactive</item>
<item key="ChatURL">contactuschat</item>
<item key="EmailAddress">abc@gmail.com</item>
<item key="FirstName">abc</item>
<item key="FromAddress">abc@gmail.com</item>
<item key="IdentifyCreateContact">3</item>
<item key="MediaType">chat</item>
<item key="MessageCount">Agent:0|Customer:0</item>
<item key="PhoneNumber">123456789</item>
<item key="Question">qyery1</item>
<item key="Subject">qyery1</item>
<item key="TopicID">topic1</item>
</userData>
</newParty>

We are using eventId as the repetition element key because this XML has multiple events. The trouble is parsing userData: with the schema as defined, we are able to retrieve only the first item, <item key="ChatID">Reactive</item>. Subsequent items are not captured, and I am out of ideas at the moment.

This is how the schema is defined in DataStage:

/chatTranscript/@startAt
/chatTranscript/@sessionId
/chatTranscript/@savedPosition
/chatTranscript/newParty/@userId
/chatTranscript/newParty/@eventId
/chatTranscript/newParty/@timeShift
/chatTranscript/newParty/@visibility
/chatTranscript/newParty/userInfo/@personId
/chatTranscript/newParty/userInfo/@userNick
/chatTranscript/newParty/userInfo/@userType
/chatTranscript/newParty/userInfo/@protocolType
/chatTranscript/newParty/userInfo/@timeZoneOffset
/chatTranscript/newParty/userData/item/@key
/chatTranscript/newParty/userData/item/text()


So the last two columns, item/@key and item/text(), return only the first value from userData.
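The symptom can be mimicked outside DataStage. The following is only a minimal Python sketch, not a description of how the XML Input stage works internally. It assumes a trimmed copy of the sample above wrapped in a bare chatTranscript root, and the function name rows_per_event is made up for illustration. Emitting one row per newParty leaves room for a single item value, so only the first item survives.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; the bare chatTranscript wrapper is an
# assumption made for this illustration.
SAMPLE = """
<chatTranscript>
  <newParty userId="9032738273CDF" eventId="1" timeShift="0" visibility="ALL">
    <userData>
      <item key="ChatID">Reactive</item>
      <item key="ChatURL">contactuschat</item>
      <item key="MediaType">chat</item>
    </userData>
  </newParty>
</chatTranscript>
"""

def rows_per_event(xml_text):
    """Emit one row per newParty; each column holds a single value."""
    root = ET.fromstring(xml_text)
    rows = []
    for party in root.findall("newParty"):
        # find() returns only the first matching item, or None
        first_item = party.find("userData/item")
        rows.append({
            "eventId": party.get("eventId"),
            "key": first_item.get("key") if first_item is not None else None,
            "value": first_item.text if first_item is not None else None,
        })
    return rows

event_rows = rows_per_event(SAMPLE)
print(event_rows)
```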


Any pointers would be much appreciated.

Thanks,
Sach
Sachin C
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

In this XML, "item" must be the key, or "repetition element", because it is the only element that repeats. You will get as many rows as you have items. You should be able to get the user data as well; it will be the same for each item row, provided it occurs only once for each group of items.
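As a rough illustration of the row shape this produces (a Python sketch, not DataStage code), the snippet below walks a trimmed copy of the sample and emits one row per item, repeating the parent newParty attributes on every row. The bare chatTranscript wrapper and the function name rows_per_item are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; the bare chatTranscript wrapper is an
# assumption made for this illustration.
SAMPLE = """
<chatTranscript>
  <newParty userId="9032738273CDF" eventId="1" timeShift="0" visibility="ALL">
    <userData>
      <item key="ChatID">Reactive</item>
      <item key="ChatURL">contactuschat</item>
      <item key="MediaType">chat</item>
    </userData>
  </newParty>
</chatTranscript>
"""

def rows_per_item(xml_text):
    """Emit one row per item, carrying the parent attributes along."""
    root = ET.fromstring(xml_text)
    rows = []
    for party in root.findall("newParty"):
        for item in party.findall("userData/item"):
            rows.append({
                # parent values repeat unchanged on every item row
                "userId": party.get("userId"),
                "eventId": party.get("eventId"),
                # item values differ from row to row
                "key": item.get("key"),
                "value": item.text,
            })
    return rows

item_rows = rows_per_item(SAMPLE)
for row in item_rows:
    print(row)
```

Each printed row carries the same userId and eventId next to a different key and value, which is the "same for each item row" behaviour described above.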

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
SachinCho
Participant
Posts: 45
Joined: Thu Jan 14, 2010 1:23 am
Location: Pune

Post by SachinCho »

Thanks Ernie! Got the point. I am able to parse this one now. I was using eventId as the "repetition key" because I have multiple events within a customer chat, and within each event there are again multiple items. But I guess I will have to use item as the key in some cases and event in others. Exploring more.
Sachin C
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Indeed... each "independent" repeating node path needs its own output link and its own "repetition element". Nested is OK, but use separate links when the nodes are unrelated (such as "employees" under "company" vs. "assets" also under "company").
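A small Python sketch of the same idea, outside DataStage: the company document, its element and attribute names, and the function split_links are all hypothetical, invented only to mirror the employees/assets example. Each unrelated repeating path gets its own list of rows, standing in for its own output link.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two unrelated repeating paths under one parent.
COMPANY = """
<company name="Acme">
  <employees>
    <employee id="e1"/>
    <employee id="e2"/>
  </employees>
  <assets>
    <asset tag="a1"/>
    <asset tag="a2"/>
    <asset tag="a3"/>
  </assets>
</company>
"""

def split_links(xml_text):
    """Return one row list per independent repeating path."""
    root = ET.fromstring(xml_text)
    company = root.get("name")
    # "link" 1: repeats on employee
    employee_link = [
        {"company": company, "employeeId": e.get("id")}
        for e in root.findall("employees/employee")
    ]
    # "link" 2: repeats on asset, independent of employees
    asset_link = [
        {"company": company, "assetTag": a.get("tag")}
        for a in root.findall("assets/asset")
    ]
    return employee_link, asset_link

employee_link, asset_link = split_links(COMPANY)
print(len(employee_link), "employee rows;", len(asset_link), "asset rows")
```

Forcing both paths into a single row set would mean either pairing employees with assets that have nothing to do with each other or dropping rows, which is why the two lists are kept apart here.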

Ernie
Ernie Ostic
