XML Composer Multiple Nested List Elements

gsbrown
Premium Member
Posts: 148
Joined: Mon Sep 23, 2002 1:00 pm
Location: USA

Post by gsbrown »

I'm trying to build this XML format where I have multiple repeated elements nested under <Customer> for each unique Customer Number.

Code:

<Customers>
   <Customer Number="123456">
      <Name>
         <Name Location="First">Greg</Name>
         <Name Location="Middle">S</Name>
         <Name Location="Last">Brown</Name>
      </Name>
      <ContactPreferences>
         <Contact Type="Mail">true</Contact>
         <Contact Type="Phone">false</Contact>
         <Contact Type="Email">true</Contact>
      </ContactPreferences>
      <POSAlternateKeys>
         <AltKey>456789</AltKey>
         <AltKey>111111</AltKey>
         <AltKey>222222</AltKey>
         <AltKey>333333</AltKey>
         <AltKey>444444</AltKey>
      </POSAlternateKeys>
   </Customer>
</Customers>
[Image: DataStage job layout]

Here is my DataStage layout for inputting the data. My main input ODBC stage has one record per Customer Number. The name and contact preference values are columns, so I'm using a Pivot stage to shift those to rows. Then I've added a sparse lookup to grab the multiple Alternate IDs that exist for each Customer Number. All of these come together in the XML Assembly, where I'm attempting to Join/Regroup the data, but I can't find the winning combination of steps. What would be the proper flow using the HJoin and Regroup steps to collect this data properly? I've tried HJoin-->HJoin-->Regroup and HJoin-->Regroup-->HJoin-->Regroup, and neither gives me the results shown above. When I initially started this, I worked with only "Output_Records" and "Output_Name" coming into the XML Assembly and it worked perfectly. Adding the third "Output_AlternateIDs" layer is causing quite a challenge.

Appreciate any advice or examples, thank you!
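For readers following along, here is a minimal Python sketch of what the Pivot stage is described as doing: turning the name columns of one wide record into one row per name part. It is only an illustration, not DataStage code, and the column names (`CustomerNumber`, `First`, `Middle`, `Last`) are assumptions, since the actual column names are not given in the post.

```python
# Hypothetical wide record, one per Customer Number (column names are assumptions).
wide_rows = [
    {"CustomerNumber": "123456", "First": "Greg", "Middle": "S", "Last": "Brown"},
]

def pivot_names(rows, columns=("First", "Middle", "Last")):
    """Turn each name column into its own row, keyed by customer number."""
    pivoted = []
    for row in rows:
        for col in columns:
            pivoted.append(
                {
                    "CustomerNumber": row["CustomerNumber"],
                    "Location": col,
                    "Name": row[col],
                }
            )
    return pivoted

name_rows = pivot_names(wide_rows)
```

Each input record yields three output rows that still carry the customer number, which is what lets them be regrouped under the right customer later.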
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi Greg...

Hard to say "exactly" what you'll need to do without seeing your exact xsd, but the basics are:

a) bring in "n" rows on each link for each of your 3 repeating nodes (likely to be "lists" in your xsd). Each will be keyed by the customer number.

b) regroup by the customer number

c) then regroup by each of the sub-nodes "within" that first regroup...(so that you have three resulting independent lists under the main list).

d) hjoin to combine the name and contact "lists"
e) another hjoin to combine the final altKey list.

Now you should have the related constructs to map into an xml Composer step.
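As a rough analogy for the steps above (not the Assembly Editor itself), the regroup-then-join idea can be sketched in plain Python: group each flat input by customer number into its own list, then combine the lists on the shared key and emit the nested XML. The function names `regroup` and `build_customers` are made up for this sketch, and the tuple layout of the inputs is an assumption. The element names and sample values come from the XML in the first post.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Flat input "links", each keyed by customer number (sample values from the post).
names = [("123456", "First", "Greg"), ("123456", "Middle", "S"), ("123456", "Last", "Brown")]
contacts = [("123456", "Mail", "true"), ("123456", "Phone", "false"), ("123456", "Email", "true")]
alt_keys = [("123456", "456789"), ("123456", "111111")]

def regroup(rows):
    """Group flat rows into one child list per customer number."""
    grouped = defaultdict(list)
    for key, *rest in rows:
        grouped[key].append(tuple(rest))
    return grouped

def build_customers(names, contacts, alt_keys):
    """Combine the three independent lists on the shared key and build the XML tree."""
    name_lists = regroup(names)
    contact_lists = regroup(contacts)
    alt_lists = regroup(alt_keys)
    root = ET.Element("Customers")
    for number in sorted(set(name_lists) | set(contact_lists) | set(alt_lists)):
        cust = ET.SubElement(root, "Customer", Number=number)
        name_el = ET.SubElement(cust, "Name")
        for location, value in name_lists.get(number, []):
            ET.SubElement(name_el, "Name", Location=location).text = value
        prefs = ET.SubElement(cust, "ContactPreferences")
        for ctype, value in contact_lists.get(number, []):
            ET.SubElement(prefs, "Contact", Type=ctype).text = value
        alts = ET.SubElement(cust, "POSAlternateKeys")
        for (value,) in alt_lists.get(number, []):
            ET.SubElement(alts, "AltKey").text = value
    return root

xml_text = ET.tostring(build_customers(names, contacts, alt_keys), encoding="unicode")
```

The point of the sketch is that each repeating node becomes its own list under the customer before the lists are combined, so the three lists stay independent instead of multiplying against each other.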

Ernie
Ernie Ostic

gsbrown
Premium Member
Posts: 148
Joined: Mon Sep 23, 2002 1:00 pm
Location: USA

Post by gsbrown »

Ok, I've got it working and I'm getting the exact XML output that I want, but I'm concerned about the performance. My XML stage with the setup described below takes almost 17min to process 10,000 input records joined to 30,000 records generated by the Pivot Stage.

XML Composer Steps
1. Input - Link#1) Primary Input 10,000 records Link#2) Secondary input of pivoted data 30,000 records
2. Regroup1 - regrouping the 10,000 input to generate my "Alternate ID" list
3. Regroup2 - regrouping the 30,000 pivot stage input to generate my "Name" data list
4. HJoin - combine the results of both regroups on their common key
5. XML Composer - sourcing my regroup/join fields to map the XML fields

I'm getting the desired output; with this number of records I'm getting a 20MB file. I tried both "disk based" and "in memory" on the HJoin and didn't get a noticeable difference in runtime. I've got 512MB java heap size set. Performance analysis shows 99% of the processing is in the XML stage, so I don't understand why that stage takes 17min to process so few records. Is that runtime to be expected? Another concern is this message in the log: "main_program: The virtual memory limit is 8316600320 bytes. Raising to 18446744073709551615." So I'm really curious if I've gone about this incorrectly even though I'm getting the right output. Thanks again!
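To put the two numbers in that log message in perspective, a quick bit of arithmetic in Python helps. The only inputs are the two values quoted from the log. The interpretation is an assumption on my part, not something the log states: the largest unsigned 64-bit value is how many systems represent "unlimited", so the message most likely means the limit was lifted, not that the job needs that much memory.

```python
old_limit = 8316600320            # bytes, quoted from the log message
new_limit = 18446744073709551615  # bytes, quoted from the log message

# Convert the previous limit to GiB for readability.
old_limit_gib = old_limit / 2**30

# Check whether the new limit is the largest unsigned 64-bit integer.
is_max_u64 = new_limit == 2**64 - 1

print(f"previous limit: {old_limit_gib:.2f} GiB")
print(f"new limit equals 2**64 - 1: {is_max_u64}")
```

The previous limit works out to a bit under 8 GiB, and the new value is exactly 2**64 - 1.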
gsbrown
Premium Member
Posts: 148
Joined: Mon Sep 23, 2002 1:00 pm
Location: USA

Post by gsbrown »

HA! Saw another member post about runtime performance issues with the XML stage and discovered the "enable logging" had an impact. I followed suit and disabled logging in my XML stage and the 17min runtime dropped to 45secs :?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Funny... almost asked if you had logging enabled in the stage. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers