XML Composer Multiple Nested List Elements

gsbrown · Post by **gsbrown** » Wed Apr 02, 2014 10:33 am

I'm trying to build this XML format where I have multiple repeated elements nested under <Customer> for each unique Customer Number.

Code: Select all

<Customer>
   <Customer Number='123456'>
      <Name>
         <Name Location="First">Greg</Name>
         <Name Location="Middle">S</Name>
         <Name Location="Last">Brown</Name>
      </Name>
      <ContactPreferences>
         <Contact Type="Mail">true</Contact>
         <Contact Type="Phone">false</Contact>
         <Contact Type="Email">true</Contact>
      </ContactPreferences>
      <POSAlternateKeys>
         <AltKey>456789</AltKey>
         <AltKey>111111</AltKey>
         <AltKey>222222</AltKey>
         <AltKey>333333</AltKey>
         <AltKey>444444</AltKey>
      </POSAlternateKeys>
   </Customer>
</Customers>

Here is my DataStage layout for inputing the data. My main input ODBC stage has one record per Customer Number. The name and contact preference values are columns, so I'm using a Pivot stage to shift those to rows. Then I've added a sparse lookup to grab the multiple Alternate ID's that exist for each Customer Number. All of these are coming together into the XML Assembly and I'm attempting to Join/Regroup the data there and I can't find the winning combination of steps. What would be the proper flow using the HJoin and Regroup steps to get this data collected properly? I've tried HJoin-->HJoin-->Regroup and I've tried HJoin-->Regroup-->HJoin-->Regroup and I can't seem to get the above results that I need. When I initially started this I worked with only the "Output_Records" and "Output_Name" coming into the XML Assembly and it worked perfectly. Adding the 3rd "Output_AlternateIDs" layer is causing quite a challenge.

Appreciate any advice or examples, thank you!

eostic · Post by **eostic** » Wed Apr 02, 2014 2:47 pm

Hi Greg...

Hard to say "exactly" what you'll need to do without seeing your exact xsd, but the basics are:

a) bring in "n" rows on each link for the 3 of your repeating nodes (likely to be "lists" in your xsd). Each will be keyed by the customer number.

b) regroup by the customer number

c) then regroup by each of the sub-nodes "within" that first regroup...(so that you have three resulting independent lists under the main list).

d) hjoin to combine the name and contact "lists"
e) another hjoin to combine the final altKey list.

Now you should have the related constructs to map into an xml Composer step.

Ernie

gsbrown · Post by **gsbrown** » Thu Apr 10, 2014 2:41 pm

Ok, I've got it working and getting the exact XML output that I want, but I'm concerned about the performance. My XML stage with the setup described below takes almost 17min to process 10,000 input records joined to 30,000 records generated by the Pivot Stage

XML Composer Steps
1. Input - Link#1) Primary Input 10,000 records Link#2) Secondary input of pivoted data 30,000 records
2. Regroup1 - regrouping the 10,000 input to generate my "Alternate ID" list
3. Regroup2 - regrouping the 30,000 pivot stage input to generate my "Name" data list
4. HJoin - combine the results of both regroups on their common key
5. XML Composer - sourcing my regroup/join fields to map the XML fields

I'm getting the desired output, with this number of records I'm getting a 20MB file. I tried both "disk based" and "in memory" on the HJoin and didn't get a noticeable difference in runtime. I've got 512MB java heap size set. Performance analysis shows 99% of the processing is in the XML stage, so I don't understand what's taking 17min for that stage to process so few records runtime is to be expected? Another concern is this message in the log "main_program: The virtual memory limit is 8316600320 bytes. Raising to 18446744073709551615." So I'm really curious if I've gone about this incorrectly even though I'm getting the right output. Thanks again!

gsbrown · Post by **gsbrown** » Thu Apr 10, 2014 3:18 pm

HA! Saw another member post about runtime performance issues with the XML stage and discovered the "enable logging" had an impact. I followed suit and disabled logging in my XML stage and the 17min runtime dropped to 45secs

chulett · Post by **chulett** » Thu Apr 10, 2014 4:23 pm

Funny... almost asked if you had logging enabled in the stage.