Here is my DataStage layout for inputing the data. My main input ODBC stage has one record per Customer Number. The name and contact preference values are columns, so I'm using a Pivot stage to shift those to rows. Then I've added a sparse lookup to grab the multiple Alternate ID's that exist for each Customer Number. All of these are coming together into the XML Assembly and I'm attempting to Join/Regroup the data there and I can't find the winning combination of steps. What would be the proper flow using the HJoin and Regroup steps to get this data collected properly? I've tried HJoin-->HJoin-->Regroup and I've tried HJoin-->Regroup-->HJoin-->Regroup and I can't seem to get the above results that I need. When I initially started this I worked with only the "Output_Records" and "Output_Name" coming into the XML Assembly and it worked perfectly. Adding the 3rd "Output_AlternateIDs" layer is causing quite a challenge.
Ok, I've got it working and getting the exact XML output that I want, but I'm concerned about the performance. My XML stage with the setup described below takes almost 17min to process 10,000 input records joined to 30,000 records generated by the Pivot Stage
XML Composer Steps
1. Input - Link#1) Primary Input 10,000 records Link#2) Secondary input of pivoted data 30,000 records
2. Regroup1 - regrouping the 10,000 input to generate my "Alternate ID" list
3. Regroup2 - regrouping the 30,000 pivot stage input to generate my "Name" data list
4. HJoin - combine the results of both regroups on their common key
5. XML Composer - sourcing my regroup/join fields to map the XML fields
I'm getting the desired output, with this number of records I'm getting a 20MB file. I tried both "disk based" and "in memory" on the HJoin and didn't get a noticeable difference in runtime. I've got 512MB java heap size set. Performance analysis shows 99% of the processing is in the XML stage, so I don't understand what's taking 17min for that stage to process so few records runtime is to be expected? Another concern is this message in the log "main_program: The virtual memory limit is 8316600320 bytes. Raising to 18446744073709551615." So I'm really curious if I've gone about this incorrectly even though I'm getting the right output. Thanks again!
HA! Saw another member post about runtime performance issues with the XML stage and discovered the "enable logging" had an impact. I followed suit and disabled logging in my XML stage and the 17min runtime dropped to 45secs