XML stage - challenge

MT
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Post by MT

Hi,

I am using the XML stage in DS 8.5 and cannot find a solution for the following problem; hopefully one of you can help:

My input is a single stream of data. Slightly simplified, each record is:
Workgroup lastname firstname city citycode telephone

A single person can have multiple addresses (city and city code) as well as multiple telephone numbers, and the number of addresses differs from the number of telephone numbers.

Because I have only a single input stream, I use REGROUP steps to get the following structure:

<Workgroup>
  <Employee>
     <Lastname>Smith</Lastname>
     <Firstname>Tom</Firstname>
     <Adress>
        <City>Munich</City>
        <CityCode>81707</CityCode> 
     </Adress>
     <Adress>
        <City>Berlin</City>
        <CityCode>12345</CityCode> 
     </Adress>
     <Communication>
         <Telephone>030/123456</Telephone>
     </Communication>
     <Communication>
         <Telephone>0172/9822776</Telephone>
     </Communication>
     <Communication>
         <Telephone>+49 175/9998833</Telephone>
     </Communication>
  </Employee>
...
</Workgroup>

My problem is getting Communication and Adress, which are on the same level (both directly below Employee), right within the REGROUP steps.
Because the numbers of Communication and Adress occurrences are independent of each other, I do not know how to configure the REGROUP step.
For example, if I do the Adress regroup, City and CityCode are clearly children, but Communication is neither a parent nor a child, and it cannot be removed from the REGROUP.
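To make the problem concrete, here is a small plain-Python sketch (not DataStage). It assumes, as a guess, that the single input stream carries one row per address/telephone combination; the workgroup value "WG1" is hypothetical and the other values come from the XML above. A single regroup on the employee key that keeps City, CityCode and Telephone together ends up with six combined child rows instead of two addresses and three telephone numbers.

```python
from itertools import product

# Assumption: the flat stream holds one row per address/telephone
# combination. "WG1" is a hypothetical workgroup value.
addresses = [("Munich", "81707"), ("Berlin", "12345")]
phones = ["030/123456", "0172/9822776", "+49 175/9998833"]
rows = [("WG1", "Smith", "Tom", city, code, tel)
        for (city, code), tel in product(addresses, phones)]

def single_regroup(rows):
    """One regroup on the employee key, with City, CityCode AND Telephone as children."""
    groups = {}
    for wg, last, first, city, code, tel in rows:
        groups.setdefault((wg, last, first), []).append((city, code, tel))
    return groups

children = single_regroup(rows)[("WG1", "Smith", "Tom")]
print(len(children))  # prints 6: every address is repeated once per telephone
```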

Any hints?
I did not find any information in the Redbook or anywhere else, although two substructures on the same level is hardly an unusual case.

Many thanks in advance,
regards

Michael
pnpmarques
Participant
Posts: 35
Joined: Wed Jun 15, 2005 9:27 am

Post by pnpmarques

Hello,
I cannot comment on the "regroup step" since I don't know version 8.5, but the issue of having two repeatable elements on the same level is familiar.
Our solution was to treat each element separately and put each group of repetitions into a single field.
Using your example, that means creating one block of XML containing only the Address element repetitions, and another block containing only the Communication element repetitions.
Then, with those pieces of XML in two large columns, join them to the remaining Employee elements (perhaps using first and last name as the key).
Your job design will have a Join stage between several XML Output stages.
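The idea of one pre-built XML column per repeating group, joined back on the employee key, can be sketched in plain Python with the standard `xml.etree.ElementTree` module. This only illustrates the concept; it is not the DataStage XML Output or Join stage, and the row layouts and helper names (`fragments`, `address_elem`, `phone_elem`) are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical rows, already split into two independent streams and keyed
# by (lastname, firstname), as suggested above.
address_rows = [("Smith", "Tom", "Munich", "81707"),
                ("Smith", "Tom", "Berlin", "12345")]
phone_rows = [("Smith", "Tom", "030/123456"),
              ("Smith", "Tom", "0172/9822776"),
              ("Smith", "Tom", "+49 175/9998833")]

def address_elem(vals):
    adr = ET.Element("Adress")
    ET.SubElement(adr, "City").text = vals[0]
    ET.SubElement(adr, "CityCode").text = vals[1]
    return adr

def phone_elem(vals):
    com = ET.Element("Communication")
    ET.SubElement(com, "Telephone").text = vals[0]
    return com

def fragments(rows, build):
    """Concatenate the serialized repetitions for each key into one string column."""
    column = {}
    for row in rows:
        key = (row[0], row[1])
        column[key] = column.get(key, "") + ET.tostring(build(row[2:]), encoding="unicode")
    return column

adr_col = fragments(address_rows, address_elem)
com_col = fragments(phone_rows, phone_elem)

# Join: match both large columns on the shared key (an inner join)
# and wrap them together with the remaining Employee elements.
employees = {}
for last, first in adr_col.keys() & com_col.keys():
    employees[(last, first)] = (
        "<Employee>"
        f"<Lastname>{escape(last)}</Lastname><Firstname>{escape(first)}</Firstname>"
        + adr_col[(last, first)] + com_col[(last, first)]
        + "</Employee>"
    )
```

The key intersection behaves like an inner join, so an employee with addresses but no telephone numbers would be dropped; a left outer join would keep such employees.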

I hope you get the idea. Here you can find XML best practices, which helped me a lot:
http://www.duke-consulting.com/DataStage_Tips.htm

Best Regards,
Pedro.
MT
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Post by MT

Hi Pedro,

thanks for your comment, but I am not talking about the "old" stages like XML Input etc., but about the new XML stage.
It is completely different, and I want to use it because it is future-proof.
regards

Michael
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic

Both solutions are correct, and the technique is "conceptually" the same.

With the xmlOutput Stage, this has to be done in three or more separate Stages: first create the address repeating node; then, on another set of links, create the phone number repeating node; then store one of those nodes in a file for lookup purposes; and then, downstream, perform a lookup (based on a common key, probably the employeeID or another identifier) to get everything onto a common record and construct the final document in a third xml Stage. This is necessary because the xmlOutput Stage is entirely relationally based and can only handle one repeating group at a time.

This is where the "new" XML Stage excels. It is built for dealing with hierarchies, and it can bring in multiple input links: people, their (multiple) addresses, their (multiple) phone numbers, etc. Inside that Stage you still have to group things according to their hierarchy. You don't need to actually "craft" the XML as with the older Stage technology, but you do need to organize it by the repeating nodes.

Use a re-group step for the addresses.
Use another re-group step for the phone numbers.

Now you can join these results with an HJoin step.
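The two-regroups-plus-HJoin idea can be sketched in plain Python. This is not the DataStage API: `regroup` and `hjoin` are hypothetical helpers named after the steps. The sketch assumes the single flat stream from the question is a cross product of addresses and telephone numbers, so each regroup keeps only its own child columns and drops the duplicates that the other list's repetition creates; with separate input links, as described above, that de-duplication would not be needed (and it would also collapse genuinely repeated addresses).

```python
from itertools import product

# Hypothetical flat input: assumes one row per address/telephone combination.
addresses = [("Munich", "81707"), ("Berlin", "12345")]
phones = ["030/123456", "0172/9822776", "+49 175/9998833"]
rows = [
    {"Workgroup": "WG1", "Lastname": "Smith", "Firstname": "Tom",
     "City": c, "CityCode": cc, "Telephone": t}
    for (c, cc), t in product(addresses, phones)
]

PARENT = ("Workgroup", "Lastname", "Firstname")

def regroup(rows, parent_keys, child_keys):
    """Group rows by parent key, keeping only child_keys and dropping duplicate children."""
    groups = {}
    for row in rows:
        key = tuple(row[k] for k in parent_keys)
        child = tuple(row[k] for k in child_keys)
        kids = groups.setdefault(key, [])
        if child not in kids:  # duplicates come from the other list's repetition
            kids.append(child)
    return groups

def hjoin(parents, children, name):
    """Attach each child list to its parent record by key (hierarchical join)."""
    for key, record in parents.items():
        record[name] = children.get(key, [])
    return parents

adr = regroup(rows, PARENT, ("City", "CityCode"))
com = regroup(rows, PARENT, ("Telephone",))
employees = {key: {"Adress": kids} for key, kids in adr.items()}
employees = hjoin(employees, com, "Communication")
```

Because each regroup projects only its own child fields, the address list and the telephone list never multiply against each other; the join then places them side by side under the same parent key.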

Watch the "results" of the regroups VERY carefully......and when doing the regroups, make careful note of the scope (nested re-groups, if you are going deep, need to have a scope of their parent).

I suggest that you get into the habit of always using the "more...." option in every pull-down of every mapping dialog or parent/child selection dialog within the Assembly editor. The automatic mapping and "smart" suggestions are nice, and powerful once you are very familiar with a particular schema and assembly activity, but they are too prone to mistakes. Use "more..." and view the hierarchy that you are building, EACH AND EVERY STEP ALONG THE WAY.

It is easy to get lost, but after a while you will start to see a pattern emerge for how you need to join the nodes together, or to join them to a parent node that you have already defined.

Eventually, your series of regroups and hjoins will lead you to the xmlComposer step, where you can map the results to the various target lists and details.
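What the composer mapping finally has to produce can be sketched with Python's `xml.etree.ElementTree`: two independent repeating lists written side by side under each Employee. The input dictionary stands in for the result of the regroup/hjoin steps and is an assumption, as is the `compose` helper; the real xmlComposer step is configured through mappings in the Assembly editor, not code.

```python
import xml.etree.ElementTree as ET

# Hypothetical result of the regroup/hjoin steps: a workgroup with employees,
# each carrying two independent lists on the same level.
workgroup = {
    "employees": [
        {"Lastname": "Smith", "Firstname": "Tom",
         "Adress": [("Munich", "81707"), ("Berlin", "12345")],
         "Communication": ["030/123456", "0172/9822776", "+49 175/9998833"]},
    ]
}

def compose(wg):
    """Map the hierarchy onto the target lists of the schema from the question."""
    root = ET.Element("Workgroup")
    for emp in wg["employees"]:
        e = ET.SubElement(root, "Employee")
        ET.SubElement(e, "Lastname").text = emp["Lastname"]
        ET.SubElement(e, "Firstname").text = emp["Firstname"]
        for city, code in emp["Adress"]:  # first repeating list
            a = ET.SubElement(e, "Adress")
            ET.SubElement(a, "City").text = city
            ET.SubElement(a, "CityCode").text = code
        for tel in emp["Communication"]:  # second, independent repeating list
            c = ET.SubElement(e, "Communication")
            ET.SubElement(c, "Telephone").text = tel
    return root

xml_text = ET.tostring(compose(workgroup), encoding="unicode")
```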

Ernie
Ernie Ostic

gowrishankar_h
Participant
Posts: 42
Joined: Wed Dec 26, 2012 1:13 pm

Re: XML stage - challenge

Post by gowrishankar_h

The XML Output and XML Input stages are embedded in version 8.5, so you can apply the same logic in the XML stage of version 8.5 using the composer step. Check the XML stage Redbook.

Regards,
Gowri