Splitting One Record to Many

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Splitting One Record to Many

Post by vijayrc »

Hi,
I have a file with records that carry few occurences[10] of Charges.
[1] For each of these input records with charges as an array [1-10], how do I create 10 ouput records with Charge1 in record1, charge2 in record2 and so on. What stage[s] is the best ?
[2] Again not all input records would carry all 10 charges, it might be 7, 6 or 9 charges, and this no of charges is known by another column. So if an input record has 4 charges, how do I limit to just writing 4 output records with Charge1, charge2, charge3, charge4 and not write records for zero charges.
Input:
AAAAAAAAA 5 $10 $9 $8 $7 $6 blank blank blank blank blank
Output
AAAAAAAAA $10
AAAAAAAAA $9
AAAAAAAAA $8
AAAAAAAAA $7
AAAAAAAAA $6
Just learning, and the tips learnt here, has tremendously helped me this one week. Thanks guyz. -Vijay
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Re: Splitting One Record to Many

Post by vijayrc »

Any help on this is appreciated...Please..!!
Thanks, V
vijayrc wrote:Hi,
I have a file with records that carry few occurences[10] of Charges.
[1] For each of these input records with charges as an array [1-10], how do I create 10 ouput records with Charge1 in record1, charge2 in record2 and so on. What stage[s] is the best ?
[2] Again not all input records would carry all 10 charges, it might be 7, 6 or 9 charges, and this no of charges is known by another column. So if an input record has 4 charges, how do I limit to just writing 4 output records with Charge1, charge2, charge3, charge4 and not write records for zero charges.
Input:
AAAAAAAAA 5 $10 $9 $8 $7 $6 blank blank blank blank blank
Output
AAAAAAAAA $10
AAAAAAAAA $9
AAAAAAAAA $8
AAAAAAAAA $7
AAAAAAAAA $6
Just learning, and the tips learnt here, has tremendously helped me this one week. Thanks guyz. -Vijay
thompsonp
Premium Member
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Is the maximum number of charges on a single input record fixed? If so, how many can there be?

If the answer is yes and the maximum number is small you could do it with a single transformer, one output link for each possible charge.

Obviously this won't work if the maximum number of charges is fixed and would be an inelegant soultion if the maximum number of charges were high.

Try searching the forum for pivot.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

Yes.. the maximum charge bucket in a record is 10. But not necessarily, all 10 charge buckets would be filled in.
thompsonp wrote:Is the maximum number of charges on a single input record fixed? If so, how many can there be?

If the answer is yes and the maximum number is small you could do it with a single transformer, one output link for each possible charge.

Obviously this won't work if the maximum number of charges is fixed and would be an inelegant soultion if the maximum number of charges were high.

Try searching the forum for pivot.
Sunshine2323
Charter Member
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

Post by Sunshine2323 »

As suggested PIVOT stage can get you your required output.
Warm Regards,
Amruta Bandekar

<b>If A equals success, then the formula is: A = X + Y + Z, X is work. Y is play. Z is keep your mouth shut. </b>
--Albert Einstein
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

Sunshine2323 wrote:As suggested PIVOT stage can get you your required output.
Is this a Parallel job Stage ??
makreddy
Participant
Posts: 21
Joined: Wed Sep 14, 2005 10:40 pm
Location: hyderabad
Contact:

Splitting One Record to Many

Post by makreddy »

Hi,
This cab be achieved through costomised buildop stage.

Thanks
MAKREDDY
Aravind
Sunshine2323
Charter Member
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

Post by Sunshine2323 »

vijayrc wrote:Is this a Parallel job Stage ??
Yes
Warm Regards,
Amruta Bandekar

<b>If A equals success, then the formula is: A = X + Y + Z, X is work. Y is play. Z is keep your mouth shut. </b>
--Albert Einstein
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

Sunshine2323 wrote:
vijayrc wrote:Is this a Parallel job Stage ??
Yes
Thank you all, got this working with a Transformer stage, but would try with a Pivot stage later on, as Transformer stage seems to be overhead w.r.t perfomance for this task. Will keep posted. Thanks again
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

I highly recommend using buildops - they tend to be a bit faster and have some good array structures and looping conditions for handling this type of situation.

Subrecs and Vectors

We get a lot of record layouts with occurs. Based on the portion of the copybook below, here's what we have done. First, create the schema either by hand or import the copybook itself in Manager with Import/Table Definitions/COBOL File Definitions.

Sample layout:

Code: Select all

      01 DSXCHANGE-RECORD.
           10 Data1 PIC X(2). 
           10 DataGroup OCCURS 10 TIMES. 
              15 Data2 PIC S9(11) COMP-3. 
              15 Data3 PIC S9(9)V99 COMP-3. 
              15 Data4 PIC S9(9)V99 COMP-3. 
              15 Data5 PIC S9(11) COMP-3. 
           10 Data6 PIC X(2).
The schema should look like this:

Code: Select all

record { record_format = fixed,
                         native_endian, ebcdic, binary, delim = none } (
    Data1:string[2];
    DataGroup[300]:subrec (
        Data2:decimal[11,0];
        Data3:decimal[11,2];
        Data4:decimal[11,2];
        Data5:decimal[11,0];
    );
    Data6:string[2];
)
While the table definition looks slightly different whether you import the copybook itself or a prepared schema, the end result is the same. Importing the copybook, the definition will show a column with the name DATAGROUP with 10 in the occurs column. Using the imported schema, it will show up as a vector. Same thing.

Once imported, you can use the Split Subrecord stage to create individual vectors of the each field in the subrec (so Data1 - Data5 will each be a vector rather that part of one multi-field vector.

My suggestion is split it right away. We have noticed that some stages (like Modify) do not like subrec vectors but handle multiple single-field vectors just fine.

Handling vectors in a buildop (just in case you're interested)

By the way, vectors are easy to manipulate in a buildop. I am sure there are ways to do your transformations in a Transform stage, but we only use buildops here. So I'll show you how we do it and let someone else show the Transform method.

Your input schema for the buildop will be the output schema of the preceding stage (I think there are a lot of posting on how to capture that, just do a quick search - if you can't find anything let me know). Create your output schema to match the target layout.

You per-record code will look something like this:

Code: Select all

// Drop default record, output handled manually 
dropRecord();      
                   
// DSXCHANGE-RECORD occurs 10 times. 
// - Assume Data2 is some kind of indicator and only create new 
//   record if Data2 is not NULL or spaces 

occurs_lmt = 10; 
occurs_cnt = 0; 

for (int i=0; i<occurs_lmt; i++) { 
    if ( in.Data2_null[i] ) {           
        // Record is empty, so skip it 
        continue;  
    }              

    // Move occurs fields...
    out.Data2 = in.Data2;
    out.Data3 = in.Data3;
    out.Data4 = in.Data4;
    out.Data5 = in.Data5;
                   
    // Move non-occurence fields 
    out.Data1 = in.Data1;
    out.Data6 = in.Data6;
                   
    // Write record 
    transferRecord(); 
    outputRecord(); 
                   
    // Increment occurs counter
    occurs_cnt++;
}

Well, hope this helps. Sorry it is so long, probably more than you wanted. I would imagine that there are ways to dynamically send in the array size, but why bother if the file layout is static?

Brad.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

Thanks bcarlson for the detailed explanation. I'm gonna try this out. But being a newbie, with just a 4day training and 1week hands-on so far in my current project, I'm not sure, if I can crack it out. But would definitely give it a shot. THANX again for the help. -V
bcarlson wrote:I highly recommend using buildops - they tend to be a bit faster and have some good array structures and looping conditions for handling this type of situation.

Subrecs and Vectors

We get a lot of record layouts with occurs. Based on the portion of the copybook below, here's what we have done. First, create the schema either by hand or import the copybook itself in Manager with Import/Table Definitions/COBOL File Definitions.

Sample layout:

Code: Select all

      01 DSXCHANGE-RECORD.
           10 Data1 PIC X(2). 
           10 DataGroup OCCURS 10 TIMES. 
              15 Data2 PIC S9(11) COMP-3. 
              15 Data3 PIC S9(9)V99 COMP-3. 
              15 Data4 PIC S9(9)V99 COMP-3. 
              15 Data5 PIC S9(11) COMP-3. 
           10 Data6 PIC X(2).
The schema should look like this:

Code: Select all

record { record_format = fixed,
                         native_endian, ebcdic, binary, delim = none } (
    Data1:string[2];
    DataGroup[300]:subrec (
        Data2:decimal[11,0];
        Data3:decimal[11,2];
        Data4:decimal[11,2];
        Data5:decimal[11,0];
    );
    Data6:string[2];
)
While the table definition looks slightly different whether you import the copybook itself or a prepared schema, the end result is the same. Importing the copybook, the definition will show a column with the name DATAGROUP with 10 in the occurs column. Using the imported schema, it will show up as a vector. Same thing.

Once imported, you can use the Split Subrecord stage to create individual vectors of the each field in the subrec (so Data1 - Data5 will each be a vector rather that part of one multi-field vector.

My suggestion is split it right away. We have noticed that some stages (like Modify) do not like subrec vectors but handle multiple single-field vectors just fine.

Handling vectors in a buildop (just in case you're interested)

By the way, vectors are easy to manipulate in a buildop. I am sure there are ways to do your transformations in a Transform stage, but we only use buildops here. So I'll show you how we do it and let someone else show the Transform method.

Your input schema for the buildop will be the output schema of the preceding stage (I think there are a lot of posting on how to capture that, just do a quick search - if you can't find anything let me know). Create your output schema to match the target layout.

You per-record code will look something like this:

Code: Select all

// Drop default record, output handled manually 
dropRecord();      
                   
// DSXCHANGE-RECORD occurs 10 times. 
// - Assume Data2 is some kind of indicator and only create new 
//   record if Data2 is not NULL or spaces 

occurs_lmt = 10; 
occurs_cnt = 0; 

for (int i=0; i<occurs_lmt; i++) { 
    if ( in.Data2_null[i] ) {           
        // Record is empty, so skip it 
        continue;  
    }              

    // Move occurs fields...
    out.Data2 = in.Data2;
    out.Data3 = in.Data3;
    out.Data4 = in.Data4;
    out.Data5 = in.Data5;
                   
    // Move non-occurence fields 
    out.Data1 = in.Data1;
    out.Data6 = in.Data6;
                   
    // Write record 
    transferRecord(); 
    outputRecord(); 
                   
    // Increment occurs counter
    occurs_cnt++;
}

Well, hope this helps. Sorry it is so long, probably more than you wanted. I would imagine that there are ways to dynamically send in the array size, but why bother if the file layout is static?

Brad.
Post Reply