Splitting One Record to Many

vijayrc · Post by **vijayrc** » Wed Apr 12, 2006 5:04 pm

Hi,
I have a file with records that carry few occurences[10] of Charges.
[1] For each of these input records with charges as an array [1-10], how do I create 10 ouput records with Charge1 in record1, charge2 in record2 and so on. What stage[s] is the best ?
[2] Again not all input records would carry all 10 charges, it might be 7, 6 or 9 charges, and this no of charges is known by another column. So if an input record has 4 charges, how do I limit to just writing 4 output records with Charge1, charge2, charge3, charge4 and not write records for zero charges.
Input:
AAAAAAAAA 5 $10 $9 $8 $7 $6 blank blank blank blank blank
Output
AAAAAAAAA $10
AAAAAAAAA $9
AAAAAAAAA $8
AAAAAAAAA $7
AAAAAAAAA $6
Just learning, and the tips learnt here, has tremendously helped me this one week. Thanks guyz. -Vijay

vijayrc · Post by **vijayrc** » Thu Apr 13, 2006 4:33 am

Any help on this is appreciated...Please..!!
Thanks, V

vijayrc wrote:Hi,
I have a file with records that carry few occurences[10] of Charges.
[1] For each of these input records with charges as an array [1-10], how do I create 10 ouput records with Charge1 in record1, charge2 in record2 and so on. What stage[s] is the best ?
[2] Again not all input records would carry all 10 charges, it might be 7, 6 or 9 charges, and this no of charges is known by another column. So if an input record has 4 charges, how do I limit to just writing 4 output records with Charge1, charge2, charge3, charge4 and not write records for zero charges.
Input:
AAAAAAAAA 5 $10 $9 $8 $7 $6 blank blank blank blank blank
Output
AAAAAAAAA $10
AAAAAAAAA $9
AAAAAAAAA $8
AAAAAAAAA $7
AAAAAAAAA $6
Just learning, and the tips learnt here, has tremendously helped me this one week. Thanks guyz. -Vijay

thompsonp · Post by **thompsonp** » Thu Apr 13, 2006 4:40 am

Is the maximum number of charges on a single input record fixed? If so, how many can there be?

If the answer is yes and the maximum number is small you could do it with a single transformer, one output link for each possible charge.

Obviously this won't work if the maximum number of charges is fixed and would be an inelegant soultion if the maximum number of charges were high.

Try searching the forum for pivot.

vijayrc · Post by **vijayrc** » Thu Apr 13, 2006 5:51 am

Yes.. the maximum charge bucket in a record is 10. But not necessarily, all 10 charge buckets would be filled in.

thompsonp wrote:Is the maximum number of charges on a single input record fixed? If so, how many can there be?

If the answer is yes and the maximum number is small you could do it with a single transformer, one output link for each possible charge.

Obviously this won't work if the maximum number of charges is fixed and would be an inelegant soultion if the maximum number of charges were high.

Try searching the forum for pivot.

Sunshine2323 · Post by **Sunshine2323** » Thu Apr 13, 2006 8:34 am

As suggested PIVOT stage can get you your required output.

vijayrc · Post by **vijayrc** » Thu Apr 13, 2006 8:14 pm

Sunshine2323 wrote:As suggested PIVOT stage can get you your required output.

Is this a Parallel job Stage ??

makreddy · Post by **makreddy** » Fri Apr 14, 2006 2:03 am

Hi,
This cab be achieved through costomised buildop stage.

Thanks
MAKREDDY

Sunshine2323 · Post by **Sunshine2323** » Fri Apr 14, 2006 5:52 am

vijayrc wrote:Is this a Parallel job Stage ??

Yes

vijayrc · Post by **vijayrc** » Sat Apr 15, 2006 12:38 pm

Sunshine2323 wrote:
vijayrc wrote:Is this a Parallel job Stage ??
Yes

Thank you all, got this working with a Transformer stage, but would try with a Pivot stage later on, as Transformer stage seems to be overhead w.r.t perfomance for this task. Will keep posted. Thanks again

bcarlson · Post by **bcarlson** » Tue Apr 18, 2006 2:52 pm

I highly recommend using buildops - they tend to be a bit faster and have some good array structures and looping conditions for handling this type of situation.

Subrecs and Vectors

We get a lot of record layouts with occurs. Based on the portion of the copybook below, here's what we have done. First, create the schema either by hand or import the copybook itself in Manager with Import/Table Definitions/COBOL File Definitions.

Sample layout:

Code: Select all

      01 DSXCHANGE-RECORD.
           10 Data1 PIC X(2). 
           10 DataGroup OCCURS 10 TIMES. 
              15 Data2 PIC S9(11) COMP-3. 
              15 Data3 PIC S9(9)V99 COMP-3. 
              15 Data4 PIC S9(9)V99 COMP-3. 
              15 Data5 PIC S9(11) COMP-3. 
           10 Data6 PIC X(2).

The schema should look like this:

Code: Select all

record { record_format = fixed,
                         native_endian, ebcdic, binary, delim = none } (
    Data1:string[2];
    DataGroup[300]:subrec (
        Data2:decimal[11,0];
        Data3:decimal[11,2];
        Data4:decimal[11,2];
        Data5:decimal[11,0];
    );
    Data6:string[2];
)

While the table definition looks slightly different whether you import the copybook itself or a prepared schema, the end result is the same. Importing the copybook, the definition will show a column with the name DATAGROUP with 10 in the occurs column. Using the imported schema, it will show up as a vector. Same thing.

Once imported, you can use the Split Subrecord stage to create individual vectors of the each field in the subrec (so Data1 - Data5 will each be a vector rather that part of one multi-field vector.

My suggestion is split it right away. We have noticed that some stages (like Modify) do not like subrec vectors but handle multiple single-field vectors just fine.

Handling vectors in a buildop (just in case you're interested)

By the way, vectors are easy to manipulate in a buildop. I am sure there are ways to do your transformations in a Transform stage, but we only use buildops here. So I'll show you how we do it and let someone else show the Transform method.

Your input schema for the buildop will be the output schema of the preceding stage (I think there are a lot of posting on how to capture that, just do a quick search - if you can't find anything let me know). Create your output schema to match the target layout.

You per-record code will look something like this:

Code: Select all

// Drop default record, output handled manually 
dropRecord();      
                   
// DSXCHANGE-RECORD occurs 10 times. 
// - Assume Data2 is some kind of indicator and only create new 
//   record if Data2 is not NULL or spaces 

occurs_lmt = 10; 
occurs_cnt = 0; 

for (int i=0; i<occurs_lmt; i++) { 
    if ( in.Data2_null[i] ) {           
        // Record is empty, so skip it 
        continue;  
    }              

    // Move occurs fields...
    out.Data2 = in.Data2;
    out.Data3 = in.Data3;
    out.Data4 = in.Data4;
    out.Data5 = in.Data5;
                   
    // Move non-occurence fields 
    out.Data1 = in.Data1;
    out.Data6 = in.Data6;
                   
    // Write record 
    transferRecord(); 
    outputRecord(); 
                   
    // Increment occurs counter
    occurs_cnt++;
}

Well, hope this helps. Sorry it is so long, probably more than you wanted. I would imagine that there are ways to dynamically send in the array size, but why bother if the file layout is static?

Brad.

vijayrc · Post by **vijayrc** » Tue Apr 18, 2006 5:46 pm

Thanks bcarlson for the detailed explanation. I'm gonna try this out. But being a newbie, with just a 4day training and 1week hands-on so far in my current project, I'm not sure, if I can crack it out. But would definitely give it a shot. THANX again for the help. -V

bcarlson wrote:I highly recommend using buildops - they tend to be a bit faster and have some good array structures and looping conditions for handling this type of situation.

Subrecs and Vectors

We get a lot of record layouts with occurs. Based on the portion of the copybook below, here's what we have done. First, create the schema either by hand or import the copybook itself in Manager with Import/Table Definitions/COBOL File Definitions.

Sample layout:
Code: Select all
      01 DSXCHANGE-RECORD.
           10 Data1 PIC X(2). 
           10 DataGroup OCCURS 10 TIMES. 
              15 Data2 PIC S9(11) COMP-3. 
              15 Data3 PIC S9(9)V99 COMP-3. 
              15 Data4 PIC S9(9)V99 COMP-3. 
              15 Data5 PIC S9(11) COMP-3. 
           10 Data6 PIC X(2).
The schema should look like this:
Code: Select all
record { record_format = fixed,
                         native_endian, ebcdic, binary, delim = none } (
    Data1:string[2];
    DataGroup[300]:subrec (
        Data2:decimal[11,0];
        Data3:decimal[11,2];
        Data4:decimal[11,2];
        Data5:decimal[11,0];
    );
    Data6:string[2];
)
While the table definition looks slightly different whether you import the copybook itself or a prepared schema, the end result is the same. Importing the copybook, the definition will show a column with the name DATAGROUP with 10 in the occurs column. Using the imported schema, it will show up as a vector. Same thing.

Once imported, you can use the Split Subrecord stage to create individual vectors of the each field in the subrec (so Data1 - Data5 will each be a vector rather that part of one multi-field vector.

My suggestion is split it right away. We have noticed that some stages (like Modify) do not like subrec vectors but handle multiple single-field vectors just fine.

Handling vectors in a buildop (just in case you're interested)

By the way, vectors are easy to manipulate in a buildop. I am sure there are ways to do your transformations in a Transform stage, but we only use buildops here. So I'll show you how we do it and let someone else show the Transform method.

Your input schema for the buildop will be the output schema of the preceding stage (I think there are a lot of posting on how to capture that, just do a quick search - if you can't find anything let me know). Create your output schema to match the target layout.

You per-record code will look something like this:
Code: Select all
// Drop default record, output handled manually 
dropRecord();      
                   
// DSXCHANGE-RECORD occurs 10 times. 
// - Assume Data2 is some kind of indicator and only create new 
//   record if Data2 is not NULL or spaces 

occurs_lmt = 10; 
occurs_cnt = 0; 

for (int i=0; i<occurs_lmt; i++) { 
    if ( in.Data2_null[i] ) {           
        // Record is empty, so skip it 
        continue;  
    }              

    // Move occurs fields...
    out.Data2 = in.Data2;
    out.Data3 = in.Data3;
    out.Data4 = in.Data4;
    out.Data5 = in.Data5;
                   
    // Move non-occurence fields 
    out.Data1 = in.Data1;
    out.Data6 = in.Data6;
                   
    // Write record 
    transferRecord(); 
    outputRecord(); 
                   
    // Increment occurs counter
    occurs_cnt++;
}
Well, hope this helps. Sorry it is so long, probably more than you wanted. I would imagine that there are ways to dynamically send in the array size, but why bother if the file layout is static?

Brad.

DSXchange