Footer record for each group by value

ravireddy25
Participant
Posts: 59
Joined: Wed Dec 31, 2008 5:49 am

Post by ravireddy25 »

Hi,
I need to implement a scenario that writes a file with one header and a footer record for each group, like:
Input:
emp_id
100
100
200
200
200
300

Output:
Header--------------
100
2(count)
200
3(count)
300
1(count)

Your suggestions are appreciated!
Thanks in Advance
Ravi
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Declare your output format as a single VarChar column. In your job, detect the group-value changes using either stage variables in a Transformer or within a Sort stage. When your job detects a group change, it outputs an additional record with the footer values as indicated in your example.
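Not DataStage syntax, but here is the stage-variable logic sketched in plain Python (assuming the input is already sorted on emp_id, as a Sort stage would leave it; the function name is mine):

```python
def with_group_footers(sorted_ids):
    """Emit a header, then each group's value followed by a count footer.
    Mirrors Transformer stage variables: compare each row's key to the
    previous row's key and flush a footer on every group change."""
    out = ["Header--------------"]
    prev, count = None, 0
    for emp_id in sorted_ids:
        if emp_id != prev:
            if prev is not None:
                out.append(f"{count}(count)")  # footer for the finished group
            out.append(str(emp_id))            # value line for the new group
            prev, count = emp_id, 0
        count += 1
    if prev is not None:
        out.append(f"{count}(count)")          # footer for the last group
    return out
```

Feeding it the sample input from the first post produces exactly the requested output.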
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

Hi Ravi,

Sort on emp_id, then use the Sort stage again to create a key-change cluster column. The result will be as below:

Change_key emp_id
1 100
0 100
1 200
0 200
0 200
1 300

Now use stage variables in a Transformer to group the records within each cluster.

Let me know if you need more details.
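For what the key-change column computes, here is a plain-Python sketch (not DataStage code; the function name is mine, and the input must already be sorted on emp_id):

```python
def change_keys(sorted_ids):
    """Mimic the Sort stage's key-change column: 1 on the first row
    of each group, 0 on every repeated row."""
    prev, flags = object(), []   # sentinel so the first row always flags 1
    for emp_id in sorted_ids:
        flags.append(1 if emp_id != prev else 0)
        prev = emp_id
    return flags
```

On the sample input this reproduces the Change_key column shown above, which the downstream Transformer can then use to decide when to emit a footer.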



Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
ravireddy25
Participant
Posts: 59
Joined: Wed Dec 31, 2008 5:49 am

Post by ravireddy25 »

ArndW wrote:Declare your output format as a single VarChar column. In your job, detect the group-value changes using either stage variables in a Transformer or within a Sort stage. When your job detects a group change, it outputs an additional record with the footer values as indicated in your example.
Hi,
Thanks for your reply. I can use a single VarChar column, but how do I detect the group changes with the Sort stage or in a Transformer, and how is the additional record with the count added after each group? Can you explain in detail, please?

Thanks in advance
Ravi
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

You could have multiple ordered sources with the following format:
Key1 Key2 RowContents

Src 1 (header):
-1 0 Header-----------------

The -1 is just to make sure the header comes first. Alternatively, you could add another key at the front of the sort order that does nothing more than order the sources (eg: 0 for the header and 1 for the other two).

The other sources come from the data.
Start with an Aggregator to get the values (eg: 100, 200, etc) and counts, then run it through a Transformer to create the split.

Src2 (1st Transformer output, to create the value lines):
Inp.Value 0 Inp.Value

Src3 (2nd Transformer output, to create the count lines):
Inp.Value 1 Inp.Count : "(count)"

Now take the 3 sources and funnel them, then sort on Key1 asc, Key2 asc.
For Src2 and Src3, Key1 uses the Value out of the Aggregator, and Key2 is an artificial value to make sure that for each value, the value line is ordered before the count line.

I'm sure there are better ways to do this, but this is one way that would work.
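The whole funnel-and-sort plan can be sketched in plain Python (not DataStage code; function name is mine). Build the (Key1, Key2, RowContents) triples from the header source plus the two Transformer outputs fed by the Aggregator, sort on (Key1, Key2), and keep only RowContents:

```python
from collections import Counter

def funnel_plan(ids):
    """Three funnelled sources sorted on (Key1 asc, Key2 asc):
    a header row keyed to sort first, then per group a value line
    (Key2 = 0) ordered before its count line (Key2 = 1)."""
    counts = Counter(ids)                            # the Aggregator step
    rows = [(-1, 0, "Header-----------------")]      # Src 1: -1 sorts first
    for value, count in counts.items():
        rows.append((value, 0, str(value)))          # Src 2: value line
        rows.append((value, 1, f"{count}(count)"))   # Src 3: count line
    rows.sort(key=lambda r: (r[0], r[1]))            # the Funnel + Sort
    return [contents for _, _, contents in rows]
```

The artificial Key2 is what guarantees each value line lands before its count line after the funnel, exactly as described above.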
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Easy Solution using 2 jobs:

Job 1: Read the input file, process your data, and write the header row to File1.
Job 2: Read the input file and use an Aggregator stage with Count Rows on the key column. This will give you the total number of rows for every group (in your case, emp_id). Now write this into the File1 created in Job 1. While writing, make sure to use Append to existing file, otherwise it will wipe out your previously written header row.

Note: You can do the same thing in one job by controlling the flow (first write the header, then write the counts) via link ordering in a Transformer, but in parallel mode this is not advisable.
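The two-job flow can be sketched with plain file I/O in Python (not DataStage code; the file name and function names are just illustrations of the write-then-append sequencing):

```python
import os
import tempfile
from collections import Counter

def job1(path):
    # Job 1: overwrite the file with just the header row.
    with open(path, "w") as f:
        f.write("Header--------------\n")

def job2(path, ids):
    # Job 2: aggregate (count rows per emp_id), then append to the
    # same file so the header written by Job 1 survives.
    counts = Counter(ids)
    with open(path, "a") as f:
        for value in sorted(counts):
            f.write(f"{value}\n{counts[value]}(count)\n")

path = os.path.join(tempfile.mkdtemp(), "File1.txt")
job1(path)
job2(path, [100, 100, 200, 200, 200, 300])
```

The key point is the append mode in the second step: opening with "w" instead of "a" would wipe the header, which is exactly the pitfall called out above.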
ravireddy25
Participant
Posts: 59
Joined: Wed Dec 31, 2008 5:49 am

Post by ravireddy25 »



Hi Guys,
Thanks I will try this :)
Ravi
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Why not run the data through an Aggregator and include 2 extra dummy columns?
1.) A newline character after the key field
2.) A '(count)' string after the record count field

That should do the trick.
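In plain Python (not DataStage code; the function name is mine), the single-aggregator idea amounts to emitting one record per group whose text already contains the embedded newline and the '(count)' suffix:

```python
from collections import Counter

def footer_records(ids):
    """One record per group: the key, an embedded newline (dummy
    column 1), then the row count with a '(count)' suffix (dummy
    column 2)."""
    counts = Counter(ids)
    return [f"{value}\n{counts[value]}(count)" for value in sorted(counts)]
```

When these records are written to a flat file, each one unfolds into a value line followed by its count line.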