Footer record for each group by value

ravireddy25
Participant
Posts: 59
Joined: Wed Dec 31, 2008 5:49 am

Post by ravireddy25 »

Hi,
I need to implement a scenario that writes a file with one header and a footer record for each group, like:
Input:
emp_id
100
100
200
200
200
300

Output:
Header--------------
100
2(count)
200
3(count)
300
1(count)

Your suggestions are appreciated!
Thanks in Advance
Ravi
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Declare your output format as a single VarChar column. In your job, detect the group-value changes using either stage variables in a Transformer or within a Sort stage. When your job detects a group change, it outputs an additional record with the footer values as indicated in your example.
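Not DataStage syntax, but here is the stage-variable logic sketched in plain Python (assuming the input is already sorted on emp_id, as a Sort stage would leave it; the function name is mine):

```python
def with_group_footers(sorted_ids):
    """Emit a header, then each group's value followed by a count footer.
    Mirrors Transformer stage variables: compare each row's key to the
    previous row's key and flush a footer on every group change."""
    out = ["Header--------------"]
    prev, count = None, 0
    for emp_id in sorted_ids:
        if emp_id != prev:
            if prev is not None:
                out.append(f"{count}(count)")  # footer for the finished group
            out.append(str(emp_id))            # value line for the new group
            prev, count = emp_id, 0
        count += 1
    if prev is not None:
        out.append(f"{count}(count)")          # footer for the last group
    return out
```

Feeding it the sample input from the first post produces exactly the requested output.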
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

Hi Ravi,

Sort on emp_id, then use the Sort stage again to create a key-change cluster column. The result will be as below:

Change_key emp_id
1 100
0 100
1 200
0 200
0 200
1 300

Now use stage variables in a Transformer to group the records within each cluster.

Let me know if you need more details.
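For what the key-change column computes, here is a plain-Python sketch (not DataStage code; the function name is mine, and the input must already be sorted on emp_id):

```python
def change_keys(sorted_ids):
    """Mimic the Sort stage's key-change column: 1 on the first row
    of each group, 0 on every repeated row."""
    prev, flags = object(), []   # sentinel so the first row always flags 1
    for emp_id in sorted_ids:
        flags.append(1 if emp_id != prev else 0)
        prev = emp_id
    return flags
```

On the sample input this reproduces the Change_key column shown above, which the downstream Transformer can then use to decide when to emit a footer.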



Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
ravireddy25
Participant
Posts: 59
Joined: Wed Dec 31, 2008 5:49 am

Post by ravireddy25 »

ArndW wrote:Declare your output format as a single VarChar column. In your job, detect the group-value changes using either stage variables in a Transformer or within a Sort stage. When your job detects a group change, it outputs an additional record with the footer values as indicated in your example.
Hi,
Thanks for your reply. I can use a single VarChar column, but how do I detect the group changes with the Sort stage or in a Transformer, and how is the additional record with the count added after each group? Can you explain in detail, please?

Thanks in advance
Ravi
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

You could have multiple ordered sources with the following format:
Key1 Key2 RowContents

Src 1 (header):
-1 0 Header-----------------

The -1 is just to make sure the header comes first. Alternatively, you could add another key at the front of the sort order that does nothing more than order the sources (eg: 0 for the header and 1 for the other two).

The other sources come from the data.
Start with an Aggregator to get the values (eg: 100, 200, etc) and counts, then run it through a Transformer to create the split.

Src2 (1st Transformer output, to create the value lines):
Inp.Value 0 Inp.Value

Src3 (2nd Transformer output, to create the count lines):
Inp.Value 1 Inp.Count : "(count)"

Now take the 3 sources and funnel them, then sort on Key1 asc, Key2 asc.
For Src2 and Src3, Key1 uses the Value out of the Aggregator, and Key2 is an artificial value to make sure that for each value, the value line is ordered before the count line.

I'm sure there are better ways to do this, but this is one way that would work.
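The whole funnel-and-sort plan can be sketched in plain Python (not DataStage code; function name is mine). Build the (Key1, Key2, RowContents) triples from the header source plus the two Transformer outputs fed by the Aggregator, sort on (Key1, Key2), and keep only RowContents:

```python
from collections import Counter

def funnel_plan(ids):
    """Three funnelled sources sorted on (Key1 asc, Key2 asc):
    a header row keyed to sort first, then per group a value line
    (Key2 = 0) ordered before its count line (Key2 = 1)."""
    counts = Counter(ids)                            # the Aggregator step
    rows = [(-1, 0, "Header-----------------")]      # Src 1: -1 sorts first
    for value, count in counts.items():
        rows.append((value, 0, str(value)))          # Src 2: value line
        rows.append((value, 1, f"{count}(count)"))   # Src 3: count line
    rows.sort(key=lambda r: (r[0], r[1]))            # the Funnel + Sort
    return [contents for _, _, contents in rows]
```

The artificial Key2 is what guarantees each value line lands before its count line after the funnel, exactly as described above.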
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Easy Solution using 2 jobs:

Job 1: Read the input file, process your data, and write the header row to File1.
Job 2: Read the input file and use an Aggregator stage with Count Rows on the key column. This will give you the total number of rows for every group (in your case, emp_id). Now write this into the File1 created in Job 1. While writing, make sure to use Append to existing file, otherwise it will wipe out your previously written header row.

Note: You can do the same thing in one job by controlling the flow (first write the header, then write the counts) via link ordering in a Transformer, but in parallel mode this is not advisable.
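The two-job flow can be sketched with plain file I/O in Python (not DataStage code; the file name and function names are just illustrations of the write-then-append sequencing):

```python
import os
import tempfile
from collections import Counter

def job1(path):
    # Job 1: overwrite the file with just the header row.
    with open(path, "w") as f:
        f.write("Header--------------\n")

def job2(path, ids):
    # Job 2: aggregate (count rows per emp_id), then append to the
    # same file so the header written by Job 1 survives.
    counts = Counter(ids)
    with open(path, "a") as f:
        for value in sorted(counts):
            f.write(f"{value}\n{counts[value]}(count)\n")

path = os.path.join(tempfile.mkdtemp(), "File1.txt")
job1(path)
job2(path, [100, 100, 200, 200, 200, 300])
```

The key point is the append mode in the second step: opening with "w" instead of "a" would wipe the header, which is exactly the pitfall called out above.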
ravireddy25
Participant
Posts: 59
Joined: Wed Dec 31, 2008 5:49 am

Post by ravireddy25 »



Hi Guys,
Thanks I will try this :)
Ravi
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Why not run the data through an Aggregator and include 2 extra dummy columns?
1.) A newline character after the key field
2.) A '(count)' string after the record count field

That should do the trick.
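In plain Python (not DataStage code; the function name is mine), the single-aggregator idea amounts to emitting one record per group whose text already contains the embedded newline and the '(count)' suffix:

```python
from collections import Counter

def footer_records(ids):
    """One record per group: the key, an embedded newline (dummy
    column 1), then the row count with a '(count)' suffix (dummy
    column 2)."""
    counts = Counter(ids)
    return [f"{value}\n{counts[value]}(count)" for value in sorted(counts)]
```

When these records are written to a flat file, each one unfolds into a value line followed by its count line.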