Aggregation/Summation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Aggregation/Summation

Post by bond88 »

Hi all,
Could you please help me in creating this job?

Input

Project_ID-----Year-----month-----code-----amount
1 ---------------2011----1------------10-------100
1----------------2011----1------------20-------95
1----------------2011----1------------30-------80
2----------------2011----1------------20-------500
2----------------2011----1------------30-------800

and so on up to
1----------------2011----12----------20-------70
so on....
2 ---------------2012----10------------10-------1500
2----------------2012----10------------20-------2000
1 ---------------2013----12------------10-------500
1----------------2013----12------------20-------900
1----------------2013----12------------30-------1000


Output:
Project_ID-----Year-----month-----------amount
2----------------2012----10-------------------3500
1----------------2013----12-------------------2400


Output needs to be one record for each project (latest record, year and month if that month has multiple records with different codes, it needs to be sum up in to a single value)

Thank you,
Bhanu
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Would this not just need an Aggregator? Group on Project ID, year, month and sum the amount.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Thanks Chulett,
You are right ! If we use aggregator stage and group by on project_id, year and month and I used Aggregation type = calculation, column for calculation = amount, sum of output column = amount_sum

But I am getting all the years and corresponding months. How could I get the latest year and month in the particular project group. For project_id I need the latest record.

Thanks,
Bhanu
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Max(year/month) comes to mind but that won't get you the appropriate sum. Seems to me you'll need to find a way to send only those max records to the Aggregator. Sort descending with a key change column and take the first group per year, perhaps.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Sorry Chulett,
I didn't understand completely. Before feeding to aggregator stage if we use sort stage (descending) with key change column, how could I feed only the first group of each project_id to aggregator stage? Any suggestion please.

Thanks,
Bhanu
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

One way - stage variables to take the first group per project / year, hence the suggestion to sort descending.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Correct me if this approach is wrong/lengthy one.
First I can sort descending by project_id and fiscal_year along with key change column and if I filter on keychange=1 then I will get one row for each year for particular project_id and after that if I use sort stage again to sort by project_id along with key column change and then if I use filter stage to filter keychange=1 this time I will get one record (latest) per each project_id.
Bhanu
prasson_ibm
Premium Member
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
After sorting,you can use transformer.below is the logic:-

Code: Select all

Svar=If keychange=1 then Inputcol.Year else Svar
Pass this variable to output derivation of Year then aggregate the data as suggested by chullet.
prasson_ibm
Premium Member
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
After sorting,you can use transformer.write below logic in stage variable:-

Code: Select all

Svar=If keychange=1 then Inputcol.Year else Svar
Pass this variable to output derivation of Year then aggregate the data as suggested by chullet.
srinivas.nettalam
Participant
Posts: 134
Joined: Tue Jun 15, 2010 2:10 am
Location: Bangalore

Post by srinivas.nettalam »

OR a lengthy approach...

1-Split the input to two streams.
2-Find max year and month in the output1
3- join the two outputs(Fork join) on projectid,month and year and then find sum.
N.Srinivas
India.
Post Reply