Is Incremental Aggregation possible

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

kommven
Charter Member
Charter Member
Posts: 125
Joined: Mon Jul 12, 2004 12:37 pm

Is Incremental Aggregation possible

Post by kommven »

Can any one guide me for creating a job aggregating (taking avg) of a column in parts.

More in Detail ;-

name, qty1
a,5
a,3
r,6
y,2
i,5
q,3
o,4
t,3

Now I want to aggregate for every 5 rows, so the final ans is

Value
21
10

Anyhelp is highly appreciated...

Kommu
vcannadevula
Charter Member
Charter Member
Posts: 143
Joined: Thu Nov 04, 2004 6:53 am

Re: Is Incremental Aggregation possible

Post by vcannadevula »

kommven wrote:Can any one guide me for creating a job aggregating (taking avg) of a column in parts.

More in Detail ;-

name, qty1
a,5
a,3
r,6
y,2
i,5
q,3
o,4
t,3

Now I want to aggregate for every 5 rows, so the final ans is

Value
21
10

Anyhelp is highly appreciated...

Kommu
Hi are you trying to do average of values where name=a or r or y into one column and where name =q or o or t into another column??
sorry i didnt understand your question??
vcannadevula
Charter Member
Charter Member
Posts: 143
Joined: Thu Nov 04, 2004 6:53 am

Re: Is Incremental Aggregation possible

Post by vcannadevula »

oops! sorry now i read your question properly
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can create an additional column derived as int(@Outrownum / 5) and group on this column.

So it will become

newcol, name, qty1
0, a,5
0, a,3
0, r,6
0, y,2
0, i,5
1, q,3
1, o,4
1, t,3

and after grouping (and applying sum(qty1)) you will have
newcol, sum_qty1
0, 21
1, 10

which I believe is what you want.
Alexandre
Participant
Posts: 12
Joined: Fri Dec 31, 2004 5:04 am
Location: Paris
Contact:

Post by Alexandre »

In your example, it is a sum and not an avg...

Anyway, this might do the trick :
In a transformer before the aggregator add a column to your stream containing the value Int (@OUTROWNUM / 5)
Like this, your data now looks like :

a,5,0
a,3,0
r,6,0
y,2,0
i,5,0
q,3,1
o,4,1
t,3,1

If you then sum (or avg) based on the third column, you can obtain what you want...
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The MOD trick doesn't work. You need to create a new column to be a grouping key. Hopefully, your data is ordered so all you have to do is use a stage variables in a transformer. The stage variable is a GROUPKEY that starts at 1 and only increments every fifth row. Output to an aggregator and group by your new column.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kommven
Charter Member
Charter Member
Posts: 125
Joined: Mon Jul 12, 2004 12:37 pm

Post by kommven »

Thanks for the help, I am trying @rownum

Kommu
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The MOD on @OUTROWNUM trick will not work. You will only get 5 rows output, a 0, 1, 2, 3, and 4 row. If you have 500, 50000, or 5000000 source rows, you'll only get 5 rows of output.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vcannadevula
Charter Member
Charter Member
Posts: 143
Joined: Thu Nov 04, 2004 6:53 am

Post by vcannadevula »

Hi,
If you have only one stream link comming into transformer then
declare 3 stage variables

1) RowProcGetPreviuosValue(qty1)= stagevar1
2) currentvalue(qty1)=stagevar2
3) If (@INROWNUM/5)!=0 then stagevar2+stagevar3 else stagevar2=stagevar3
4) If @INROWNUM/5=0 then (stagevar3)/5 else 0-----stagevar4

Give this constraint (@INROWNUM/5=0), so that only averages are populated into target table
derivation of your target table average column=stagevar4
If you have a reference link along with stream link then you have to do all this in second transformer.
This works and also you can avoid using aggregator stage.

Thank You
Alexandre
Participant
Posts: 12
Joined: Fri Dec 31, 2004 5:04 am
Location: Paris
Contact:

Post by Alexandre »

kcbland, it is not a mod(), but a div...
I you truncate the resulting number, you will have the right group (truncate and not round)...

1 / 5 = 0.20 int(0.20) = 0
2 / 5 = 0.40 int(0.40) = 0
3 / 5 = 0.60 int(0.40) = 0
4 / 5 = 0.80 int(0.40) = 0
5 / 5 = 1.00 int(0.40) = 1
6 / 5 = 1.20 int(0.40) = 1
...

Yes, you have to create a new column, but it will stay only between the transformer and the aggregator...

I think it is more efficient to use (proc time, not memory) than the method vcannadevula suggested...

P.S. I just saw that the first group contains only 4 values... In order to get the result in the original example, you have to modify the column to Int((@OUTROWNUM - 0.1) / 5)
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Be aware that if you do it all within the transformer, you run the risk of "losing" your last output row (in the case where your number of input rows is not an even multiple of 5). When doing an aggregation entirely within a transformer, you usually need to detect an "End Of File" condition so that you can output that last row.

Mike
Alexandre
Participant
Posts: 12
Joined: Fri Dec 31, 2004 5:04 am
Location: Paris
Contact:

Post by Alexandre »

It is a very good point, Mike :
Using Stage variables, you have to be very very carefull with the last line (or the first depending on the case)...

Usually, it try to avoid them for this kind of things...
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The Aggregator solution is the best one given the situation.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kommven
Charter Member
Charter Member
Posts: 125
Joined: Mon Jul 12, 2004 12:37 pm

Post by kommven »

I like to use stage variables and I will implement it and let you guys know... All are of great help. Thank's everyone in person...

I heard that informatica has a built in module for handelling incremental aggregations.

So any built-in stage in DS?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

kommven wrote:So any built-in stage in DS?
The AGGREGATOR stage. If you insist on stage variables, then you are going to need to add a last row to your input datastream to trigger the last output group.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply