Is Incremental Aggregation possible
Moderators: chulett, rschirm, roy
Is Incremental Aggregation possible
Can any one guide me for creating a job aggregating (taking avg) of a column in parts.
More in Detail ;-
name, qty1
a,5
a,3
r,6
y,2
i,5
q,3
o,4
t,3
Now I want to aggregate for every 5 rows, so the final ans is
Value
21
10
Anyhelp is highly appreciated...
Kommu
More in Detail ;-
name, qty1
a,5
a,3
r,6
y,2
i,5
q,3
o,4
t,3
Now I want to aggregate for every 5 rows, so the final ans is
Value
21
10
Anyhelp is highly appreciated...
Kommu
-
- Charter Member
- Posts: 143
- Joined: Thu Nov 04, 2004 6:53 am
Re: Is Incremental Aggregation possible
Hi are you trying to do average of values where name=a or r or y into one column and where name =q or o or t into another column??kommven wrote:Can any one guide me for creating a job aggregating (taking avg) of a column in parts.
More in Detail ;-
name, qty1
a,5
a,3
r,6
y,2
i,5
q,3
o,4
t,3
Now I want to aggregate for every 5 rows, so the final ans is
Value
21
10
Anyhelp is highly appreciated...
Kommu
sorry i didnt understand your question??
-
- Charter Member
- Posts: 143
- Joined: Thu Nov 04, 2004 6:53 am
Re: Is Incremental Aggregation possible
oops! sorry now i read your question properly
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
You can create an additional column derived as int(@Outrownum / 5) and group on this column.
So it will become
newcol, name, qty1
0, a,5
0, a,3
0, r,6
0, y,2
0, i,5
1, q,3
1, o,4
1, t,3
and after grouping (and applying sum(qty1)) you will have
newcol, sum_qty1
0, 21
1, 10
which I believe is what you want.
So it will become
newcol, name, qty1
0, a,5
0, a,3
0, r,6
0, y,2
0, i,5
1, q,3
1, o,4
1, t,3
and after grouping (and applying sum(qty1)) you will have
newcol, sum_qty1
0, 21
1, 10
which I believe is what you want.
In your example, it is a sum and not an avg...
Anyway, this might do the trick :
In a transformer before the aggregator add a column to your stream containing the value Int (@OUTROWNUM / 5)
Like this, your data now looks like :
a,5,0
a,3,0
r,6,0
y,2,0
i,5,0
q,3,1
o,4,1
t,3,1
If you then sum (or avg) based on the third column, you can obtain what you want...
Anyway, this might do the trick :
In a transformer before the aggregator add a column to your stream containing the value Int (@OUTROWNUM / 5)
Like this, your data now looks like :
a,5,0
a,3,0
r,6,0
y,2,0
i,5,0
q,3,1
o,4,1
t,3,1
If you then sum (or avg) based on the third column, you can obtain what you want...
The MOD trick doesn't work. You need to create a new column to be a grouping key. Hopefully, your data is ordered so all you have to do is use a stage variables in a transformer. The stage variable is a GROUPKEY that starts at 1 and only increments every fifth row. Output to an aggregator and group by your new column.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
The MOD on @OUTROWNUM trick will not work. You will only get 5 rows output, a 0, 1, 2, 3, and 4 row. If you have 500, 50000, or 5000000 source rows, you'll only get 5 rows of output.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Charter Member
- Posts: 143
- Joined: Thu Nov 04, 2004 6:53 am
Hi,
If you have only one stream link comming into transformer then
declare 3 stage variables
1) RowProcGetPreviuosValue(qty1)= stagevar1
2) currentvalue(qty1)=stagevar2
3) If (@INROWNUM/5)!=0 then stagevar2+stagevar3 else stagevar2=stagevar3
4) If @INROWNUM/5=0 then (stagevar3)/5 else 0-----stagevar4
Give this constraint (@INROWNUM/5=0), so that only averages are populated into target table
derivation of your target table average column=stagevar4
If you have a reference link along with stream link then you have to do all this in second transformer.
This works and also you can avoid using aggregator stage.
Thank You
If you have only one stream link comming into transformer then
declare 3 stage variables
1) RowProcGetPreviuosValue(qty1)= stagevar1
2) currentvalue(qty1)=stagevar2
3) If (@INROWNUM/5)!=0 then stagevar2+stagevar3 else stagevar2=stagevar3
4) If @INROWNUM/5=0 then (stagevar3)/5 else 0-----stagevar4
Give this constraint (@INROWNUM/5=0), so that only averages are populated into target table
derivation of your target table average column=stagevar4
If you have a reference link along with stream link then you have to do all this in second transformer.
This works and also you can avoid using aggregator stage.
Thank You
kcbland, it is not a mod(), but a div...
I you truncate the resulting number, you will have the right group (truncate and not round)...
1 / 5 = 0.20 int(0.20) = 0
2 / 5 = 0.40 int(0.40) = 0
3 / 5 = 0.60 int(0.40) = 0
4 / 5 = 0.80 int(0.40) = 0
5 / 5 = 1.00 int(0.40) = 1
6 / 5 = 1.20 int(0.40) = 1
...
Yes, you have to create a new column, but it will stay only between the transformer and the aggregator...
I think it is more efficient to use (proc time, not memory) than the method vcannadevula suggested...
P.S. I just saw that the first group contains only 4 values... In order to get the result in the original example, you have to modify the column to Int((@OUTROWNUM - 0.1) / 5)
I you truncate the resulting number, you will have the right group (truncate and not round)...
1 / 5 = 0.20 int(0.20) = 0
2 / 5 = 0.40 int(0.40) = 0
3 / 5 = 0.60 int(0.40) = 0
4 / 5 = 0.80 int(0.40) = 0
5 / 5 = 1.00 int(0.40) = 1
6 / 5 = 1.20 int(0.40) = 1
...
Yes, you have to create a new column, but it will stay only between the transformer and the aggregator...
I think it is more efficient to use (proc time, not memory) than the method vcannadevula suggested...
P.S. I just saw that the first group contains only 4 values... In order to get the result in the original example, you have to modify the column to Int((@OUTROWNUM - 0.1) / 5)
Be aware that if you do it all within the transformer, you run the risk of "losing" your last output row (in the case where your number of input rows is not an even multiple of 5). When doing an aggregation entirely within a transformer, you usually need to detect an "End Of File" condition so that you can output that last row.
Mike
Mike
The AGGREGATOR stage. If you insist on stage variables, then you are going to need to add a last row to your input datastream to trigger the last output group.kommven wrote:So any built-in stage in DS?
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle