Aggregator Performance

raju4u · Post by **raju4u** » Wed Sep 07, 2011 10:19 pm

Hi,

In our job we pass 19 crore (190 million) rows to an Aggregator stage, and it takes 3 hours. The input to the Aggregator is sorted and hash partitioned, and the aggregation method is Sort. Please let me know if there is any other way to reduce the run time.

Thanks,
Rajashekar.

SURA · Post by **SURA** » Wed Sep 07, 2011 11:41 pm

How large is the data volume, and how many columns are used in the aggregation?

Find out where most of the time is being spent.

Splitting the job may help to reduce the time.

DS User

ray.wurlod · Post by **ray.wurlod** » Thu Sep 08, 2011 12:27 am

Please advise what the grouping columns for aggregation are.

Essentially, though, you need to partition on only the first of these (unless it has very few distinct values) and sort on all of them in order, to be able to use Sort as the aggregation method.
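
The idea can be illustrated outside DataStage with a minimal Python sketch. It is not DataStage code: the column names (`region`, `product`, `amount`), the helper names, and the two-partition setup are all assumptions made for illustration. Rows are hash partitioned on the first grouping column only, each partition is sorted on all grouping columns in order, and the aggregation then runs as a single pass over the sorted rows.

```python
from itertools import groupby
from zlib import crc32


def hash_partition(rows, first_key, num_partitions):
    """Route each row to a partition using a hash of the FIRST grouping column only."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        idx = crc32(str(row[first_key]).encode()) % num_partitions
        partitions[idx].append(row)
    return partitions


def sort_aggregate(partition, group_keys, value_col):
    """Sort on all grouping columns in order, then sum value_col in one pass."""
    def key_fn(row):
        return tuple(row[k] for k in group_keys)

    ordered = sorted(partition, key=key_fn)
    # groupby only merges adjacent rows, which is why the sort must come first.
    for key, group in groupby(ordered, key=key_fn):
        yield key, sum(r[value_col] for r in group)


# Hypothetical sample data.
rows = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "US", "product": "B", "amount": 5},
    {"region": "EU", "product": "A", "amount": 7},
    {"region": "EU", "product": "B", "amount": 1},
    {"region": "US", "product": "B", "amount": 2},
]

result = {}
for part in hash_partition(rows, "region", num_partitions=2):
    for key, total in sort_aggregate(part, ["region", "product"], "amount"):
        # Every group lives in exactly one partition, so no key repeats.
        assert key not in result
        result[key] = total
```

Because all rows that share the first grouping column land in the same partition, every full group is complete within one partition and no second merge step is needed. The caveat about few distinct values shows up here too: with only two regions, at most two partitions ever receive data, however many are configured.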

keshav0307 · Post by **keshav0307** » Tue Sep 13, 2011 2:53 am

Did you try increasing the number of nodes?

kommven · Post by **kommven** » Tue Sep 13, 2011 9:50 am

Compare a simple select job against the same job with the Aggregator added.
I assume the throughput from the source stage is a measure worth noting when assessing the overall performance of your job.

I would also suggest dumping the data into a dataset and using that as the source, then comparing the results to see whether there is any opportunity for improvement.
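
The comparison can be sketched in plain Python, again only as an illustration and not as DataStage code. The `source` generator is a hypothetical stand-in for the source stage, and the row count is arbitrary. One run only reads the rows, the other reads and aggregates them; the difference between the two timings is a rough estimate of what the aggregation itself costs.

```python
import time


def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - start


def source():
    # Hypothetical stand-in for the source stage: yields (group_key, value) rows.
    for i in range(100_000):
        yield (i % 100, i)


def read_only():
    # Baseline: drain the source without doing any other work.
    return sum(1 for _ in source())


def read_and_aggregate():
    # Same read, plus a sum per group key.
    totals = {}
    for key, value in source():
        totals[key] = totals.get(key, 0) + value
    return len(totals)


rows_read, t_read = timed(read_only)
groups, t_full = timed(read_and_aggregate)

# Rough estimate of the aggregation cost; can be noisy on a single run.
aggregate_cost = t_full - t_read
print(f"read only: {t_read:.4f}s, read + aggregate: {t_full:.4f}s")
```

If the read-only run already accounts for most of the elapsed time, the bottleneck is the source rather than the aggregation.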

DSXchange
