Aggregator stage is affecting job performance

suresh_dsx · Post by **suresh_dsx** » Wed Nov 10, 2010 11:57 pm

Hi,

I suspect that the Aggregator stage is hurting the performance of my job. The job currently runs for 1 hour for 1 million records, and in the near future we will get 5-10 million records.

                      dataset --> transformer --> aggregator stage
                                                       |  Auto Partition
                                                       V
Input data is from a table --------------------> lookup stage ------> output stage

Details of the Aggregator stage:

Grouping on two columns (Col_A, Col_B), with a calculation on all the other columns.

Aggregation type=Calculation

Column for Calculation=Col_C
Sum Output Column=Col_C

Column for Calculation=Col_D
Sum Output Column=Col_D
... and so on, for 12 columns.

What I have tried so far, based on suggestions in the forums:
1. Changed the reference link partitioning from Auto to Entire and tested the job: same performance.
2. Sorted on the grouping columns and tested the job: no improvement.

Additional Details:

Job type: parallel.
Version: 8.1
Configuration: One node configuration.

Any help is greatly appreciated.

Thanks
suri

stuartjvnorton · Post by **stuartjvnorton** » Thu Nov 11, 2010 12:18 am

Joins/Lookups or Aggregations with large volumes could be much more efficient back on the database, if you can move some of the work back there (especially as you have 1 node).
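As a rough illustration of pushing the aggregation back to the database, here is a minimal sketch. It assumes the rows feeding the Aggregator can be made available in a database table; the table name `src`, the sample rows, and the use of SQLite are hypothetical and only stand in for whatever database the job actually reads from. Only two of the twelve summed columns are shown.

```python
import sqlite3

# In-memory SQLite database standing in for the real source database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src (Col_A TEXT, Col_B TEXT, Col_C INTEGER, Col_D INTEGER)"
)
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?, ?)",
    [("a", "x", 1, 10), ("a", "x", 2, 20), ("a", "y", 5, 50), ("b", "x", 7, 70)],
)

# Group on Col_A, Col_B and sum the measure columns inside the database,
# so that only one row per group has to leave it.
rows = conn.execute(
    """
    SELECT Col_A, Col_B, SUM(Col_C) AS Col_C, SUM(Col_D) AS Col_D
    FROM src
    GROUP BY Col_A, Col_B
    ORDER BY Col_A, Col_B
    """
).fetchall()

print(rows)  # [('a', 'x', 3, 30), ('a', 'y', 5, 50), ('b', 'x', 7, 70)]
```

In a real job the same `SELECT ... GROUP BY` would presumably go into the user-defined SQL of the database source stage, so the aggregation no longer runs on the single DataStage node.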

swapnilverma · Post by **swapnilverma** » Thu Nov 11, 2010 1:29 am

                      dataset --> transformer --> aggregator stage
                                                       |  Auto Partition
                                                       V
Input data is from a table --------------------> lookup stage ------> output stage

Can you try loading the Aggregator output into a table and then joining the two tables to get the result?

If the joining keys are indexed, performance will be better.
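A small sketch of that idea, under some assumptions: the staging table `agg_out`, the input table `input_tbl`, the `payload` column, and the index name are all hypothetical, and I am assuming the lookup key is the pair of grouping columns (Col_A, Col_B), which the original post does not state. SQLite is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical input table (the 'table' side of the lookup).
    CREATE TABLE input_tbl (Col_A TEXT, Col_B TEXT, payload TEXT);
    -- Hypothetical staging table holding the Aggregator output.
    CREATE TABLE agg_out (Col_A TEXT, Col_B TEXT, Col_C INTEGER);
    -- Index on the assumed joining keys.
    CREATE INDEX ix_agg_out_keys ON agg_out (Col_A, Col_B);

    INSERT INTO input_tbl VALUES ('a', 'x', 'r1'), ('b', 'x', 'r2'), ('c', 'z', 'r3');
    INSERT INTO agg_out VALUES ('a', 'x', 3), ('b', 'x', 7);
    """
)

join_sql = """
    SELECT i.payload, i.Col_A, i.Col_B, g.Col_C
    FROM input_tbl AS i
    LEFT JOIN agg_out AS g
           ON g.Col_A = i.Col_A AND g.Col_B = i.Col_B
    ORDER BY i.payload
"""

# The join replaces the Lookup stage; unmatched input rows get NULL (None).
joined = conn.execute(join_sql).fetchall()
print(joined)

# Show how the database plans the join, to check whether the index is used.
for step in conn.execute("EXPLAIN QUERY PLAN " + join_sql):
    print(step)
```

Whether this beats the in-job lookup depends on the cost of loading the staging table, so it is worth timing both variants on realistic volumes.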

ArndW · Post by **ArndW** » Thu Nov 11, 2010 2:57 am

What happens to performance when you put in a sort stage and sort on COL_A and COL_B (hash partition on COL_A)?
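To illustrate why that combination matters, here is a minimal Python sketch (not DataStage code) of the principle: hash partitioning on Col_A keeps every (Col_A, Col_B) group inside one partition, and sorted input lets the aggregation emit each group as soon as its key changes instead of holding all groups in memory. The sample rows, `NUM_PARTITIONS`, and the CRC32 hash are assumptions made for the example.

```python
import zlib
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; only one of the twelve summed columns is shown.
rows = [
    {"Col_A": "a", "Col_B": "x", "Col_C": 1},
    {"Col_A": "b", "Col_B": "x", "Col_C": 7},
    {"Col_A": "a", "Col_B": "y", "Col_C": 5},
    {"Col_A": "a", "Col_B": "x", "Col_C": 2},
]

NUM_PARTITIONS = 2  # hypothetical; the job in question runs on one node


def partition_of(row):
    """Deterministic hash on Col_A only, so equal Col_A values share a partition."""
    return zlib.crc32(row["Col_A"].encode()) % NUM_PARTITIONS


partitions = [[] for _ in range(NUM_PARTITIONS)]
for row in rows:
    partitions[partition_of(row)].append(row)

group_key = itemgetter("Col_A", "Col_B")
result = []
for part in partitions:
    # Sort each partition on the grouping keys ...
    part.sort(key=group_key)
    # ... so each group is contiguous and can be summed in a single pass.
    for (a, b), group in groupby(part, key=group_key):
        result.append((a, b, sum(r["Col_C"] for r in group)))

result.sort()
print(result)  # [('a', 'x', 3), ('a', 'y', 5), ('b', 'x', 7)]
```

Because every group lives in exactly one partition, the per-partition sums are already final and no second aggregation pass is needed.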

karrisuresh · Post by **karrisuresh** » Thu Nov 11, 2010 4:44 am

Hi,
1) I would always sort the data before the Aggregator stage.
2) Use a join if possible.
3) If the reference data is large, select the sparse lookup option.
4) Using Lookup File Sets gives better performance.
5) Use partitioning properly with respect to the stage and the data requirements; for example, try Entire partitioning with the Lookup stage.

Thanks
