Aggregator stage is affecting job performance

suresh_dsx
Participant
Posts: 160
Joined: Tue May 02, 2006 7:49 am

Post by suresh_dsx »

Hi,
I suspect the Aggregator stage is hurting performance in this job. The job currently runs for 1 hour on 1 million records, and in the near future we expect 5-10 million records.

dataset --> transformer-->aggregator stage
                                                 |   Auto Partition     
                                                 V
Input data is from a table--------------------> lookupstage ------> output stage

Details of the Aggregator stage:

Grouping on two columns (Col_A, Col_B)
and
calculation on all the other columns.

Aggregation type=Calculation

Column for Calculation=Col_C
Sum Output Column=Col_C

Column for Calculation=Col_D
Sum Output Column=Col_D
...and so on, for 12 columns.


Things I have tried, based on suggestions in the forums:
1. Changed the reference link partitioning from Auto to Entire and retested the job: same performance.
2. Sorted on the grouping columns and retested the job: no improvement.


Additional Details:

Job type: parallel.
Version: 8.1
Configuration: One node configuration.


Any help greatly appreciated.
Thanks
suri
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Joins/Lookups or Aggregations with large volumes could be much more efficient back on the database, if you can move some of the work back there (especially as you have 1 node).
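
As a rough illustration of pushing the aggregation back to the database, the sketch below uses Python's built-in sqlite3 module with an in-memory table. The table name `src`, the function name, and the sample rows are hypothetical; only Col_A through Col_D come from the job description, and a real job would run the equivalent GROUP BY in its own source database with all 12 sum columns.

```python
import sqlite3

def aggregate_in_db(rows):
    """Group by (Col_A, Col_B) and sum Col_C and Col_D inside the database."""
    conn = sqlite3.connect(":memory:")  # hypothetical stand-in for the real source database
    conn.execute(
        "CREATE TABLE src (Col_A TEXT, Col_B TEXT, Col_C INTEGER, Col_D INTEGER)"
    )
    conn.executemany("INSERT INTO src VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT Col_A, Col_B, SUM(Col_C), SUM(Col_D) "
        "FROM src GROUP BY Col_A, Col_B ORDER BY Col_A, Col_B"
    ).fetchall()
    conn.close()
    return result

# Made-up sample data, for illustration only
sample_rows = [
    ("a", "x", 1, 10),
    ("a", "x", 2, 20),
    ("a", "y", 3, 30),
    ("b", "x", 4, 40),
]
totals = aggregate_in_db(sample_rows)
```

With this approach the job would read already-aggregated rows, so the Aggregator stage (and possibly the Transformer before it) may no longer be needed on the single node.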
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

dataset --> transformer-->aggregator stage
                                                 |   Auto Partition
                                                 V
Input data is from a table--------------------> lookupstage ------> output stage


Can you try loading the Aggregator output into a table and then joining the two tables to get the result?

If the joining keys are indexed, performance will be better.
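
A minimal sketch of that idea, again using Python's sqlite3 module purely for illustration. The table names `main_input` and `agg_output`, the `payload` column, the index name, and the rows are all assumptions; the inner join used here may not match the lookup-failure handling configured in the actual job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # hypothetical stand-in for the target database
conn.executescript("""
    CREATE TABLE main_input (Col_A TEXT, Col_B TEXT, payload TEXT);
    CREATE TABLE agg_output (Col_A TEXT, Col_B TEXT, Col_C INTEGER);
    -- Index the joining keys on the aggregated table
    CREATE INDEX idx_agg_keys ON agg_output (Col_A, Col_B);
""")
conn.executemany(
    "INSERT INTO main_input VALUES (?, ?, ?)",
    [("a", "x", "row1"), ("b", "y", "row2"), ("c", "z", "row3")],
)
conn.executemany(
    "INSERT INTO agg_output VALUES (?, ?, ?)",
    [("a", "x", 3), ("b", "y", 7)],
)
# Join the main input to the aggregated totals on both grouping keys
joined = conn.execute(
    "SELECT m.Col_A, m.Col_B, m.payload, g.Col_C "
    "FROM main_input m JOIN agg_output g "
    "ON m.Col_A = g.Col_A AND m.Col_B = g.Col_B "
    "ORDER BY m.Col_A"
).fetchall()
conn.close()
```

Rows in `main_input` with no matching key pair (the third row here) drop out of an inner join; a LEFT JOIN would keep them with a NULL total.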
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

What happens to performance when you put in a sort stage and sort on COL_A and COL_B (hash partition on COL_A)?
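
To show why hash partitioning on COL_A is enough when sorting on COL_A and COL_B, here is a small Python sketch of the general technique, not of DataStage internals. The function names, the use of `zlib.crc32` as the hash, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
import zlib

def hash_partition(rows, num_partitions):
    """Route each row by a hash of Col_A so equal Col_A values share a partition."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is a stable stand-in hash (an assumption of this sketch)
        part = zlib.crc32(row[0].encode()) % num_partitions
        partitions[part].append(row)
    return partitions

def sorted_sum(rows):
    """Sort on (Col_A, Col_B), then sum Col_C one group at a time."""
    key = itemgetter(0, 1)
    out = []
    for (a, b), group in groupby(sorted(rows, key=key), key=key):
        out.append((a, b, sum(r[2] for r in group)))
    return out

# Made-up rows of (Col_A, Col_B, Col_C)
rows = [("a", "x", 1), ("b", "y", 5), ("a", "x", 2), ("a", "y", 3), ("b", "y", 4)]
results = []
for part_rows in hash_partition(rows, 2).values():
    results.extend(sorted_sum(part_rows))
results.sort()
```

Because every row with the same COL_A lands in the same partition, each (COL_A, COL_B) group is complete within one partition, and on sorted input the aggregation only has to hold one group at a time. On a one-node configuration there is only one partition, so any gain would come from the sorted, group-at-a-time processing rather than from parallelism.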
karrisuresh
Participant
Posts: 57
Joined: Sat Jun 09, 2007 1:14 am
Location: chicago

Post by karrisuresh »

Hi,
1) I always sort the data before the Aggregator stage.
2) Use a Join if possible.
3) If the reference data is large, select the sparse lookup option.
4) Using lookup file sets gives better performance.
5) Use partitioning appropriate to the stage and the data requirements,
for example, try Entire with the Lookup stage.

thanks
Hi, I have experience with DataStage Parallel Extender and am ready to give and take help from others.
I hope we all help each other, hand in hand.