aggregator performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

aggregator performance

Post by dnat »

I am using an aggregator stage just to count the number of rows from a particular link.

The design is like this

Seq file-->transformer-->aggregator-->seq file

I need the aggregator to count the total rows coming from the transformer. The key is the same for all the records, so all the data would pass through only one partition.

I am dealing with millions of records. We are still in development, but I wanted to know how this would affect performance. Is there any other way to do this?
bkumar103
Participant
Posts: 214
Joined: Wed Jul 25, 2007 2:29 am
Location: Chennai

Post by bkumar103 »

Are you writing just the record count to the output Sequential file?
If so, you can use wc -l < inputfilename > outputfilename to get the count.
Birendra
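
For anyone who wants the same count outside the shell, here is a minimal Python sketch of what wc -l does, assuming the file is newline-delimited with one record per line. The function name count_newlines and the chunk size are illustrative choices, not anything from DataStage.

```python
def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, as `wc -l` does, reading fixed-size chunks."""
    total = 0
    # Binary mode avoids decoding costs and keeps memory use bounded by chunk_size.
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total
```

Like wc -l, this counts newline characters, so a final record without a trailing newline is not counted.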
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do you really need the count as a separate operation? Why not calculate it as you are processing the actual file?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sjaladurgam
Participant
Posts: 2
Joined: Thu Jul 24, 2008 9:43 am

Re: aggregator performance

Post by sjaladurgam »

I experienced the same issue. I tried using two Aggregator stages, the first with hash partitioning and the second in sequential mode, and that works fantastically.

Just try this.

Thanks.
Skumar
sima79
Premium Member
Posts: 38
Joined: Mon Jul 16, 2007 8:12 am
Location: Melbourne, Australia

Post by sima79 »

One aggregator stage (execution mode parallel) to count the rows in parallel then another aggregator stage (execution mode sequential) to sum up the counts from each partition. No need to use hash partitioning, round robin in this case would be better.
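
The two-stage idea can be sketched in plain Python. This is only an illustration of the logic, not DataStage code: round_robin, partial_counts and total_count are hypothetical names, and the four partitions stand in for whatever the configuration file defines.

```python
def round_robin(rows, partitions):
    """Deal rows across partitions in turn, as round robin partitioning does."""
    buckets = [[] for _ in range(partitions)]
    for index, row in enumerate(rows):
        buckets[index % partitions].append(row)
    return buckets


def partial_counts(buckets):
    """Stage 1 (parallel): each partition counts only its own rows."""
    return [len(bucket) for bucket in buckets]


def total_count(counts):
    """Stage 2 (sequential): sum the per-partition counts into one total."""
    return sum(counts)


if __name__ == "__main__":
    rows = [f"record-{i}" for i in range(10)]
    counts = partial_counts(round_robin(rows, 4))
    print(counts)               # [3, 3, 2, 2]
    print(total_count(counts))  # 10
```

Note that the second stage sums the partial counts. If it counted its input rows instead, the result in this sketch would be 4, the number of partitions, rather than 10.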
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

sima and sjaladurgam,

So the two aggregator stages would not hinder performance when processing millions of records? I am just worried because the data volume is very large. Anyway, thanks for your input.

Ray, I am not sure how we can calculate the count during the actual processing, because I still have to calculate it without the partitioning to get the total count.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Re: aggregator performance

Post by Sainath.Srinivasan »

sjaladurgam wrote:...and second one with sequential ...
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

I set the first aggregator to round robin and the next one to sequential mode, but the output is not correct.

The first aggregator shows as a collection type.
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

The first aggregator was showing as collection type because it was in sequential mode. I changed it to parallel mode and partitioned it round robin. The second aggregator is in sequential mode, but it is still not giving the correct output.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What do you mean by "not giving correct data"?

Unless you share the results, it is not even possible to guess what is happening differently.