Running out of memory

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Seya
Participant
Posts: 27
Joined: Thu Mar 29, 2007 3:27 am

Running out of memory

Post by Seya »

Hi,
I have a job designed as below. While running this job we are hitting a DataStage out-of-memory error.
I see that many records come out of the Transformer stage because of the 40 constraints defined.
DataStage appears to be holding all of the data in memory before processing the Aggregator stage.
Can you please share your thoughts on how to resolve this out-of-memory issue?

Dataset --> (Left join to a table) --> Transformer stage (about 40 filter conditions) --> Funnel stage --> Aggregator stage --> Modify stage --> ODBC Connector

Thanks in Advance!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm sure it is the Aggregator that is holding everything in memory so that it can sort and group all of your data properly. The only way to solve that is to sort the data before it reaches the Aggregator, in a manner that supports the aggregation, and then tell the Aggregator that the "input is sorted". Then it only needs to hold on to a single "group" at a time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What Craig said. Specify Sort mode in the Aggregator stage and ensure that your data are sorted by the grouping keys, as well as appropriately partitioned.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Seya
Participant
Posts: 27
Joined: Thu Mar 29, 2007 3:27 am

Post by Seya »

Thanks Craig and Ray for your reply!

I already have the Sort method set in the Aggregator stage and Hash partitioning defined on the key columns.

Just an update on the record counts into the Transformer and Aggregator stages:
(approx. 2M records) --> Transformer --> (64M records) --> Aggregator

Is there any other way to resolve this issue?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, not really. Somehow, either before or during this job, you need to sort your data. And this is not so much an 'issue' as a need to understand How It Works.

Even if you sort the data beforehand and then tell the Aggregator to sort it the same way, it will sort it again. I couldn't tell from your reply exactly what you meant, and I haven't had my hands on DS for years to give you the exact setting, but make sure the Aggregator knows your data is already sorted so it skips that step. And trust me, instead of re-sorting, it will now bust you if you get that wrong, i.e. if the data is sorted in a manner that does not support the aggregation being done... so get it right. :wink:

Either add a Sort stage between the Transformer and the Aggregator, or make sure your input arrives sorted properly, if possible by dumping your source data out already sorted when you build it.
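Craig's warning about getting the sort wrong can be illustrated with a small Python sketch (hypothetical data, nothing DataStage-specific): a streaming aggregation that assumes sorted input will silently emit split, broken groups when the input is not actually sorted.

```python
from itertools import groupby
from operator import itemgetter

# Rows NOT sorted by key, but aggregated as if they were -- analogous to
# telling the Aggregator the input is sorted when it isn't.
rows = [("a", 1), ("b", 5), ("a", 2)]

result = [(key, sum(v for _, v in grp))
          for key, grp in groupby(rows, key=itemgetter(0))]

# Group "a" comes out split in two instead of a single total of 3.
assert result == [("a", 1), ("b", 5), ("a", 2)]
```

No error is raised; the output is simply wrong, which is why the sort order must genuinely match the grouping keys.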
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Sort method in the Aggregator stage is telling the stage that the data are already sorted. It does NOT sort the data. If the data are not properly sorted (by the grouping keys, in order) then the aggregation will not work.

You can provide that sorting on the input link to the Aggregator stage, or in an immediately upstream Sort stage. If your data are sorted earlier in the job than this, optionally include a Sort stage set to "don't sort, previously sorted".

It should be sufficient (and less overhead) to partition your data only by the first of the grouping keys. [Think about why this is.] Use Modulus if that is an integer, otherwise use Hash. You must have a key-based partitioning algorithm.
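The reason partitioning on the first grouping key alone is sufficient can be sketched in plain Python (partition count and row values are invented for illustration): any two rows with identical full grouping keys necessarily share the first key, so they hash to the same partition and no group is ever split across partitions.

```python
# Key-based partitioning on the FIRST grouping key only.
NUM_PARTITIONS = 4

# (first_key, second_key, value) rows with a two-column grouping key.
rows = [("acct1", "2020", 10), ("acct1", "2020", 5), ("acct2", "2021", 7)]

partitions = [[] for _ in range(NUM_PARTITIONS)]
for row in rows:
    partitions[hash(row[0]) % NUM_PARTITIONS].append(row)

# Verify every full (first, second) group lives entirely in one partition.
placement = {}
for i, part in enumerate(partitions):
    for first, second, _ in part:
        assert placement.setdefault((first, second), i) == i
```

Round-robin or random partitioning would scatter a group's rows across nodes and break the per-partition aggregation, which is why a key-based algorithm is mandatory.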
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Are you doing something with that Aggregator stage that could be hand-rolled in a Transformer instead? That might resolve it. You may still want to sort the data.

Also, before you go from 2M to 64M records, is there something you are doing there that is being undone later? Is the part that blows it up to 64M overdoing it, with the Aggregator stage then undoing part of that? Maybe the whole process can be collapsed?

Hard to say without details; just throwing out some things to think about.
Post Reply