
Running out of memory

Posted: Wed Dec 04, 2019 9:38 am
by Seya
Hi,
I have a job designed as below. While running this job we are hitting a DataStage out-of-memory issue.
I see that there are many records coming out of the Transformer stage because of the 40 constraints defined.
DataStage is holding all the data in memory before the Aggregator stage processes it.
Can you please share your thoughts on how to resolve this out-of-memory issue?

Dataset --> (left join to a table) --> Transformer stage (about 40 filter conditions) --> Funnel stage --> Aggregator stage --> Modify stage --> ODBC Connector

Thanks in Advance!

Posted: Wed Dec 04, 2019 2:44 pm
by chulett
I'm sure it is the Aggregator that is holding everything in memory so that it can sort and group all of your data properly. The only way to solve that is to sort the data before it reaches the Aggregator, in a manner that supports the aggregation, and then tell the Aggregator that the input is sorted. Then it only needs to hold on to a single group at a time.
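
Not DataStage code, but a minimal Python sketch of the principle (invented column names, purely illustrative):

    # Hash-style aggregation: every distinct group stays in memory until end of input.
    def hash_aggregate(rows, key, value):
        totals = {}
        for row in rows:
            totals[row[key]] = totals.get(row[key], 0) + row[value]
        return totals  # memory grows with the number of distinct groups

    # Sort-style aggregation: input already sorted by the grouping key,
    # so only the current group is held in memory at any one time.
    def sorted_aggregate(rows, key, value):
        current_key, total = None, 0
        for row in rows:
            if row[key] != current_key:
                if current_key is not None:
                    yield current_key, total   # emit the finished group
                current_key, total = row[key], 0
            total += row[value]
        if current_key is not None:
            yield current_key, total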

Posted: Wed Dec 04, 2019 5:38 pm
by ray.wurlod
What Craig said. Specify Sort mode in the Aggregator stage and ensure that your data are sorted by the grouping keys, as well as appropriately partitioned.

Posted: Thu Dec 05, 2019 6:31 am
by Seya
Thanks Craig and Ray for your reply!

I already have the Sort method set in the Aggregator stage and Hash partitioning defined on the key columns.

Just an update on the number of records going into the Transformer and Aggregator stages:
(approx. 2M records) --> Transformer --> (64M records) --> Aggregator

Is there any other way to resolve this issue?

Posted: Thu Dec 05, 2019 7:42 am
by chulett
No, not really. Somehow, either before or during this, you need to sort your data. And this is not so much an 'issue' as a matter of understanding How It Works.

Even if you sort the data beforehand, the Aggregator will sort it again if you then tell it to sort the same way. I couldn't tell from your reply exactly what you meant, and I haven't had my hands on DS for years so I can't give you the exact setting, but make sure the Aggregator knows your data is already sorted so it skips that step. And trust me, instead of re-sorting, it will now bust you if you get that wrong, i.e. sort in a manner that does not support the aggregation being done... so get it right. :wink:

Either add a Sort stage between the Transformer and the Aggregator, or (if possible) make sure your input arrives sorted properly by dumping it out already sorted when building your source data.

Posted: Mon Dec 09, 2019 7:43 pm
by ray.wurlod
The Sort method in the Aggregator stage is telling the stage that the data are already sorted. It does NOT sort the data. If the data are not properly sorted (by the grouping keys, in order) then the aggregation will not work.

You can provide that sorting on the input link to the Aggregator stage, or in an immediately upstream Sort stage. If your data are sorted earlier in the job than this, optionally include a Sort stage set to "don't sort, previously sorted".
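
To illustrate why, here is a toy Python sketch (made-up data, not DataStage internals). itertools.groupby behaves like a "sorted input" aggregation: it only merges adjacent equal keys, so unsorted input silently produces split groups rather than an error:

    from itertools import groupby

    rows = [{"k": "A", "v": 1}, {"k": "B", "v": 2}, {"k": "A", "v": 3}]

    def agg(rows):
        return [(k, sum(r["v"] for r in grp))
                for k, grp in groupby(rows, key=lambda r: r["k"])]

    print(agg(rows))
    # [('A', 1), ('B', 2), ('A', 3)]  -- unsorted input: group A comes out twice
    print(agg(sorted(rows, key=lambda r: r["k"])))
    # [('A', 4), ('B', 2)]            -- sorted by the grouping key: correct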

It should be sufficient (and less overhead) to partition your data only by the first of the grouping keys. [Think about why this is.] Use Modulus if that is an integer, otherwise use Hash. You must have a key-based partitioning algorithm.
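
A toy Python illustration of that point (invented keys, not DataStage): partitioning on the first grouping key alone already guarantees that all rows of any full grouping-key combination land on the same partition, because they necessarily share that first key:

    NUM_PARTITIONS = 4

    def modulus_partition(first_key):      # first grouping key is an integer
        return first_key % NUM_PARTITIONS

    def hash_partition(first_key):         # first grouping key is not an integer
        return hash(first_key) % NUM_PARTITIONS

    # rows grouped on (region, product); region is the first grouping key
    rows = [(1, "widget", 10), (2, "gadget", 5), (1, "gadget", 7), (1, "widget", 3)]
    for region, product, qty in rows:
        print((region, product), "-> partition", modulus_partition(region))
    # every row of group (1, 'widget') lands on the same partition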

Posted: Mon Dec 30, 2019 11:56 am
by UCDI
Are you doing something with that Aggregator stage that could be hand-rolled in a Transformer instead? That might resolve it. You may still want to sort the data.

Also, before you go from 2M to 64M, is there something you are doing there that is being undone later? Is the part that blows it up to 64M over-doing it, with the Aggregator stage then undoing part of that? Maybe the whole process can be collapsed?

Dunno without details, just throwing out some stuff to think about.
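
As a purely hypothetical Python sketch of that idea (invented rules and columns, nothing to do with the actual job): if the 40 constraints each emit a row that the Aggregator later counts or sums, the same result can sometimes be computed per input row without the 2M -> 64M blow-up:

    # count how many constraints each row satisfies, instead of emitting
    # one output row per matched constraint and counting them later
    rules = [
        lambda r: r["amount"] > 100,
        lambda r: r["status"] == "OPEN",
        # ... up to 40 such constraints
    ]

    def matches_per_row(row):
        return sum(1 for rule in rules if rule(row))

    rows = [{"amount": 250, "status": "OPEN"}, {"amount": 50, "status": "CLOSED"}]
    print([matches_per_row(r) for r in rows])   # [2, 0] -- no row explosion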