Warning message: unbalanced input from partition

tcallahan · Post by **tcallahan** » Thu Oct 21, 2004 8:41 am

New to DataStage,

I am receiving the following warning message:

APT_ParallelSortMergeOperator(0),2: Unbalanced input from partition 0: 10000 records buffered
APT_ParallelSortMergeOperator(1),0: Unbalanced input from partition 4: 10000 records buffered

The job that I am running is designed as follows:
Dataset ==> Transformer ==> Aggregator ==> Aggregator ==> LookupFile

The job is returning the correct output, but I would like to resolve the warning messages. The problem seems to be with the input sort in the first aggregator. The input data is hash partitioned by consumer-id (6 nodes) and needs to be sorted by consumer-id, purchase-date. I have tried using the Sort stage but still received the same messages. Any help or direction to find answers would be appreciated.
Thanks,
Tom :D

ray.wurlod · Post by **ray.wurlod** » Thu Oct 21, 2004 3:40 pm

It is just a warning.

You can't prevent DataStage warnings if they're needed.

You can only prevent them by removing the cause. In this case, DataStage is alerting you to the fact that it's being asked to process way more rows in one processing node than in another (subtly suggesting that you balance them better).

tcallahan · Post by **tcallahan** » Thu Oct 21, 2004 8:30 pm

Thanks for the reply,
I am not sure that I can balance the nodes using hash partitioning, because each consumer-id (integer) can have any number of transaction records. I did try a test using entire partitioning just to see if the messages would go away, but the warnings still appeared. A couple of warnings would not be a problem, but a couple of our jobs are receiving over 75 of these warnings. I have narrowed the problem down to using the sort and aggregator stages together. If the sort is left out, the messages disappear. I will try a different design.
Thanks,
Tom

tcallahan · Post by **tcallahan** » Thu Oct 21, 2004 8:41 pm

Below is a record count for each node that the aggregator is processing.
node 0: 4392921
node 1: 4409319
node 2: 4383562
node 3: 4392104
node 4: 4386820
node 5: 4406425

The nodes are somewhat balanced, but I wouldn't think that you could get a complete balance using hash or modulus partitioning. Do you know if there is a certain percentage range that the nodes need to be in to be considered balanced? I have talked this over with our admins and they are not sure how to correct this problem. I am just looking for ideas.
Thanks again for all your help.
Tom
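
For reference, the spread between the node counts posted above can be worked out with a few lines of Python. This is only arithmetic on the posted numbers. It says nothing about any threshold DataStage itself might use, and the function name `partition_skew` is made up for this sketch.

```python
# Record counts per node, copied from the post above.
counts = {0: 4392921, 1: 4409319, 2: 4383562,
          3: 4392104, 4: 4386820, 5: 4406425}


def partition_skew(counts):
    """Return (mean rows per node, worst deviation from the mean in percent)."""
    mean = sum(counts.values()) / len(counts)
    worst = max(abs(c - mean) for c in counts.values())
    return mean, 100.0 * worst / mean


mean, skew_pct = partition_skew(counts)
print(f"mean rows per node: {mean:,.0f}")
print(f"largest deviation from the mean: {skew_pct:.2f}%")
```

The largest deviation comes out well under one percent of the mean, so row counts alone do not point to a partitioning imbalance here.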

ray.wurlod · Post by **ray.wurlod** » Thu Oct 21, 2004 11:13 pm

That looks close enough to "balanced" to me.

Have you opened a call with your support provider?

tcallahan · Post by **tcallahan** » Fri Oct 22, 2004 2:57 pm

I have finally corrected the problem with a little help from Ascential.
I have two separate jobs. The first one just extracts data from a table into a dataset, and that dataset is hash partitioned and sorted. The second job takes the dataset as input, aggregates the data, and then inserts it into a lookup file. In this job I had the Aggregator stage set up to hash partition and sort instead of just using "Same" partitioning, even though the data had already been hashed and sorted in the extract job. I overlooked this, thinking that repartitioning the data would not cause a problem, but DataStage actually buffered the data to make sure that the sorting was correct and issued the warning messages when the buffer went over 10,000 records.
Thanks,
Tom
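
A toy model in Python of why the second hash and sort was redundant. This is not how DataStage works internally: the modulus-based `hash_partition` function and the sample rows are assumptions made up for this sketch. It only illustrates that data already partitioned and sorted on the same key, across the same number of nodes, gains nothing from being repartitioned and re-sorted.

```python
from collections import defaultdict

NODES = 6  # the six-node configuration mentioned in the thread


def hash_partition(rows, nodes=NODES):
    """Toy stand-in for hash partitioning: route rows by consumer-id modulo nodes."""
    parts = defaultdict(list)
    for consumer_id, purchase_date in rows:
        parts[consumer_id % nodes].append((consumer_id, purchase_date))
    return parts


# Hypothetical sample data: (consumer-id, purchase-date) pairs in no particular order.
rows = [(cid, f"2004-10-{day:02d}") for cid in range(1, 25) for day in (21, 5, 13)]

# Job 1: hash partition by consumer-id, then sort each partition
# by consumer-id, purchase-date before writing the dataset.
dataset = {node: sorted(part) for node, part in hash_partition(rows).items()}

# Job 2, original design: hash partition the dataset again on the same key.
# Count how many rows would land on a different node than the one they are on.
moved = 0
for node, part in dataset.items():
    for target, rehashed in hash_partition(part).items():
        if target != node:
            moved += len(rehashed)

already_sorted = all(part == sorted(part) for part in dataset.values())

print(f"rows moved by re-hashing: {moved}")
print(f"every partition already sorted: {already_sorted}")
```

In this model no row changes node and every partition is already in order, which is the situation "Same" partitioning is meant for.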

ray.wurlod · Post by **ray.wurlod** » Fri Oct 22, 2004 4:25 pm

Thanks for posting the solution. It's one of those simple things that we just assume have been done correctly, and don't bother to ask. The support folks are more professional, and always remember to ask!