Warning message: unbalanced input from partition

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


tcallahan
Premium Member
Posts: 4
Joined: Tue Mar 30, 2004 1:16 pm


Post by tcallahan »

I'm new to DataStage.

I am receiving the following warning message:

APT_ParallelSortMergeOperator(0),2: Unbalanced input from partition 0: 10000 records buffered
APT_ParallelSortMergeOperator(1),0: Unbalanced input from partition 4: 10000 records buffered

The job that I am running is designed as follows:
Dataset ==> Transformer ==> Aggregator ==> Aggregator ==> LookupFile

The job returns the correct output, but I would like to resolve the warning messages. The problem seems to be with the input sort in the first Aggregator. The input data is hash partitioned on consumer-id (6 nodes) and needs to be sorted by consumer-id and purchase-date. I have tried using the Sort stage but still received the same messages. Any help or direction to find answers would be appreciated.
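For readers unfamiliar with the setup, here is a minimal Python sketch of the data layout described above: records routed to one of 6 partitions by consumer-id, then each partition sorted by consumer-id and purchase-date. It is only an illustration, not DataStage's implementation. The field names, the sample rows, the function name `partition_and_sort`, and the use of a simple modulus in place of DataStage's internal hash function are all assumptions.

```python
from collections import defaultdict
from datetime import date

NUM_NODES = 6  # the 6-node configuration mentioned in the post


def partition_and_sort(records, num_nodes=NUM_NODES):
    """Route each record to a partition by consumer_id, then sort every
    partition by (consumer_id, purchase_date).

    ASSUMPTION: modulus on the integer key stands in for DataStage's
    internal hash function; the function and field names are hypothetical."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec["consumer_id"] % num_nodes].append(rec)
    # All rows for one consumer_id share a partition, so a per-partition
    # sort is enough to make each consumer's rows contiguous and ordered.
    return {
        node: sorted(rows, key=lambda r: (r["consumer_id"], r["purchase_date"]))
        for node, rows in partitions.items()
    }


# Hypothetical sample rows
records = [
    {"consumer_id": 7, "purchase_date": date(2004, 3, 2)},
    {"consumer_id": 1, "purchase_date": date(2004, 3, 5)},
    {"consumer_id": 7, "purchase_date": date(2004, 1, 9)},
]
parts = partition_and_sort(records)
for node, rows in sorted(parts.items()):
    print(node, [(r["consumer_id"], r["purchase_date"].isoformat()) for r in rows])
```

Because hash partitioning keeps every row for a key on one node, no cross-partition ordering is needed for a grouped aggregation; each partition only has to be sorted on its own.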
Thanks,
Tom :D
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It is just a warning.

You can't prevent DataStage warnings if they're needed.

You can only prevent them by removing the cause. In this case, DataStage is alerting you to the fact that it's being asked to process way more rows in one processing node than in another (subtly suggesting that you balance them better). :wink:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tcallahan

Post by tcallahan »

Thanks for the reply,
I am not sure that I can balance the nodes with hash partitioning, because each consumer-id (an integer) can have any number of transaction records. I did try a test using Entire partitioning just to see if the messages would go away, but the warnings still appeared. A couple of warnings would not be a problem, but some of our jobs are receiving over 75 of these warnings. I have narrowed the problem down to using the Sort and Aggregator stages together: if the sort is left out, the messages disappear. I will try a different design.
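To illustrate the point about hash partitioning and per-key record counts, here is a small Python sketch. Every record for a consumer-id goes to the same node, so a node's row count is the sum of the counts for the keys it owns, and one heavy key can skew a node no matter how the keys are spread. The per-consumer counts, the function name `rows_per_node`, and the use of modulus in place of DataStage's hash function are hypothetical.

```python
from collections import Counter

NUM_NODES = 6


def rows_per_node(txn_counts, num_nodes=NUM_NODES):
    """Return the number of rows each node receives.

    txn_counts maps consumer_id -> number of transaction records.
    Every record for a consumer lands on the same node, so a node's load
    is the sum of its consumers' counts.
    ASSUMPTION: modulus stands in for DataStage's hash function."""
    load = Counter()
    for consumer_id, n in txn_counts.items():
        load[consumer_id % num_nodes] += n
    return [load[node] for node in range(num_nodes)]


# Hypothetical counts: consumer 6 is a very heavy buyer
txn_counts = {1: 10, 2: 12, 3: 9, 4: 11, 5: 10, 6: 500, 7: 8}
print(rows_per_node(txn_counts))  # [500, 18, 12, 9, 11, 10]
```

With millions of consumers and no single dominant key, the sums tend to even out, which matches the per-node counts reported later in the thread.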
Thanks,
Tom
tcallahan

Post by tcallahan »

Below is a record count for each node that the aggregator is processing.
node 0: 4392921
node 1: 4409319
node 2: 4383562
node 3: 4392104
node 4: 4386820
node 5: 4406425

The nodes are somewhat balanced, but I wouldn't expect to get a perfect balance using hash or modulus partitioning. Do you know if there is a certain percentage range the nodes need to be within to be considered balanced? I have talked this over with our admins and they are not sure how to correct this problem. I am just looking for ideas.
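The thread doesn't name any percentage threshold for "balanced", but the spread of the six counts above can be quantified directly. This Python snippet uses only the per-node counts from the post; the metrics chosen (total, mean, max minus min, and largest deviation from the mean as a percentage) are just one reasonable way to measure it.

```python
# Per-node record counts reported for the aggregator (from the post above)
counts = {0: 4392921, 1: 4409319, 2: 4383562, 3: 4392104, 4: 4386820, 5: 4406425}

total = sum(counts.values())
mean = total / len(counts)
spread = max(counts.values()) - min(counts.values())
# Largest deviation of any node from the mean, as a percentage of the mean
max_dev_pct = max(abs(n - mean) for n in counts.values()) / mean * 100

print(f"total={total} mean={mean:.0f} spread={spread} max deviation={max_dev_pct:.2f}%")
```

No node is more than about 0.3% away from the mean, which supports the view that row-count imbalance across nodes is not what the warning is reporting.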
Thanks again for all your help.
Tom
ray.wurlod

Post by ray.wurlod »

That looks close enough to "balanced" to me. :?
Have you opened a call with your support provider?
tcallahan

Post by tcallahan »

I have finally corrected the problem with a little help from Ascential.
I have two separate jobs. The first one just extracts data from a table into a dataset, which was being hash partitioned and sorted. The second job takes that dataset as input, aggregates the data, and then inserts it into a lookup file. In this job I had the Aggregator stage set up to hash partition and sort, instead of just using "Same" partitioning, even though the data had already been hashed and sorted in the extract job. I overlooked this, thinking that repartitioning the data would not cause a problem, but DataStage actually buffered the data to make sure the sort order was correct and issued the warning messages when the buffer went over 10,000 records.
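Why would a sort-merge need to buffer? A merge of several sorted inputs can only emit its next record once every input that might still deliver a smaller key has something waiting. If one input runs ahead of the others, its records pile up. The Python sketch below is a toy model of that idea, not DataStage's actual APT_ParallelSortMergeOperator. The function name `sort_merge`, the arrival sequences, and the small `warn_at` threshold (the log in the post shows 10,000) are assumptions; the warning text only mimics the log format.

```python
from collections import Counter, deque


def sort_merge(arrivals, num_inputs, warn_at):
    """Toy k-way sort-merge over interleaved, per-input-sorted arrivals.

    arrivals: (input_index, key) pairs in arrival order; each input's keys
    are ascending. A record is emitted only when every input either has a
    buffered record or has finished, since a starved input might still send
    a smaller key. Warns once per input when its buffer reaches warn_at.
    Returns (merged_keys, warnings)."""
    arrivals = list(arrivals)
    remaining = Counter(i for i, _ in arrivals)  # records still to arrive per input
    buffers = [deque() for _ in range(num_inputs)]
    warned = set()
    merged, warnings = [], []

    def can_emit():
        return any(buffers) and all(
            buffers[i] or remaining[i] == 0 for i in range(num_inputs)
        )

    for i, key in arrivals:
        buffers[i].append(key)
        remaining[i] -= 1
        if len(buffers[i]) >= warn_at and i not in warned:
            warned.add(i)
            warnings.append(
                f"Unbalanced input from partition {i}: {len(buffers[i])} records buffered"
            )
        while can_emit():
            # Emit the smallest head among the non-empty buffers.
            j = min((b for b in range(num_inputs) if buffers[b]),
                    key=lambda b: buffers[b][0])
            merged.append(buffers[j].popleft())
    return merged, warnings


# Hypothetical arrival orders for two sorted inputs
skewed = [(0, k) for k in range(0, 10, 2)] + [(1, k) for k in range(1, 10, 2)]
balanced = [(k % 2, k) for k in range(10)]
for name, arr in (("skewed", skewed), ("balanced", balanced)):
    print(name, *sort_merge(arr, num_inputs=2, warn_at=5))
```

Both orders produce the same merged output, matching the observation that the job's results were correct; only the run where one input arrives far ahead of the other accumulates a large buffer and triggers the warning.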
Thanks,
Tom
ray.wurlod

Post by ray.wurlod »

Thanks for posting the solution. It's one of those simple things that we just assume have been done correctly, and don't bother to ask. The support folks are more professional, and always remember to ask! 8)