
Partitioning Problem on both 8.0 and 8.1

Posted: Thu Feb 25, 2010 3:17 am
by ArndW
I have a strange problem that I can't really understand and am trying to simplify in order to submit a bug report.

One job uses two sources. Each source goes through a filter stage, then a sort stage, and then into a join stage; the single joined stream then goes through a transform stage to 2 output datasets. All of the stages are parallel and have "same" partitioning explicitly set, except the input links to the two sort stages, which hash-repartition on a single column.
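For readers less familiar with hash repartitioning, here is a minimal Python sketch of the general idea, not of DataStage's actual implementation. The function name `hash_partition`, the column `cust_id`, and the use of CRC32 are all assumptions made for illustration. The point is that the number of partitions is expected to follow the node count of the configuration, which is why 2 partitions on a 1-node run looks wrong.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Distribute rows into num_partitions buckets by hashing one column.

    Hypothetical helper for illustration only.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is used only because it is deterministic across runs;
        # it is an assumption, not the hash the parallel engine uses.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[bucket].append(row)
    return partitions


# Made-up sample data: 20 rows spread over 5 key values.
rows = [{"cust_id": i % 5, "amount": i} for i in range(20)]

one_node = hash_partition(rows, "cust_id", 1)  # 1-node configuration
two_node = hash_partition(rows, "cust_id", 2)  # 2-node configuration

print(len(one_node))               # partitions on a 1-node run
print([len(p) for p in two_node])  # row counts per partition on a 2-node run
```

With one partition every row lands in the same bucket; with two, all rows sharing a key value stay together, which is what the downstream sort and join rely on.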

This job, at version 8.0.3, when run with a single-node configuration file, produces a dataset with 2 nodes! When run with a 2 (or more) node configuration it works correctly. The 'monitor' shows that in a single-node configuration all stages run in 1 instance except the 2 sort stages and the stages after sorting.

Taking the exact same job to 8.1, it runs correctly with a 1-node configuration, but in a 2-node run the same error as above occurs, except reversed: all stages run in 2 instances except the sorts and the stages after them.

Once I start removing and replacing single stages the job starts working correctly, so I think I have it pared down to the bare essentials necessary to reproduce the issue.

Before I submit this (to me, very odd) problem, I wanted to ask whether anyone has seen something similar before.

Re: Partitioning Problem on both 8.0 and 8.1

Posted: Thu Feb 25, 2010 4:26 am
by ray.wurlod
Do the scores yield any useful information, particularly about partitioners?

Posted: Thu Feb 25, 2010 5:17 am
by ArndW
No, the scores just reflect the text version I posted, showing 2 nodes and 1 node respectively for those stages.

Posted: Thu Feb 25, 2010 3:48 pm
by ray.wurlod
What about the partitioners in the score (the tokens like <>, =>, etc.)?