Partitioning Problem on both 8.0 and 8.1
Posted: Thu Feb 25, 2010 3:17 am
I have a strange problem that I can't really understand and am trying to simplify in order to submit a bug report.
The job uses two sources. Each goes through a filter stage and then a sort stage, and the two streams meet in a join stage; the joined stream then goes through a transform stage to 2 output datasets. All of the stages are parallel and have "same" partitioning explicitly set, except the input links to the two sort stages, which hash repartition on a single column.
This job, at version 8.0.3, when run with a single-node configuration file, produces a dataset with 2 nodes! When run with a 2 (or more) node configuration it works correctly. The 'monitor' shows that, in a single-node configuration, all stages run in 1 instance except the 2 sort stages and the stages after sorting.
Taking the exact same job to 8.1, it runs correctly with a 1-node configuration, but in a 2-node run the same error as above occurs, except reversed: all stages run in 2 instances except the sorts and the stages after them.
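To make clear what I expect from the hash repartitioning, here is a minimal Python sketch. It is not DataStage code, only a model of the behaviour: the function name `hash_partition`, the sample rows, and the use of CRC32 as the hash are all my own assumptions, and the engine's real hash function is certainly different. The point is only that the number of partitions should never exceed the number of nodes in the configuration file.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, node_count):
    """Model of hash partitioning on one column (hypothetical helper).

    Each row is assigned to partition hash(key value) % node_count, so the
    partition numbers can only fall in the range 0 .. node_count - 1.
    """
    partitions = defaultdict(list)
    for row in rows:
        # CRC32 is a stand-in hash chosen for the sketch, not the engine's own.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % node_count].append(row)
    return dict(partitions)


# Made-up sample data with a single key column "id".
rows = [{"id": i, "val": i * 10} for i in range(8)]

one_node = hash_partition(rows, "id", 1)  # models the 1-node configuration
two_node = hash_partition(rows, "id", 2)  # models the 2-node configuration
```

With `node_count` set to 1 every row can only land in partition 0, which is why a 2-node dataset coming out of a single-node run looks wrong to me.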
Once I start removing or replacing individual stages the job starts working correctly, so I think I have it pared down to the bare essentials necessary to reproduce the issue.
Before I submit this (to me, very odd) problem, I wanted to ask whether anyone has seen something similar before.