Understanding the Row Numbering Algorithm

abc123 · Post by **abc123** » Fri Dec 05, 2014 7:49 pm

I am posting in this thread because it totally relates to the original poster's topic.

I am using auto partitioning. The logic I have is:

InitialValue for stage variable svOne:
@PARTITIONNUM-(@NUMPARTITIONS-1)

svOne derivation: svOne + @NUMPARTITIONS
svOutputRow=svOne

This works perfectly every time no matter how many nodes I have in the config file. I tried with files of over 10000 rows and it works fine as well.

My problem is that I am not sure I understand the logic. I used a Peek stage to output the system variable values. For example, for an svOutputRow of 14, I see these values: NUMPARTITIONS=4, PARTITIONNUM=1, InitialValue of svOne: -2.
If the initial value were recalculated for every row, the result would always be -2 + 4 = 2, not 14. The values only come out right if InitialValue is calculated once per node and only "svOne + @NUMPARTITIONS" is executed for each row.

Any thoughts?

chulett · Post by **chulett** » Fri Dec 05, 2014 8:18 pm

Ha! It "totally relates" to a ton of other posts on this subject, so off you go to one of your own! Now you are master of your own fate.

qt_ky · Post by **qt_ky** » Fri Dec 05, 2014 9:30 pm

Each stage variable's initial value is set only once per node, before any derivations take place.
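To make that concrete, here is a minimal Python sketch of what a single partition does, assuming the initial value is evaluated once and the derivation then runs once per row. The function name `trace_partition` and its parameters are made up for illustration; they are not DataStage names. It traces partition 1 of 4, the case from the Peek output above.

```python
def trace_partition(partition_num, num_partitions, row_count):
    """Mimic the stage variable svOne on one partition (illustrative only)."""
    # Initial value: evaluated once, before the first row arrives.
    sv_one = partition_num - (num_partitions - 1)
    output_rows = []
    for _ in range(row_count):
        # Per-row derivation: svOne + @NUMPARTITIONS
        sv_one = sv_one + num_partitions
        # svOutputRow = svOne
        output_rows.append(sv_one)
    return output_rows


# Partition 1 of 4 starts at -2 and then steps by 4 on every row.
print(trace_partition(partition_num=1, num_partitions=4, row_count=4))
# [2, 6, 10, 14]
```

Under these assumptions, the svOutputRow of 14 is simply the fourth row processed on partition 1.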

abc123 · Post by **abc123** » Sat Dec 06, 2014 4:45 am

This works perfectly, but according to Ray's post in the linked topic, it should not. Here is the relevant excerpt from Ray's post:

"Unless you can guarantee absolutely even distribution you will always see holes in the sequence. The only way that you can guarantee absolutely even distribution is (a) to specify Round Robin as the partitioning algorithm and (b) to have a number of rows that is an exact multiple of the number of partitions."

I am not doing either: (a) I am using auto partitioning, and (b) my number of rows is not an exact multiple of the number of partitions.

qt_ky · Post by **qt_ky** » Sat Dec 06, 2014 6:57 am

Are you saying it is working perfectly (congratulations) and you are perhaps trying to break it and wondering why it won't break?

"Auto" is not its own separate type of partitioning; it will automatically choose a type of partitioning. Maybe it chose round robin in your job. Have you checked to see what it automatically chose?

If you ran the job on 10000 rows and 4 nodes, that is a multiple of 4. If you ran the job on 9999 rows or 9997 rows, there is still a chance the sequence would be in good order with no gaps.
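A rough way to explore this is to simulate the numbering in Python. This is a sketch under stated assumptions, not a model of the DataStage engine: it assumes partition numbers start at 0, that round robin deals rows to partitions 0, 1, 2, 3 in order, and that each partition keeps its own copy of svOne. The helpers `assign_row_numbers` and `has_gaps` and the `skewed` distribution are hypothetical names and data made up for the example.

```python
def assign_row_numbers(partition_of_row, num_partitions):
    """Return the number each row receives, given the partition it lands on."""
    # One svOne per partition, initialised once: @PARTITIONNUM - (@NUMPARTITIONS - 1)
    sv_one = [p - (num_partitions - 1) for p in range(num_partitions)]
    numbers = []
    for p in partition_of_row:
        sv_one[p] += num_partitions  # svOne + @NUMPARTITIONS
        numbers.append(sv_one[p])    # svOutputRow = svOne
    return numbers


def has_gaps(numbers):
    """True if the numbers are not exactly 1..N with no holes or repeats."""
    return sorted(numbers) != list(range(1, len(numbers) + 1))


# Assumed round robin: row i goes to partition i % 4. 9999 is not a multiple of 4.
round_robin = [i % 4 for i in range(9999)]
# Hypothetical uneven distribution: three rows on partition 0, one on partition 1.
skewed = [0, 0, 0, 1]

print(has_gaps(assign_row_numbers(round_robin, 4)))  # False
print(has_gaps(assign_row_numbers(skewed, 4)))       # True
```

In this simplified model, round robin keeps the sequence gap-free even when the row count is not a multiple of the partition count. Holes appear only when some partition receives more rows than round robin would have given it, as in the skewed case, which yields 1, 5, 9 and 2.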

abc123 · Post by **abc123** » Sat Dec 06, 2014 1:05 pm

I know that Auto can pick Round Robin, but this has been working every single time for years. I have had plenty of runs with an odd number of rows, and it works every time.

I am not trying to break it. I am just trying to understand why it is working when the number of rows is not an exact multiple of number of partitions.

battaliou · Post by **battaliou** » Wed Jun 24, 2015 2:59 pm

Could it be that you are running on one node, or the transformer is set to run sequentially?
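For reference, a small Python sketch of what the same formulas would produce in that case, assuming a single partition reports @NUMPARTITIONS = 1 and @PARTITIONNUM = 0. The function name `single_node_numbers` is made up for illustration.

```python
def single_node_numbers(row_count):
    """Apply the svOne logic with one partition (illustrative only)."""
    num_partitions = 1  # assumed value of @NUMPARTITIONS on one node
    partition_num = 0   # assumed value of @PARTITIONNUM on one node
    # Initial value: 0 - (1 - 1) = 0
    sv_one = partition_num - (num_partitions - 1)
    numbers = []
    for _ in range(row_count):
        sv_one += num_partitions  # steps by 1 per row
        numbers.append(sv_one)
    return numbers


print(single_node_numbers(7))  # [1, 2, 3, 4, 5, 6, 7]
```

With one partition the logic reduces to a plain counter, so no row count could produce gaps. That would not match the Peek output reported earlier (NUMPARTITIONS=4), though.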