Understanding Row Numbering algorithm

abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am


Post by abc123 »

I am posting in this thread because it totally relates to the original poster's topic.

I am using auto partitioning. The logic I have is:

InitialValue for stage variable svOne:
@PARTITIONNUM-(@NUMPARTITIONS-1)

svOne derivation: svOne + @NUMPARTITIONS
svOutputRow=svOne

This works perfectly every time no matter how many nodes I have in the config file. I tried with files of over 10000 rows and it works fine as well.

My problem is that I am not sure I understand the logic. I used a Peek stage to inspect the system variable values. For example, for the row where svOutputRow is 14, I see these values: NUMPARTITIONS=4, PARTITIONNUM=1, InitialValue of svOne: -2
I would expect the final value to be -2 + 4 = 2, unless the InitialValue is calculated only once per node and only "svOne + @NUMPARTITIONS" is executed for each row, in which case the values would come out right.

Any thoughts?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ha! It "totally relates" to a ton of other posts on this subject, so off you go to one of your own! Now you are master of your own fate. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Each stage variable's initial value is set only once per node, before any derivations take place.
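
To make that concrete, here is a minimal Python sketch of the behaviour described above. It is not DataStage code: the function name `number_rows` and the per-partition row counts are made up for illustration, and it assumes the initial value is evaluated once per partition and the derivation once per row.

```python
def number_rows(rows_per_partition, num_partitions):
    """Simulate svOne on each partition; return {partition: [row numbers]}."""
    result = {}
    for partition_num in range(num_partitions):
        # Initial value: evaluated once per partition, before any row arrives.
        sv_one = partition_num - (num_partitions - 1)
        numbers = []
        for _ in range(rows_per_partition[partition_num]):
            # Derivation: evaluated once per row (svOne + @NUMPARTITIONS).
            sv_one = sv_one + num_partitions
            numbers.append(sv_one)
        result[partition_num] = numbers
    return result


# Hypothetical run: 4 partitions with 4 rows each.
numbers = number_rows([4, 4, 4, 4], 4)
print(numbers[1])  # [2, 6, 10, 14]
```

In this sketch partition 1 starts at -2 and produces 2, 6, 10, 14, so a row with svOutputRow of 14 is simply the fourth row that partition processed. The peeked initial value of -2 is consistent with that.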
Choose a job you love, and you will never have to work a day in your life. - Confucius
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

This works perfectly but according to Ray's post in the linked topic it should not. Here is Ray's excerpt:

"Unless you can guarantee absolutely even distribution you will always see holes in the sequence. The only way that you can guarantee absolutely even distribution is (a) to specify Round Robin as the partitioning algorithm and (b) to have a number of rows that is an exact multiple of the number of partitions."

I am doing neither: (a) I am using auto partitioning, and (b) my number of rows is not an exact multiple of the number of partitions.
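
The holes Ray describes can be reproduced with a small Python sketch. It assumes the same numbering formula as in the first post, and the row counts per partition are invented purely to show an uneven distribution; `assigned_numbers` and `missing_numbers` are hypothetical helper names.

```python
def assigned_numbers(rows_per_partition):
    """Return the sorted row numbers produced across all partitions."""
    num_partitions = len(rows_per_partition)
    out = []
    for partition_num, count in enumerate(rows_per_partition):
        # Initial value, set once per partition.
        sv_one = partition_num - (num_partitions - 1)
        for _ in range(count):
            sv_one += num_partitions  # per-row derivation
            out.append(sv_one)
    return sorted(out)


def missing_numbers(assigned):
    """Return the numbers skipped between 1 and the largest assigned value."""
    return sorted(set(range(1, max(assigned) + 1)) - set(assigned))


# Hypothetical skew: partition 0 receives 5 of the 10 rows.
skewed = assigned_numbers([5, 2, 2, 1])
print(skewed)                   # [1, 2, 3, 4, 5, 6, 7, 9, 13, 17]
print(missing_numbers(skewed))  # [8, 10, 11, 12, 14, 15, 16]
```

The numbers are still unique, because each partition only ever emits values from its own residue class, but the busiest partition runs ahead of the others and leaves gaps.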
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Are you saying it is working perfectly (congratulations) and you are perhaps trying to break it and wondering why it won't break?

"Auto" is not its own separate type of partitioning; it will automatically choose a type of partitioning. Maybe it chose round robin in your job. Have you checked to see what it automatically chose?

If you ran the job on 10000 rows and 4 nodes, that is a multiple of 4. If you ran the job on 9999 rows or 9997 rows, there is still a chance the sequence would be in good order with no gaps.
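
As a rough check of the non-multiple case, the following Python sketch assumes round robin deals rows to partitions 0, 1, 2, ... in turn, so any leftover rows land on the lowest-numbered partitions. The helper names are hypothetical and this models only the numbering formula, not the engine itself.

```python
def round_robin_counts(total_rows, num_partitions):
    """Rows per partition when rows are dealt out starting at partition 0."""
    base, extra = divmod(total_rows, num_partitions)
    return [base + (1 if p < extra else 0) for p in range(num_partitions)]


def assigned_numbers(rows_per_partition):
    """Return the sorted row numbers produced across all partitions."""
    num_partitions = len(rows_per_partition)
    out = []
    for partition_num, count in enumerate(rows_per_partition):
        sv_one = partition_num - (num_partitions - 1)  # initial value
        for _ in range(count):
            sv_one += num_partitions  # per-row derivation
            out.append(sv_one)
    return sorted(out)


for total in (9997, 9999, 10000):
    numbers = assigned_numbers(round_robin_counts(total, 4))
    # True means the sequence is exactly 1..total with no gaps.
    print(total, numbers == list(range(1, total + 1)))
```

Under that assumption the sequence comes out contiguous for all three row counts, which would explain a job that never shows gaps even when the row count is not a multiple of the number of partitions.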
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

I know that Auto can pick Round Robin, but this has worked every single time for years. I have had plenty of runs with an odd number of rows, and it works every time.

I am not trying to break it. I am just trying to understand why it is working when the number of rows is not an exact multiple of number of partitions.
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London

Post by battaliou »

Could it be that you are running on one node, or the transformer is set to run sequentially?
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.