Generate Sequential Number using Parallel Transformer
Moderators: chulett, rschirm, roy
Generate Sequential Number using Parallel Transformer
Hi Guys
I want to generate sequential numbers in a parallel job with configuration file using 4 nodes. I have done the following posted by vmcburney in the transformer stage.
1) Created a stage variable for the counter named SVCounter.
with the value as "@PARTITIONNUM - @NUMPARTITIONS + 1".
2) Set the derivation of the stage variable named SVRow to "SVCounter + @NUMPARTITIONS".
Each instance will start at a different number, eg. -1, -2, -3, -4. When the counter is incremented each instance is increment by the number of partitions, eg. 4. This gives us a sequence in instance 1 of 1, 5, 9, 13... Instance 2 is 2, 6, 10, 14... etc
Can anyone suggest me the syntax of how to write the IF statement using the SVRow and SVCounter values. I want to generate sequential numbers no matter on which node the data has been split for processing.
I want to generate sequential numbers in a parallel job with configuration file using 4 nodes. I have done the following posted by vmcburney in the transformer stage.
1) Created a stage variable for the counter named SVCounter.
with the value as "@PARTITIONNUM - @NUMPARTITIONS + 1".
2) Set the derivation of the stage variable named SVRow to "SVCounter + @NUMPARTITIONS".
Each instance will start at a different number, eg. -1, -2, -3, -4. When the counter is incremented each instance is increment by the number of partitions, eg. 4. This gives us a sequence in instance 1 of 1, 5, 9, 13... Instance 2 is 2, 6, 10, 14... etc
Can anyone suggest me the syntax of how to write the IF statement using the SVRow and SVCounter values. I want to generate sequential numbers no matter on which node the data has been split for processing.
Regards
Kashif Khan
Kashif Khan
I would'nt want to disturb the current design, its better I do it in Transformer and most importantly I want to learn how to do this in a transformer. Running the transformer in sequential is giving me a drawback in performance. I am trying to process around 140 Million records in the job.
Regards
Kashif Khan
Kashif Khan
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You should not need an IF expression. Initialize the stage variable to the partition number, and increment by the number of partitions. This sequence is guarantee to be unique across all partitions.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
The problem is that I don't get my sequence unique across all partitions. I did the following:
Stage Variable (SVCount) = @PARTITIONNUM - @NUMPARTITIONS + 1
Column (SVRow) = SVCount + @NUMPARTITIONS
So when I see SVRow in the output file, I get SVRow = 1 for the first 24287 records and then SVRow = 2 for the next 24287 records and so on for four nodes and the sequence again repeats itself. What I need are numbers as 1,2,3,4.......................n for each n rows produced in the outfile. Please correct me if I following a wrong notion.
Stage Variable (SVCount) = @PARTITIONNUM - @NUMPARTITIONS + 1
Column (SVRow) = SVCount + @NUMPARTITIONS
So when I see SVRow in the output file, I get SVRow = 1 for the first 24287 records and then SVRow = 2 for the next 24287 records and so on for four nodes and the sequence again repeats itself. What I need are numbers as 1,2,3,4.......................n for each n rows produced in the outfile. Please correct me if I following a wrong notion.
Regards
Kashif Khan
Kashif Khan
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You aren't incrementing either sv since PARTITIONNUM and NUMPARTITIONS are static values.kashif007 wrote: Stage Variable (SVCount) = @PARTITIONNUM - @NUMPARTITIONS + 1
Column (SVRow) = SVCount + @NUMPARTITIONS
I would initialise SVRow to
@PARTITIONNUM - @NUMPARTITIONS + 1
then at each row do
SVRow = SVRow + @NUMPARTITIONS
Sorry Guys, Actually I ran into problems again. I was trying to run the job with different number of input records/dataset. The logic works well with limited data say around 900,000 input records but when I run the job using 1 million records or more, the sequence generation is broken and only one number "1000000" previals for all the remaining records after 1000000th record. Can you anyone please help me identify what went wrong in the logic.
Just to remind you I am using a 4 node config file and have done the following in the transformer.
Initialized the Stage Variable,
SVCount = @PARTITIONNUM - @NUMPARTITIONS + 1
Derived SVCount = SVCount + @NUMPARTITIONS
Created Output column called SeqNum = SVRow
Thanks
Just to remind you I am using a 4 node config file and have done the following in the transformer.
Initialized the Stage Variable,
SVCount = @PARTITIONNUM - @NUMPARTITIONS + 1
Derived SVCount = SVCount + @NUMPARTITIONS
Created Output column called SeqNum = SVRow
Thanks
Regards
Kashif Khan
Kashif Khan