Generate Sequential Number using Parallel Transformer

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Generate Sequential Number using Parallel Transformer

Post by kashif007 »

Hi Guys

I want to generate sequential numbers in a parallel job with configuration file using 4 nodes. I have done the following posted by vmcburney in the transformer stage.

1) Created a stage variable for the counter named SVCounter.
with the value as "@PARTITIONNUM - @NUMPARTITIONS + 1".
2) Set the derivation of the stage variable named SVRow to "SVCounter + @NUMPARTITIONS".

Each instance will start at a different number, eg. -1, -2, -3, -4. When the counter is incremented each instance is increment by the number of partitions, eg. 4. This gives us a sequence in instance 1 of 1, 5, 9, 13... Instance 2 is 2, 6, 10, 14... etc

Can anyone suggest me the syntax of how to write the IF statement using the SVRow and SVCounter values. I want to generate sequential numbers no matter on which node the data has been split for processing.
Regards
Kashif Khan
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Curious why you wouldn't use the Surrogate Key stage for something like this, seeing as how it does all that automagically? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

I would'nt want to disturb the current design, its better I do it in Transformer and most importantly I want to learn how to do this in a transformer. Running the transformer in sequential is giving me a drawback in performance. I am trying to process around 140 Million records in the job.
Regards
Kashif Khan
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Learn to use the surrogate key generator stage. No disturbance of the current design required... it's just a NextSurrogateKey function call in your transformer... and fewer keystrokes than creating those stage variables. :wink:

Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You should not need an IF expression. Initialize the stage variable to the partition number, and increment by the number of partitions. This sequence is guarantee to be unique across all partitions.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

The problem is that I don't get my sequence unique across all partitions. I did the following:

Stage Variable (SVCount) = @PARTITIONNUM - @NUMPARTITIONS + 1

Column (SVRow) = SVCount + @NUMPARTITIONS

So when I see SVRow in the output file, I get SVRow = 1 for the first 24287 records and then SVRow = 2 for the next 24287 records and so on for four nodes and the sequence again repeats itself. What I need are numbers as 1,2,3,4.......................n for each n rows produced in the outfile. Please correct me if I following a wrong notion.
Regards
Kashif Khan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Initialize svCount to @PARTITIONNUM

Derive svCount as svCount + @NUMPARTITIONS

Derive column (or svRow) as svCount
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

kashif007 wrote: Stage Variable (SVCount) = @PARTITIONNUM - @NUMPARTITIONS + 1

Column (SVRow) = SVCount + @NUMPARTITIONS
You aren't incrementing either sv since PARTITIONNUM and NUMPARTITIONS are static values.

I would initialise SVRow to

@PARTITIONNUM - @NUMPARTITIONS + 1

then at each row do

SVRow = SVRow + @NUMPARTITIONS
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

Thanks a bunch kryt0n and Ray, it actually works. I did a fumble in the initialization part of the stage variable. My bad. :x
Regards
Kashif Khan
kashif007
Premium Member
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

Sorry Guys, Actually I ran into problems again. I was trying to run the job with different number of input records/dataset. The logic works well with limited data say around 900,000 input records but when I run the job using 1 million records or more, the sequence generation is broken and only one number "1000000" previals for all the remaining records after 1000000th record. Can you anyone please help me identify what went wrong in the logic.

Just to remind you I am using a 4 node config file and have done the following in the transformer.

Initialized the Stage Variable,
SVCount = @PARTITIONNUM - @NUMPARTITIONS + 1

Derived SVCount = SVCount + @NUMPARTITIONS

Created Output column called SeqNum = SVRow

Thanks
Regards
Kashif Khan
Post Reply