state files in surrogate key generator

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

sheema
Premium Member
Posts: 204
Joined: Sat Jul 22, 2006 10:20 am

Post by sheema »

Hi,

I have a Job1 where I am using the Surrogate Key Generator (SKG) stage to generate a unique sequence. I am using a state file to generate the sequence. Below are the steps I followed to get this working:

1. Created a JobA with just an SKG stage, with no input or output links, to create the state file (Key Source Action=Create, Source Type=Flat File, Source Name=<path and name of the state file>).

2. Now I run Job1, whose SKG stage has input and output links, with these options set in the SKG stage: File Initial Value=100, Source Type=Flat File, Source Name=<path and name of state file>, Generated Output Column Name=<column name>.

But the problem I am facing is that the sequence values are not incremented by 1. Do I have to change any setting to increment the value by one?
And is there anything in the process that I am missing for proper sequence generation?

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What makes you think the increment isn't 1? What are you seeing?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sheema
Premium Member
Posts: 204
Joined: Sat Jul 22, 2006 10:20 am

Post by sheema »

For example, if I have 500 rows for which I would like to generate a sequence from 1 to 500, the SKG stage generates 1-104, incrementing by 1, but after 104 it jumps to 1001 and increments by 1 until 1102, and then it jumps again to 2001.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How many nodes is your job running on?
-craig
sheema
Premium Member
Posts: 204
Joined: Sat Jul 22, 2006 10:20 am

Post by sheema »

4 nodes.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Think about what that means.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A little crazy, I know, but this is explained in the Parallel Job Developer's Guide pdf manual. :wink:
-craig
deepticr
Participant
Posts: 32
Joined: Wed Mar 19, 2008 7:01 am
Location: Bangalore, India

Facing a similar issue with surrogate key generator stage

Post by deepticr »

Hi,

I'm facing a similar issue with the SKG stage. I looked at the DataStage v8 documentation and it explains nothing. I found some explanation in the "Parallel Job Developer Guide" v7.5.1. It says that the numbers in each partition are incremented by the number of partitions defined. For instance, if my start value = 0 and I have 2 partitions, then the numbers generated are as follows:

Partition 1
-------------
0 2 4 6 8

Partition 2
-------------
1 3 5 7 9
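A minimal Python sketch of the scheme Deepti quotes from the 7.5.1 guide: each partition starts at the start value plus its partition index and steps by the number of partitions. `strided_keys` is a hypothetical helper written only for illustration, not a DataStage API.

```python
def strided_keys(start: int, num_partitions: int, rows_per_partition: int) -> list[list[int]]:
    """Keys for partition p: start + p, start + p + n, start + p + 2n, ... (n = num_partitions)."""
    return [
        [start + p + i * num_partitions for i in range(rows_per_partition)]
        for p in range(num_partitions)
    ]


if __name__ == "__main__":
    # Reproduces the two-partition example above (start value 0, 5 rows per partition).
    for number, keys in enumerate(strided_keys(0, 2, 5), start=1):
        print(f"Partition {number}:", *keys)
    # Partition 1: 0 2 4 6 8
    # Partition 2: 1 3 5 7 9
```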

But this is not the way in which the surrogate keys are getting generated.

Here is what I have done:

1. Job A has the SKG stage in Create mode with no i/p or o/p links. I assume this generates the state file. If this is the case, ideally we should be able to set the initial value and the increment, but there are no properties available to set these values. However, the job ran successfully with this log message:
"Surrogate_Key_Generator_0,0: State file F:\IBM\InformationServer\Server\Projects\EE2SM_SDN_MSTR_DATA_DEV\Files\WorkingDir\BasicRetail\test_surrkey.txt is empty."
Does this mean that this job is only used for marking an already created file as the state file and the sequence number gets generated only after the first invocation of the state file through an SKG stage in another job?

2. Job B has an SKG stage followed by a Dataset. In the SKG stage I set the following properties:
Number of records =10
Generate Key From Last Highest Value=Y

The number of rows in the dataset is 20. Why is this happening when I have specified the number of records as 10? Does this mean 10 records per partition? The output I get is:
1 2 3 4 5 6 7 8 9 10
1001 1002 1003 ... 1010.
As per the explanation provided by the documentation, the keys generated (on a 2-node configuration) ought to be 1 3 5 7 9 in one partition and 2 4 6 8 10 in the other. Why is there a gap of 1000 in the sequence?

Please help me in understanding this.

-Deepti
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

From what I understand, the stage has 'improvements' in the 8.x version over the 7.x version, so perhaps nuances of how the stage works have changed as well. Pity about the lack of documentation; I'll have to check with our 8.x person and see what, if anything, I can find out.
-craig
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

1. Job A has the SKG stage in Create mode with no i/p or o/p links. I assume this generates the state file.


Yes, an empty surrogate key state file. You can just use a touch command in a before-job subroutine to do the same.
If this is the case, ideally we should be able to set the initial value and the increment, but there are no properties available to set these values.
Only an empty file is generated by that job. The initial value is set by the Surrogate Key Generator stage that has an output link; that is where you will find the initial value property.
However, the job ran successfully with this log message:
"Surrogate_Key_Generator_0,0: State file F:\IBM\InformationServer\Server\Projects\EE2SM_SDN_MSTR_DATA_DEV\Files\WorkingDir\BasicRetail\test_surrkey.txt is empty."
That message appears because the file is empty.
Does this mean that this job is only used for marking an already created file as the state file and the sequence number gets generated only after the first invocation of the state file through an SKG stage in another job?
Yes.
2. Job B has an SKG stage followed by a Dataset. In the SKG stage I set the following properties:
Number of records =10
Generate Key From Last Highest Value=Y

The number of rows in the dataset is 20. Why is this happening when I have specified the number of records as 10? Does this mean 10 records per partition?
10 records per node. If you want only 10 records, restrict the stage to a single node by defining a node map constraint.
The output I get is:
1 2 3 4 5 6 7 8 9 10
1001 1002 1003 ... 1010.
As per the explanation provided by the documentation, the keys generated (on a 2-node configuration) ought to be 1 3 5 7 9 in one partition and 2 4 6 8 10 in the other. Why is there a gap of 1000 in the sequence?
In the Transformer, the default block size is 1, so it generates 1,3,5,...
and 2,4,6,...

In the Surrogate Key Generator stage, it generates 1,2,3,...
and 1001,1002,1003,...

Even on one node, if the same state file is used twice at the same time, the result may look like what you are getting. This is because, to optimize the process, the first 1000 keys are reserved by the first instance and the next 1000 by the next instance.

It's only my observation, based on tests I performed. I never read this in the documentation; possibly it's not there.

However, you can try changing the default block size by setting the File Block Size option to 1.

Test it and let us know, and don't forget to watch the performance: you may see an increase in run time when processing a large number of records.
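The block-reservation behavior described above can be modeled with a short Python sketch. This is an assumed model based only on the observations in this thread, not IBM's implementation: the "state file" is an in-memory counter, each partition reserves a block of `block_size` keys whenever its current block runs out, and rows are processed round-robin across partitions as a deterministic stand-in for real parallel scheduling. `StateFile` and `generate_keys` are hypothetical names.

```python
from collections import deque


class StateFile:
    """Hypothetical in-memory stand-in for the flat state file:
    it only tracks the highest key value reserved so far."""

    def __init__(self, last_value: int = 0) -> None:
        self.last_value = last_value

    def reserve_block(self, block_size: int) -> deque:
        start = self.last_value + 1
        self.last_value += block_size  # the whole block is now "used up" in the file
        return deque(range(start, self.last_value + 1))


def generate_keys(rows_per_partition: list[int], block_size: int,
                  last_value: int = 0) -> list[list[int]]:
    """Assign keys per partition, reserving a new block whenever a partition runs out.
    Rows are taken round-robin across partitions (an assumed scheduling order)."""
    state = StateFile(last_value)
    pending = [deque() for _ in rows_per_partition]  # unused keys left in each block
    keys: list[list[int]] = [[] for _ in rows_per_partition]
    for step in range(max(rows_per_partition, default=0)):
        for p, n_rows in enumerate(rows_per_partition):
            if step >= n_rows:
                continue  # this partition has no more rows
            if not pending[p]:
                pending[p] = state.reserve_block(block_size)
            keys[p].append(pending[p].popleft())
    return keys


if __name__ == "__main__":
    # Deepti's Job B: 2 nodes, 10 records per node, default block size 1000.
    big = generate_keys([10, 10], block_size=1000)
    print(big[0][0], big[0][-1], big[1][0], big[1][-1])  # 1 10 1001 1010
    # Same job with a block size of 1: keys interleave with no gaps.
    small = generate_keys([10, 10], block_size=1)
    print(small[0][:5], small[1][:5])  # [1, 3, 5, 7, 9] [2, 4, 6, 8, 10]
```

Under this model a block size of 1 leaves no unused keys, at the cost of one state-file reservation per row.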
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

That's a good explanation. I also tested the same and observed that the way it generates keys depends on the block size. So, if you want the keys to be sequential, generate them with block size = 1 or use the DB sequence option.
Nag
srinivas.nettalam
Participant
Posts: 134
Joined: Tue Jun 15, 2010 2:10 am
Location: Bangalore

Post by srinivas.nettalam »

The default block size is 1000 and you can always set it to 1 if you are particular about the sequence and not just the uniqueness of the values.
N.Srinivas
India.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Use the block size of 1 if you require it, otherwise you should consider the default or even larger block sizes if you are assigning a lot of surrogate keys. Lower value block sizes introduce more overhead to the job due to needing to access the statefile more often. I've seen a block size of 1 bring a job to a literal crawl when it was used unnecessarily in a high-volume situation.
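To put rough numbers on that trade-off, here is a back-of-the-envelope Python sketch. It assumes (this is not documented in the thread) that each partition touches the state file once per block it reserves and that the unused tail of each partition's last block is simply skipped. The row counts are made up for illustration, and the helper names are hypothetical.

```python
def blocks_needed(rows: int, block_size: int) -> int:
    """Integer ceiling of rows / block_size."""
    return (rows + block_size - 1) // block_size


def state_file_accesses(rows_per_partition: list[int], block_size: int) -> int:
    """Assumed cost model: one state-file access per block reserved by any partition."""
    return sum(blocks_needed(rows, block_size) for rows in rows_per_partition)


def unused_keys(rows_per_partition: list[int], block_size: int) -> int:
    """Keys reserved but never assigned: the leftover tail of each partition's last block."""
    return sum(blocks_needed(rows, block_size) * block_size - rows for rows in rows_per_partition)


if __name__ == "__main__":
    rows = [250_000] * 4  # hypothetical: 1 million rows spread evenly over 4 nodes
    for block_size in (1, 1000, 100_000):
        print(block_size, state_file_accesses(rows, block_size), unused_keys(rows, block_size))
    # 1 1000000 0
    # 1000 1000 0
    # 100000 12 200000
```

Smaller blocks mean many more state-file accesses; larger blocks mean bigger gaps, which only matters if you need the keys to be dense rather than just unique.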

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

I just wanted to add my observation: even if your block size is 1, if the partitioning is not round robin you can end up with gaps in your sequence numbers.
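The thread does not establish why this happens. One way gaps can arise is sketched below in Python, under an explicit assumption: if keys are strided by partition (the scheme from the 7.5.1 guide quoted earlier in the thread) and a non-round-robin partitioner sends unequal row counts to each node, the shorter partitions stop early and leave holes. This illustrates the stride scheme only; it is not a confirmed description of the SKG stage's internals, and `strided_keys` and `find_gaps` are hypothetical helpers.

```python
def strided_keys(start: int, rows_per_partition: list[int]) -> list[list[int]]:
    """Partition p gets start + p, start + p + n, ... where n is the number of partitions."""
    n = len(rows_per_partition)
    return [[start + p + i * n for i in range(rows)] for p, rows in enumerate(rows_per_partition)]


def find_gaps(keys_by_partition: list[list[int]]) -> list[int]:
    """Values missing between the lowest and highest key actually assigned."""
    used = {k for keys in keys_by_partition for k in keys}
    if not used:
        return []
    return [k for k in range(min(used), max(used) + 1) if k not in used]


if __name__ == "__main__":
    # Hypothetical uneven split: 6 rows hashed to node 1, only 2 to node 2.
    keys = strided_keys(1, [6, 2])
    print(keys)             # [[1, 3, 5, 7, 9, 11], [2, 4]]
    print(find_gaps(keys))  # [6, 8, 10]
```

With an even (round-robin) split, such as `[4, 4]`, the same helpers report no gaps.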