DSXchange: DataStage and IBM Websphere Data Integration Forum
sheema



Group memberships:
Premium Members

Joined: 22 Jul 2006
Posts: 204

Points: 1503

Post Posted: Tue May 12, 2009 2:23 pm

DataStage® Release: 8x
Job Type: Parallel
OS: Unix
Hi,

I have a Job1 where I am using the Surrogate Key Generator (SKG) stage to generate a unique sequence, using a state file as the key source. Below are the steps I followed to get this working:

1. Created a JobA with just an SKG stage, with no input or output links, to create the state file (Key Source Action = Create, Source Type = Flat File, Source Name = <path and name of the state file>).

2. Then ran Job1 with an SKG stage that has input and output links, with the following options set in the stage: File Initial Value = 100, Source Type = Flat File, Source Name = <path and name of state file>, Generated Output Column Name = <column name>.

But the problem I am facing is that the sequence is not incremented by 1. Do I have to change any setting to increment the value by one?
And is there anything in the process I am missing for proper sequence generation?

Thanks
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 43085
Location: Denver, CO
Points: 222463

Post Posted: Tue May 12, 2009 2:32 pm

What makes you think the increment isn't 1? What are you seeing?

_________________
-craig

"You can never have too many knives" -- Logan Nine Fingers
sheema

Post Posted: Tue May 12, 2009 2:40 pm

For example, if I have 500 rows for which I would like to generate a sequence from 1 to 500, the SKG stage generates 1-104 incrementing by 1, but after 104 it jumps to 1001, increments by 1 until 1102, and then jumps again to 2001.
chulett

Post Posted: Tue May 12, 2009 2:48 pm

How many nodes is your job running on?

_________________
-craig

"You can never have too many knives" -- Logan Nine Fingers
sheema

Post Posted: Tue May 12, 2009 2:50 pm

4 nodes.
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54582
Location: Sydney, Australia
Points: 295988

Post Posted: Tue May 12, 2009 3:42 pm

Think about what that means.

_________________
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett

Post Posted: Tue May 12, 2009 4:03 pm

A little crazy, I know, but this is explained in the Parallel Job Developer's Guide pdf manual. Wink

_________________
-craig

"You can never have too many knives" -- Logan Nine Fingers
deepticr
Participant



Joined: 19 Mar 2008
Posts: 32
Location: Bangalore, India
Points: 349

Post Posted: Thu May 14, 2009 4:23 am

Hi,

I'm facing a similar issue with the SKG stage. I looked at the DataStage v8 documentation and it explains nothing. I found some explanation in the Parallel Job Developer's Guide v7.5.1: it says that the numbers in each partition are incremented by the number of partitions defined. For instance, if my start value = 0 and I have 2 partitions, the numbers generated are as follows:

Partition 1
-------------
0 2 4 6 8

Partition 2
-------------
1 3 5 7 9
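The interleaving described in the 7.5.1 guide can be sketched as follows (a simulation for illustration only, not the actual stage internals):

```python
# Round-robin key interleaving per the 7.5.1 documentation: partition p
# (0-based) generates start + p, start + p + nparts, start + p + 2*nparts, ...
def documented_keys(start, nparts, per_partition):
    return [
        [start + p + i * nparts for i in range(per_partition)]
        for p in range(nparts)
    ]

# start = 0, 2 partitions, 5 keys each:
print(documented_keys(0, 2, 5))  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```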

But this is not the way in which the surrogate keys are getting generated.

Here is what I have done:

1. Job A has the SKG stage in Create mode with no input or output links. I assume this generates the state file. If that is the case, ideally we should be able to set the initial value and the increment, but there are no properties available for these values. However, the job runs successfully with this log message:
"Surrogate_Key_Generator_0,0: State file F:\IBM\InformationServer\Server\Projects\EE2SM_SDN_MSTR_DATA_DEV\Files\WorkingDir\BasicRetail\test_surrkey.txt is empty."
Does this mean that this job is only used for marking an already created file as the state file, and that the sequence numbers get generated only after the first invocation of the state file through an SKG stage in another job?

2. Job B has a SKG stage followed by a Dataset. In the SKG stage I set the following properties:
Number of records =10
Generate Key From Last Highest Value=Y

The number of rows in the dataset is 20. Why is this happening when I have specified the number of records as 10? Does this mean 10 records per partition? The output I get is:
1 2 3 4 5 6 7 8 9 10
1001 1002 1003 ... 1010.
As per the explanation in the documentation, the keys generated (on a 2-node configuration) ought to be 1 3 5 7 9 in one partition and 2 4 6 8 10 in the other. Why is there a gap of 1000 in the sequence?

Please help me in understanding this.

-Deepti
chulett

Post Posted: Thu May 14, 2009 5:34 am

From what I understand, the stage has 'improvements' in the 8.x version over 7.x, so perhaps nuances of how the stage works have changed as well. Pity about the lack of documentation; I'll have to check with our 8.x person and see what, if anything, I can find out.

_________________
-craig

"You can never have too many knives" -- Logan Nine Fingers
priyadarshikunal



Group memberships:
Premium Members

Joined: 01 Mar 2007
Posts: 1735
Location: Troy, MI
Points: 9319

Post Posted: Thu May 14, 2009 7:18 am

1.
Quote:
Job A has the SKG stage in Create mode with no i/p or o/p links. I assume this generates the state file.


Yes, an empty surrogate key file. You could just as well use a touch command in a before-job subroutine to do the same.

Quote:
If this is case ideally, we must be able to set the initial value and the increment. But there are no properties available to set these values.


Only an empty file is generated by that. The initial value is set by the Surrogate Key Generator stage that has an output link; that is where you will find the Initial Value property.

Quote:
However, the job run successfully with this log message:
"Surrogate_Key_Generator_0,0: State file F:\IBM\InformationServer\Server\Projects\EE2SM_SDN_MSTR_DATA_DEV\Files\WorkingDir\BasicRetail\test_surrkey.txt is empty."


Because the file is empty.

Quote:
Does this mean that this job is only used for marking an already created file as the state file and the sequence number gets generated only after the first invocation of the state file through an SKG stage in another job?


Yes.

Quote:
2. Job B has a SKG stage followed by a Dataset. In the SKG stage I set the following properties:
Number of records =10
Generate Key From Last Highest Value=Y

The number of rows in the dataset is 20. Why is this happening when I have specified the number of records as 10? Does this mean 10 records per partition?


10 records per node. If you want only 10 records in total, restrict the stage's execution by defining a node map constraint.

Quote:
The output I get is:
1 2 3 4 5 6 7 8 9 10
1001 1002 1003 ... 1010.
As per the explanation provided by the documentation the key generated (on 2 node configuration) ought to be 1 3 5 7 9 in one partition and 2 4 6 8 10 in another partition. Why is there a gap in the sequence of 1000?


In a Transformer, the default block size is 1, so it generates 1, 3, 5, ... in one partition
and 2, 4, 6, ... in the other.

The Surrogate Key Generator stage instead generates 1, 2, 3, ... in one partition
and 1001, 1002, 1003, ... in the other.

Even on one node, if the same state file is used twice at the same time, you may see the same result: the first 1000 keys are reserved by the first instance and the next 1000 by the second instance, to optimize the process.

This is only my observation, based on tests I performed; I never read it in the documentation, so possibly it is not there.

However, you can try changing the default by setting the File Block Size option to 1.

Test it and let us know, and don't forget to watch the performance: you may see an increase in run time when processing a large number of records.
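The block-reservation behaviour described above can be sketched like this (a simulation based on the observations in this thread, not the actual stage internals; the block size of 1000 is the observed default):

```python
# Simulation of state-file block reservation: each node reserves a block
# of keys from the shared state file, then hands them out sequentially.
class StateFile:
    def __init__(self, block_size=1000):
        self.next_key = 1
        self.block_size = block_size

    def reserve_block(self):
        """Reserve the next block and return its first key."""
        start = self.next_key
        self.next_key += self.block_size
        return start

def generate_keys(state, n):
    start = state.reserve_block()
    return [start + i for i in range(n)]

state = StateFile(block_size=1000)
node1 = generate_keys(state, 10)  # first node reserves keys 1..1000
node2 = generate_keys(state, 10)  # second node reserves keys 1001..2000
print(node1)  # [1, 2, ..., 10]
print(node2)  # [1001, 1002, ..., 1010]
```

With block_size=1 each reservation hands out exactly one key, which is why that setting produces a gap-free (though slower) sequence.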

_________________
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. Wink
nagarjuna



Group memberships:
Premium Members

Joined: 27 Jun 2008
Posts: 533
Location: Chicago
Points: 2900

Post Posted: Thu May 14, 2009 6:41 pm

That's a good explanation. I also tested this and observed that the way keys are generated depends on the block size. So, if you want them to be sequential, generate with block size = 1 or use the DB sequence option.

_________________
Nag
srinivas.nettalam
Participant



Joined: 15 Jun 2010
Posts: 134
Location: Bangalore
Points: 1030

Post Posted: Wed Jul 10, 2013 1:02 am

The default block size is 1000 and you can always set it to 1 if you are particular about the sequence and not just the uniqueness of the values.

_________________
N.Srinivas
India.
jwiles



Group memberships:
Premium Members

Joined: 14 Nov 2004
Posts: 1274

Points: 10406

Post Posted: Wed Jul 10, 2013 10:52 pm

Use a block size of 1 if you require it; otherwise consider the default, or even larger block sizes, if you are assigning a lot of surrogate keys. Smaller block sizes introduce more overhead because the job needs to access the state file more often. I've seen a block size of 1 bring a job to a crawl when it was used unnecessarily in a high-volume situation.
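The overhead trade-off can be illustrated with a back-of-the-envelope calculation (assuming one state-file access per reserved block, per the block-reservation behaviour described earlier in the thread):

```python
import math

# Rough estimate of state-file accesses per node: one access each time
# a new block of keys must be reserved.
def statefile_accesses(rows_per_node, block_size):
    return math.ceil(rows_per_node / block_size)

# Assigning keys to 1,000,000 rows on one node:
print(statefile_accesses(1_000_000, 1))     # 1000000 accesses
print(statefile_accesses(1_000_000, 1000))  # 1000 accesses
```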

Regards,

_________________
- james wiles


All generalizations are false, including this one - Mark Twain.
prasson_ibm



Group memberships:
Premium Members

Joined: 11 Oct 2007
Posts: 536
Location: Bangalore
Points: 4830

Post Posted: Thu Jul 11, 2013 2:03 am

Just to add my observation: even if your block size is 1, if the partitioning is not round robin you will still end up with gaps in your sequence numbers.

_________________
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id=61317902&trk=hb_tab_pro_top
Blog:- http://dsshar.blogspot.com/