Surrogate key not generated sequentially

pragb
Participant
Posts: 10
Joined: Sat Jul 16, 2011 5:18 am
Location: Pune
Post by pragb »

Hi,
I am trying to generate a surrogate key from a Transformer stage in my job.

The job structure is as follows:

Database----Transformer------Datasets.

I am generating the surrogate key in a stage variable and passing it to all 4 datasets as a key. The job is running with 2 nodes.
I am using an SK file, which I have defined in the Surrogate Key tab, and incrementing the key by 1 from an initial value of 1.

Now the issue I have is the keys are not generated sequentially.

I tried the steps below:
1. Running in sequential mode from the transformer onwards, with partitioning cleared.

In this case I get keys 1, 3, 5, 7, 9, ...

2. Running in parallel mode with round robin partitioning on the input, but this generates seemingly random values.

Can you please let me know what else I can try or if I am doing anything wrong?

Many thanks in advance.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

In an ideal world, a surrogate key should not have any intelligence embedded in it. It is just a sequence number used as a primary key.
Nag
pragb
Participant
Posts: 10
Joined: Sat Jul 16, 2011 5:18 am
Location: Pune

Post by pragb »

Hi,
Thanks for the reply.
My only concern is that if I am incrementing by 1, it should give me output like 1, 2, 3, 4, ... but it is giving 1, 3, 5, 7, ... How is this possible?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would assume it's because of the two nodes, and yes, I know you've said you're running the transformer sequentially. What happens when you run it on a single node?
-craig

"You can never have too many knives" -- Logan Nine Fingers
pragb
Participant
Posts: 10
Joined: Sat Jul 16, 2011 5:18 am
Location: Pune

Post by pragb »

Hi,
Many thanks for the reply.
When I run on a single node I get the desired output. Now my question is: inside the transformer, on the Advanced tab, if I set the node map constraint to node 1 or node 2 (when running with 2 nodes),
it still generates values like 1, 3, 5, 7, ...
As I don't have the privilege to run on a single node, I desperately need a solution for this.
Please help.
Thanks again
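
The results reported so far (1, 2, 3, ... on one node; 1, 3, 5, ... on two nodes even when the rows all pass through one partition) would fit a scheme where each partition starts at a different offset and steps by the number of partitions. Whether the surrogate key generator actually works this way is an assumption, not something confirmed in this thread. The Python sketch below only simulates that assumed scheme; `partition_keys` and its parameters are hypothetical names, not DataStage functions.

```python
def partition_keys(rows_per_partition, num_partitions, start=1):
    """Simulate an ASSUMED interleaved key scheme (not confirmed DataStage behavior).

    Partition p hands out start + p, start + p + N, start + p + 2N, ...
    where N is the number of partitions in the configuration.
    """
    keys = {}
    for part in range(num_partitions):
        keys[part] = [
            start + part + row * num_partitions
            for row in range(rows_per_partition[part])
        ]
    return keys


# Two-node configuration, but every row is processed on partition 0
# (for example, the transformer is forced to run sequentially).
skewed = partition_keys([5, 0], num_partitions=2)

# Two-node configuration with rows spread evenly over both partitions.
even = partition_keys([3, 3], num_partitions=2)

# Single-node configuration.
single = partition_keys([5], num_partitions=1)

print(skewed[0])  # keys seen when only partition 0 receives rows
print(even)       # odd keys on partition 0, even keys on partition 1
print(single[0])  # no gaps with one partition
```

Under this assumption the gaps come from the node count in the configuration, not from the increment setting, which is why a one-node configuration would remove them.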
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no "privilege" required to run on any number of nodes, including a single one; you just need the appropriate $APT_CONFIG_FILE file. And you should have several flavors on hand to work with. So, what's stopping you from taking that solution path?
-craig

"You can never have too many knives" -- Logan Nine Fingers
pragb
Participant
Posts: 10
Joined: Sat Jul 16, 2011 5:18 am
Location: Pune

Post by pragb »

Thanks chulett,
I agree, but my concern in that case is that if I have many millions of rows and I run on a single node, that may cause a performance issue. Otherwise I am OK with running on a single node.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, that's an "if" bridge you can cross when you get there. Do some benchmarks of the single node speed and see if the performance is anything to worry about. Since you are basically just dumping a database to disk, it may not be an issue. If you need to, up the number of nodes and live with the results. Gaps in surrogate keys are not typically something anyone really worries about.

However, the surrogate values may be getting generated "sequentially"; it could just be that you don't see them all targeting the same dataset. You said you are writing to four datasets, so how are you partitioning amongst them? Any chance the even numbers are in one of the other datasets? If not, then let us know the gory details of how you are getting data from the transformer to the four targets.
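
One way to check whether the "missing" values simply landed in another target is to pool the keys from all four datasets and look for gaps in the combined set. The Python sketch below assumes the key column of each dataset has already been exported to a plain list; the dataset names and sample values are made up for illustration, and `find_missing_keys` is a hypothetical helper.

```python
def find_missing_keys(datasets, start=1):
    """Return the keys absent from the union of all datasets.

    `datasets` maps a dataset name to the list of surrogate keys found in it.
    The expected range runs from `start` to the largest key observed.
    """
    all_keys = set()
    for keys in datasets.values():
        all_keys.update(keys)
    if not all_keys:
        return []
    return [k for k in range(start, max(all_keys) + 1) if k not in all_keys]


# Made-up sample: each target looks gappy on its own,
# but together the four targets cover 1..8.
sample = {
    "ds1": [1, 3, 5, 7],
    "ds2": [2, 6],
    "ds3": [4],
    "ds4": [8],
}

missing_overall = find_missing_keys(sample)
missing_in_ds1 = find_missing_keys({"ds1": sample["ds1"]})

print(missing_overall)  # empty when the combined keys have no gaps
print(missing_in_ds1)   # gaps visible when looking at one target alone
```

If the combined list comes back empty, the keys are sequential overall and only their distribution across targets makes them look gappy.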
-craig

"You can never have too many knives" -- Logan Nine Fingers