Surrogate key not generated sequentially

pragb · Post by **pragb** » Fri Jun 22, 2012 7:53 am

Hi,
I am trying to generate a surrogate key in a Transformer stage in my job.

The job structure is as follows:

Database----Transformer------Datasets.

I am generating the surrogate key in a stage variable and passing it to all 4 datasets as a key. The job is running on 2 nodes.
I am using an SK file, which I have defined in the Surrogate Key tab, and incrementing it by 1 key value, with an initial value of 1.

Now the issue I have is the keys are not generated sequentially.

I tried below steps:
1. Running in sequential mode from the Transformer onwards, with partitioning cleared.

In this case I get keys 1, 3, 5, 7, 9, ...

2. Running in parallel mode with the input partitioning set to round robin, but this generates seemingly random values.

Can you please let me know what else I can try or if I am doing anything wrong?

Many thanks in advance.

nagarjuna · Post by **nagarjuna** » Fri Jun 22, 2012 1:04 pm

In an ideal world, a surrogate key should not have any intelligence embedded in it. It is just a sequence number used as a primary key.

pragb · Post by **pragb** » Sat Jun 23, 2012 1:49 am

Hi,
Thanks for the reply.
My only concern is that if I am incrementing by 1 key value, it should give me output like 1, 2, 3, 4, ... but it is giving 1, 3, 5, 7, ... How is this possible?

chulett · Post by **chulett** » Sat Jun 23, 2012 7:29 am

I would assume it's because of the two nodes, and yes, I know you've said you're running the transformer sequentially. What happens when you run it on a single node?
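To picture why two nodes could produce 1, 3, 5, 7, ..., here is a minimal Python sketch. It assumes the key values are interleaved across partitions, so each partition steps by the number of partitions. That interleaving is only an assumption used for illustration, not confirmed DataStage behaviour for a state-file-based key source, and `keys_for_partition` is a hypothetical helper name.

```python
def keys_for_partition(partition, num_partitions, rows, initial=1):
    """Keys one partition would emit IF values are interleaved across partitions.

    Assumption for illustration only: partition p starts at initial + p and
    steps by num_partitions. This is not taken from DataStage documentation.
    """
    return [initial + partition + i * num_partitions for i in range(rows)]


# Two-node configuration, but every row flows through partition 0
# (e.g. the stage is forced to run sequentially): only odd keys appear.
two_node_partition_0 = keys_for_partition(0, 2, 5)

# The even keys would belong to partition 1, which receives no rows here.
two_node_partition_1 = keys_for_partition(1, 2, 5)

# Single-node configuration: the step is 1, so there are no gaps.
single_node = keys_for_partition(0, 1, 5)

print(two_node_partition_0)
print(two_node_partition_1)
print(single_node)
```

Under that assumption, forcing the stage onto one node of a two-node configuration would still leave the step at 2, which would match the symptom described.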

pragb · Post by **pragb** » Sat Jun 23, 2012 12:57 pm

Hi,
Many thanks for the reply.
When I run on a single node I get the desired output. Now my question is: inside the Transformer, on the Advanced tab, if I set the node map constraint to node 1 or node 2 (when it is running with 2 nodes),
it still generates values like 1, 3, 5, 7, ...
As I don't have the privilege to run with a single node, I desperately need a solution for this.
Please help.
Thanks again

chulett · Post by **chulett** » Sat Jun 23, 2012 1:42 pm

There's no "privilege" required to run on any number of nodes, including a single one; you just need the appropriate $APT_CONFIG_FILE file. And you should have several flavors on hand to work with. So, what's stopping you from taking that solution path?

pragb · Post by **pragb** » Sat Jun 23, 2012 1:48 pm

Thanks chulett,
I agree, but my concern in that case is that if I have many millions of rows and I run on a single node, that may cause a performance issue. Otherwise I am OK with running on a single node.

chulett · Post by **chulett** » Sat Jun 23, 2012 2:03 pm

Well, that's an "if" bridge you can cross when you get there. Do some benchmarks of the single node speed and see if the performance is anything to worry about. Since you are basically just dumping a database to disk, it may not be an issue. If you need to, up the number of nodes and live with the results. Gaps in surrogate keys are not typically something anyone really worries about.

However, the surrogate values may be getting generated "sequentially"; it could just be that you don't see them all because they aren't all targeting the same dataset. You said you are writing to four datasets, so how are you partitioning amongst them? Any chance the even numbers are in one of the other datasets? If not, then let us know the gory details of how you are getting data from the transformer to the four targets.
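A quick way to check this outside the job is sketched below in Python. It assumes the key column of each target dataset has already been exported to a plain list; the dataset names and sample values are made up for illustration, and `missing_keys` is a hypothetical helper, not part of any DataStage tooling.

```python
def missing_keys(keys_by_dataset):
    """Return the key values absent from the union of all datasets.

    keys_by_dataset maps a dataset name to the list of surrogate keys
    found in it (hypothetical, pre-exported data).
    """
    seen = set()
    for keys in keys_by_dataset.values():
        seen.update(keys)
    if not seen:
        return []
    # Any value between the smallest and largest key that never shows up
    # in any dataset is a genuine gap.
    return [k for k in range(min(seen), max(seen) + 1) if k not in seen]


# Made-up sample: the even keys landed in a different dataset, so no real gaps.
split_across_targets = {"ds1": [1, 3, 5, 7], "ds2": [2, 4, 6], "ds3": [], "ds4": []}

# Made-up sample: the even keys are nowhere, so the gaps are real.
evens_never_generated = {"ds1": [1, 3, 5, 7], "ds2": [], "ds3": [], "ds4": []}

print(missing_keys(split_across_targets))
print(missing_keys(evens_never_generated))
```

An empty result would suggest the keys are sequential overall and simply spread across the targets; a non-empty one would mean the values were never written anywhere.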