Surrogate Key generator

sriram_h · Post by **sriram_h** » Mon Apr 11, 2005 11:40 pm

I would like to Know the various functionalities of Surrogate key generator and when should i choose surrogate Key for an application.

ArndW · Post by **ArndW** » Tue Apr 12, 2005 4:04 am

Hello sriram,

the question of surrogate keys is not very specific to DataStage, it lies rather close to the core of Data Warehousing. Every book on DWH will contain either a chapter or at least some reference to surrogate keys, and a quick look at Google threw out this page with a short descrption.

The Px's implementation of surrogate key generation is simple and straightforward, so once you get a feel for where and how to use such an entity you can easily use DS to generate them for you.

bcarlson · Post by **bcarlson** » Wed Apr 13, 2005 12:20 pm

Basically the Surrogate Key generates a unique number to each record using a counter. By default the counter starts at 0, but you can specify another number if necessary. For example, if you were generating keys to a table that gets appended daily, you may want to get the max id in the table, and use max id + 1 as the starting value in your surrogate key stage.

We have used surrogate keys in situations where multiple data keys are required to identify a unique record, like customer-to-account relationships (keys = customer, account, relationship-code, effective_date, etc). It is easier and more efficient to join on a single key than many.

In other situations, we are receiving data from a source that may eventually reuse its id's. Say we get account data and the source provides a cust_acct_id as a unique identifier. We use cust_acct_id as our primary key and then the source decides that they are going to start reusing cust_acct_id's that have been purged from their system. That works fine for them, they are an operational system and only keep current data. We, however, are a data warehouse and store data historically. If we keep cust_acct_id as our only key, then we have to track the source's reuse of each id. On the other hand, if we use a generated key and carry cust_acct_id simply as another data field, then we have removed the dependency on the source's cust_acct_id and their reuse.

Hope this helps.

ray.wurlod · Post by **ray.wurlod** » Wed Apr 13, 2005 2:53 pm

Small pedantic point. Surrogate keys don't need to be a sequence of numbers, they don't even have to be numbers; all they have to be is unique within the table.

It happens that generating a sequence of integers is the easiest way to achieve this, and that integers are maximally efficient for joins. However, if you generate a surrogate key value then end up not using it (maybe because the row is rejected), there's nothing wrong with the valid data in the table. Even though there's a gap in the sequence you still have uniqueness.

bcarlson · Post by **bcarlson** » Thu Apr 14, 2005 1:51 pm

ray.wurlod wrote:Small pedantic point. Surrogate keys don't need to be a sequence of numbers, they don't even have to be numbers; all they have to be is unique within the table.

You're right, but for some reason the Surrogate Key stage in DS (in the Processing group) only uses 16,32, or 64 bit integers. If you want to generate something other than integers, you can use the Generator Stage (in the Development/Debug group).

Adam_Clone · Post by **Adam_Clone** » Fri Apr 15, 2005 3:10 am

hi
Surrogate keys come into use usually in 'Type 2' category of jobs. These type jobs are those used to handle historcal data. For eg, say someone's address changes, and we need to keep the old address also in the data warehouse. But now the problem is that this row with the old data cannot be kept in the warehouse with the same key, and moreover the key itself should be the same for the row with the old address and the new one, which are effectively the same person. So what we do is we keep the row data intact with the same key for both the rows (Same row with new address and old address) and add one more column to the table in the warehouse which acts as the new key for the table. The surrogate key generator object is used to generate this key called the surrogate key. Remember to unclick the 'Key' option for the original key column in the output link from surrogate key generator stage.
I hope this has cleared your doubt.

DSXchange

Surrogate Key generator

Surrogate Key generator

Re: Surrogate Key generator