SCD Common Multi Instance Job

Azzuri · Post by **Azzuri** » Mon Feb 23, 2009 8:05 am

Hi All, we are discussing best approach for handling SCDs. I'm thinking of creating a Multi-instance job where the following will be passed as
parameters

Table Name (Dimension table to update)
Schema
File Name (Dimension Data from Input)
File Name (Existing Dimension Data)

There will be more parms but the above is a start. There has been discussions about handling the SCD through stored procedures.

Just wondering if anyone else has built a common process to handle SCDs and what they did.

ray.wurlod · Post by **ray.wurlod** » Mon Feb 23, 2009 2:40 pm

Get version 8 and use Slowly Changing Dimension stage, which does it all. (Well, in conjunction with Surrogate Key Generator stage.)

Azzuri · Post by **Azzuri** » Mon Feb 23, 2009 3:23 pm

Hi Ray, we talked about version 8 but it looks like it's still a few months away.

richdhan · Post by **richdhan** » Tue Feb 24, 2009 1:37 am

Hi,

Check this link. viewtopic.php?t=97043. You should use the CDC for finding the delta values. Use the Lookup stage in conjunction with Column Generator stage to get the maximum surrogate value and to generate new surrogate key values.

HTH
--Rich

Mike · Post by **Mike** » Tue Feb 24, 2009 6:34 am

Or stall until you get your hands on version 8x...

The SCD stage is extremely powerful. It handles any type 1 or type 2 variation that I've encountered in the past. It doesn't do type 3, but I find type 3 to be an infrequent requirement... probably was not enough of a business case for IBM to invest in supporting type 3.

Mike

vmcburney · Post by **vmcburney** » Tue Feb 24, 2009 3:51 pm

I have helped build a common SCD container in DataStage Server edition. The Type 2 fields were packed into a single text field with delimiters and the key fields were packed into another field, the two fields were passed into a container where the key was looked up, the type 2 fields were compared and inserts were generated with tagging fields added. If you are adventurous you can use the CRC32 function instead of a packed type 2 field (and risk the false positives). I don't think this approach works so well in parallel jobs. A server job made it easy to pack and unpack the fields.

You could try building a custom stage or just wait until you are ready to move to 8.1.