SCD Common Multi Instance Job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Azzuri
Premium Member
Premium Member
Posts: 122
Joined: Tue May 13, 2008 11:42 am

SCD Common Multi Instance Job

Post by Azzuri »

Hi All, we are discussing best approach for handling SCDs. I'm thinking of creating a Multi-instance job where the following will be passed as
parameters

Table Name (Dimension table to update)
Schema
File Name (Dimension Data from Input)
File Name (Existing Dimension Data)

There will be more parms but the above is a start. There has been discussions about handling the SCD through stored procedures.

Just wondering if anyone else has built a common process to handle SCDs and what they did.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Get version 8 and use Slowly Changing Dimension stage, which does it all. (Well, in conjunction with Surrogate Key Generator stage.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Azzuri
Premium Member
Premium Member
Posts: 122
Joined: Tue May 13, 2008 11:42 am

Post by Azzuri »

Hi Ray, we talked about version 8 but it looks like it's still a few months away. :(
richdhan
Premium Member
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

Check this link. viewtopic.php?t=97043. You should use the CDC for finding the delta values. Use the Lookup stage in conjunction with Column Generator stage to get the maximum surrogate value and to generate new surrogate key values.

HTH
--Rich
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Or stall until you get your hands on version 8x...

The SCD stage is extremely powerful. It handles any type 1 or type 2 variation that I've encountered in the past. It doesn't do type 3, but I find type 3 to be an infrequent requirement... probably was not enough of a business case for IBM to invest in supporting type 3.

Mike
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I have helped build a common SCD container in DataStage Server edition. The Type 2 fields were packed into a single text field with delimiters and the key fields were packed into another field, the two fields were passed into a container where the key was looked up, the type 2 fields were compared and inserts were generated with tagging fields added. If you are adventurous you can use the CRC32 function instead of a packed type 2 field (and risk the false positives). I don't think this approach works so well in parallel jobs. A server job made it easy to pack and unpack the fields.

You could try building a custom stage or just wait until you are ready to move to 8.1.
Post Reply