Implementing SCD type-II in DataStage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
vimali balakrishnan
Participant
Posts: 60
Joined: Mon Dec 27, 2004 3:38 am

Implementing SCD type-II in DataStage

Post by vimali balakrishnan »

Hi,

I have to implement SCD type-II in DataStage.

I have a source table in which one column is considered as a key column which can be looked up.

For example,
consider the project master table, I have project_id,customer_id,customer5_id columns in my source.
my target table has project_id,customer_id,customer5_id,status columns.

I have to check if the project_id in the source is there in the target,if present then check if the other columns in the target is same as
the columns in the source i.e,if one record/row in the source is already present in the target.if same skip that record.

If the project_id is already present and if the other columns are not same as that of the source then insert this record into the target with status as 'A'
and update the status of the already existing record with the same project_id as 'I'.

Can u help me out with this????
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Search the forum for SCD and "slowly changing dimension". It has been covered a number of times.
And it's straightforward to do.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vimali balakrishnan
Participant
Posts: 60
Joined: Mon Dec 27, 2004 3:38 am

Re: Implementing SCD type-II in DataStage

Post by vimali balakrishnan »

Assume my source table has data in the following format:

Source table:
1110 1111 22222
1111 2222 33333
1111 3333 44444

Target table:
1110 1111 22222 'Active' 10-01-2003 10-01-2004
1111 2222 33333 'Active' 11-01-2003 11-01-2004

Now my job should see if 1110 is present in the target and the other two columns are same,so it is the same in the target,so it should skip that record.

It starts with the next record,checks if 1111 is present in the target,
and the other records are same,so it is the same in the target,so it should skip that record.

It starts with the next record checks if 1111 is already present,so it is already present in the target but the other values are not same so
my job should update the status of the current record(1111) in the target to 'Inactive' and insert the new record from the source with status as 'Active' and the End date of the previous record as start_date of this new record.

The problem i face is how to identify the record to be updated and then insert the new record.

Can u explain me with the constraints that should be used.












vimali balakrishnan wrote:Hi,

I have to implement SCD type-II in DataStage.

I have a source table in which one column is considered as a key column which can be looked up.

For example,
consider the project master table, I have project_id,customer_id,customer5_id columns in my source.
my target table has project_id,customer_id,customer5_id,status columns.

I have to check if the project_id in the source is there in the target,if present then check if the other columns in the target is same as
the columns in the source i.e,if one record/row in the source is already present in the target.if same skip that record.

If the project_id is already present and if the other columns are not same as that of the source then insert this record into the target with status as 'A'
and update the status of the already existing record with the same project_id as 'I'.

Can u help me out with this????
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Re: Implementing SCD type-II in DataStage

Post by prabu »

source Key-> lookup target Key


if targetkey is null (meaning this is a new record)
insert into target
else
-- target key is available in target
if NO ( checksummatch(all other columns other tahn status flag)) then
update target set status='Inactive'
insert new record with status as 'Active'
else
none
end if


please use this psuedocode to implement what you want
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you had searched the Forum, you would have determined that pre-loading a set of the current keys from the target table into a hashed file allows rapid and easy detection of existence, and can deliver the corresponding surrogate key values too.

On the basis of that selection, you can determine whether to insert a new row or not. In the latter case (there is already a record in the dimension table), you have to implement your own change detection logic - everyone's is different) in order to determine whether to insert a new row with the same business key but a different surrogate key.

Your job must maintain this hashed file, writing new key/surrogate key combinations to it as generated. You need to read from the hashed file with "lock for update" enabled.

To solve your problem about how to identify the record to be updated, you must analyse your actual need; which columns changing would trigger this event? Then implement logic to detect changes in those columns.

Note that, in Type 2 SCD, you NEVER update records in the dimension table. You always create new records, so that you can preserve history. That is the entire point of Type 2.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply