how and which stages are used to main version in SCD 2

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

how and which stages are used to main version in SCD 2

Post by rafik2k »

Hi all,
i want to implement SCD 2 in parallel job. i have tried using
change capture and change apply stage, but using these stage i am not able to maintain version control.
my requirement is that if
1) new record in the source -> then insert it into target with new row id
2) changed record in the source -> then insert this record into the target as well as update the exisiting record with new time(end time) to main history.

Any help would be greatly appreciated. Thanks!

rafiq
richdhan
Premium Member
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi Rafiq,

The inputs to the CDC should be your source and target. The output of the CDC should lead to a transformer. Have 2 constraints in the transformer

1. change_code = 1 or change_code= 3
2. change_code = 3

From the transformer use the 1st constraint and lead the records to a funnel. From the transformer use the 2nd constraint and lead the records to a join stage and join with the target records. The output of the join stage should go to the same funnel. Now the output of the funnel will have both inserts and updates.

HTH
--Rich
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

personally I would not funnel. When you have inserts and updates split, keep them split up. It will improve performance .
Off cource dependant an DB and a lot of other parameters.
rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

Post by rafik2k »

Hi Rich,
thanks for ur quick response
As per your solution i went thrugh it. But whenever there is any insertion or updation in the source file(text file), the conditions in the transformer become false. when i give condition like change_code=2 then it become false?
which is i suppose for deletetion. so what may be the reason?

my requirement is as
suppose i have source data like
NO NAME SAL
10 ABC 100
20 XYZ 200
30 CBA 300

and in the target i should have for the first time populate like this

SURROGATE_KEY NO NAME SAL VERSION
1 10 ABC 100 1000
2 20 XYZ 200 2000
3 30 CBA 300 3000


now for the next load, in the souce file gets changed like

NO NAME SAL
10 ABC 100
20 XYZ 200
30 XXX 300(changed in name)
40 SSS 400(new row)

for the next load, the target should have records like

SURROGATE_KEY NO NAME SAL VERSION
1 10 ABC 100 1000
2 20 XYZ 200 2000
3 30 CBA 300 3000
4 30 XXX 300 3001(increment corr version by 1)
5 40 SSS 400 4001


any solution or help will be greatly appreciated

Thanks in advance
Rafiq
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Rafik,

the basic approach is correct, but something seems to not be working correctly in your case. Could you also output the change_code value and see what it is for each row?
richdhan
Premium Member
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi Rafiq,

I think the problem might be with the link ordering in Change Capture stage. As Arnd suggested write the output of the CDC to a peek stage and check the value for change_code column. If the results are reversed then the link order should be changed in Change Capture stage.

HTH
--Rich
rafik2k
Participant
Posts: 182
Joined: Wed Nov 23, 2005 1:36 am
Location: Sydney

how to maintain version number in SCD 2

Post by rafik2k »

Thanks Arnd and Rich,
The problem has been resolved after changing the link order.
Now my problem is how to maintain the version number of each record in the target as i mentioned in the last message.
source as well as target are text files

any solution?

Thanks in advance

Rafiq
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

AFAIK the only 2 operations supported by sequential files are append and overwrite.

The sequential file stage has no mechanism for updating previously written rows. if it is imperative that output be a text file, you will need to design logic that re-creates the entire file.


One thing that comes to mind is separate the updates and deletes from the inserts and copies and then lookup the original file(Stream) against the updates(Reference) (reversed this intentionally), update the end date and recombine these records with the inserts and copies and build a new file with all the records.

IHTH
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
laknar
Participant
Posts: 162
Joined: Thu Apr 26, 2007 5:59 am
Location: Chennai

Generating Surrogate Key as parameter

Post by laknar »

Hi,

can any one suggest me the Design of SCD2.

how can i generate Surr Key for Next load.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There are at least five different ways for generating surrogate keys which you would have discovered had you bothered to Search rather than hijacking a thread that is more than two years old.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Post by hamzaqk »

Code: Select all

1.Just perform a join on the natural key columns and seperate the new records from the old ones
2.In the next step assign the surrogate keys to the new records. 
3.Third would be the update.
.

P.S Are my DWH concepts outdated or your talking about type 3 and not 2 here because i thought in SCD type 2 a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
altruist
Participant
Posts: 73
Joined: Thu May 11, 2006 6:50 am

Post by altruist »

Sorry to bring up the old post. But this is the closest that I found from my search.

I am planning to implement SCD and obtaining three files Insert, UpdateType2, UpdateType1. Using which I update the Target Table. However I have versioning in Target Table (in addition to it, I want to update based on surrogate keys due to the huge number of natural keys, which eventually would be the where clause while updating instead of surrogate key).

Is the below approach approach a good solution or do we implement SCD type 2 for versioning similar to that of Server Edition.
As with the below solution I don't see that the CDC stage is being used efficiently.

AFAIK the only 2 operations supported by sequential files are append and overwrite.

The sequential file stage has no mechanism for updating previously written rows. if it is imperative that output be a text file, you will need to design logic that re-creates the entire file.


One thing that comes to mind is separate the updates and deletes from the inserts and copies and then lookup the original file(Stream) against the updates(Reference) (reversed this intentionally), update the end date and recombine these records with the inserts and copies and build a new file with all the records.

IHTH
Gladiator1
Participant
Posts: 3
Joined: Tue May 05, 2009 11:57 pm

Post by Gladiator1 »

If someone has implemented a SCD 2 in DataStage using CDC Stage.It would be helpful if they could put a screenshot of the job.
Post Reply