Page 1 of 1

Need some info on logic

Posted: Thu May 06, 2010 3:43 am
by sourabhverma
Hi,

We have one parallel job which runs on monthly basis and extracts source data (100 mn records approx) and applying a lookup on that data with 2 reference tables, one is having 5 fields and other one is having 15 fields and propogating all 20 fields along with main stream fields into target table.

The requirement is that whenever there is any kind of change in any of the reference tables, the same changed data should also be refelected in the target table and this should be happening with out re-extraction of the source data.

Could anybody give an idea on doing this, any help is appreciated.

Posted: Thu May 06, 2010 4:19 am
by priyadarshikunal
update manually :wink:

do you need to update the values on records loaded previously?

Posted: Thu May 06, 2010 4:26 am
by sourabhverma
Yes, The changed values should be updated in the final target table also.

Posted: Thu May 06, 2010 4:33 am
by Sainath.Srinivasan
Is it a full extract from the source ?

If Yes, would'nt it be a full refresh (apart from deletes) ?

What will happen if your update fails in the middle ? You will need some type of control.

Posted: Thu May 06, 2010 4:34 am
by priyadarshikunal
Sainath already asked the questions i was about to, need answers to suggest further.

This is the kind of scenario where I would put reference table to target database (SCD type 1) and use its surrogate key to the target table. So in case of any change is reference data is automatically reflected to all records. And probably thats why the star schema was invented.

This should be a question for the person designed the database as there is possibility of misunderstanding.

Posted: Thu May 06, 2010 5:04 am
by sourabhverma
Basically idea behind this requirment -> As this is monthly job, the data will be loaded into target table (appended) on monthly basis. So let`s say after loading this data, if there is any indication that the base ref data is somehow not perfectly correct so after getting the corrected ref data table, the job must be running which should use the new ref data & then finally populate the targte table.

I could actually store the extracted data first into dataset or temporary table and whenever required i could run the job but because of high data volume we don`t want to do this way.

Posted: Thu May 06, 2010 8:28 am
by kumar_s
If you planing to have the verification manually, you can better have a seperate flow for rectification.
Just to update the corrected reference records in Target.
Or as mentioned, update manually.

Need some info on logic

Posted: Fri May 07, 2010 6:48 am
by ajay.vaidyanathan
Hi,
Use a CDC... I guess you are referring to Change Data Capture Logic.... Try using a CDC and let me know if your issue got resolved or not....