| Author |
Message |
satyasur
Participant
Joined: 07 Apr 2006
Posts: 15
Points: 135
|
|
| DataStage® Release: 7x |
| Job Type: Server |
| OS: Windows |
|
Hello all,
I am trying to figure out a strategy with which we can handle DELETES in our data warehouse. The situation is a follows:
In our data warehouse we load the data incrementally in the staging. This means we have rows updated or inserted depending on whether they are modified or new. The source system application allows users to delete rows. Take for example: If a product is retired the customers can delete it from the source system’s master table. So if a report is generated on the application system the n that product does not show up. Now they want the data to be consistent in the DW too.
If a record is deleted from the source system we do not have a method as of now to determine it. So we miss those rows and they are still persistent in our DW.
We are not actually thinking of deleting those master rows or transaction rows from our DW.
Any inputs on how to handle this situation are most welcome.
Are there any generic tools (database independent) which can do this (CDC)? Please suggest.
Thanking you in advance.
Satish.
|
|
|
|
|
 |
Madhav_M
Participant
Joined: 10 Jul 2004
Posts: 43
Points: 355
|
|
|
|
|
|
Hi Satish
This defeats the purpose of having datawarehouse in place!! Ok.. if you want to keep your target intact with source.. only way is reload the target on a defined frequency!! BTW what is your load frequecy?
|
|
|
|
|
 |
satyasur
Participant
Joined: 07 Apr 2006
Posts: 15
Points: 135
|
|
|
|
|
|
hello sir,
I have mentioned that we do not intend to actually delete the records.
We plan to mark records with a flag, say 'D' for deleted.
we have huge tables with millions of rows in them.
so we cannot afford to load them every time.
so we are looking at CDC kind of an approach.
we have a constraint of not putting the operational system( source application ) on pressure by writing triggers.
also we want the solution to be generic( in terms of database as we have different DBs (oracle,DB2,MS SQL server).
That was the reason I wanted to know if any of you have used any tool to perform such a task.
Best regards,
Satish.
|
|
|
|
|
 |
ray.wurlod
Participant
Group memberships: Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group
Joined: 23 Oct 2002
Posts: 49911
Location: Canberra, Australia
Points: 271276
|
|
|
|
|
|
|
You're on the right track. Once you detect that a source record has been deleted (and there's lots of ways to do that), simply update the "active" indicator in the Data Warehouse.
|
_________________ RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong
Delivering Outcomes Together |
|
|
|
 |
satyasur
Participant
Joined: 07 Apr 2006
Posts: 15
Points: 135
|
|
|
|
|
|
hello Ray sir,
Thanks for the motivation :)
I was interested in knowing the first part : detecting the deleted rows.
I had come across CDC as a viable method.
So wanted to know more about it.
any place you can point out where I can find some related reading about its implementation?
Best regards,
Satish.
|
|
|
|
|
 |
ray.wurlod
Participant
Group memberships: Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group
Joined: 23 Oct 2002
Posts: 49911
Location: Canberra, Australia
Points: 271276
|
|
|
|
|
|
|
There are many different kinds of CDC (change data capture). Ideally the database itself will maintain a log of deleted records, perhaps via a trigger or through the application itself, and you can l ...
|
_________________ RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong
Delivering Outcomes Together |
|
|
|
 |
satyasur
Participant
Joined: 07 Apr 2006
Posts: 15
Points: 135
|
|
|
|
|
|
| ray.wurlod wrote: |
| There are many different kinds of CDC (change data capture). Ideally the database itself will maintain a log of deleted records, perhaps via a trigger or through the application itself, and you can l ... |
well yes I have come across an article which says that one of the efficient methods to do CDC is by reading the transaction logs. any tools you know of which do it really well?
If I start writing my own code I will have to write it for all the DB's. That is why i wanted to know about the existing tools.
any inputs on IBM data mirror?
P.S. thanks for all your inputs Ray sir :)
cheers,
Satish.
|
|
|
|
|
 |
|
|