Page 1 of 1

What's the fastest way of doing an "exists in" or

Posted: Mon Jul 10, 2006 7:16 am
by davidf_wf1
I'm currently designing a data archiving solution using DataStage, and as part of the solution I'll need to perform the following logic:

If [primary key] exists in [archive database] then delete specified record

My question is twofold:
1) What is the best/fastest way in DataStage of performing this lookup validation? For example, would a lookup of a hash file be best, or is there something else I can do? My concern with the hash file solution is that the hash file itself would take a long time to create owing to the large data volume in the archive database.
2) What is the best way of performing the delete within DataStage? Custom SQL in a database component?

Thanks in advance
David

Posted: Mon Jul 10, 2006 7:54 am
by kcbland
Best and fastest are subjective. I can't believe your archiving solution isn't tied to the delete solution. If that's true, you're talking about full scan the live table and compare with the archive table, and where matched you'll delete from the archive table.

The fastest solution if these two tables are not in the same instance (you didn't specify your RDBMS) is to take all of the primary keys in the main table, spool them out, bulk load into a work table in the archive database, then do an inner join between the work table and the archive table and spool out the keys found. Then, bulk load those keys back into another work table in the main table instance. Now use a DML based delete SQL statement

Code: Select all

DELETE FROM MAINTABLE WHERE KEY IN (SELECT KEY FROM PURGETABLE)

If your tables are in the same instance, skip all of the jobs and just issue the above SQL except instead of PURGETABLE use ARCHIVETABLE.