What's the fastest way of doing an "exists in" or

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
davidf_wf1
Participant
Posts: 3
Joined: Mon Jul 03, 2006 9:03 am

What's the fastest way of doing an "exists in" or

Post by davidf_wf1 »

I'm currently designing a data archiving solution using DataStage, and as part of the solution I'll need to perform the following logic:

If [primary key] exists in [archive database] then delete specified record

My question is twofold:
1) What is the best/fastest way in DataStage of performing this lookup validation? For example, would a lookup of a hash file be best, or is there something else I can do? My concern with the hash file solution is that the hash file itself would take a long time to create owing to the large data volume in the archive database.
2) What is the best way of performing the delete within DataStage? Custom SQL in a database component?

Thanks in advance
David
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Best and fastest are subjective. I can't believe your archiving solution isn't tied to the delete solution. If that's true, you're talking about full scan the live table and compare with the archive table, and where matched you'll delete from the archive table.

The fastest solution if these two tables are not in the same instance (you didn't specify your RDBMS) is to take all of the primary keys in the main table, spool them out, bulk load into a work table in the archive database, then do an inner join between the work table and the archive table and spool out the keys found. Then, bulk load those keys back into another work table in the main table instance. Now use a DML based delete SQL statement

Code: Select all

DELETE FROM MAINTABLE WHERE KEY IN (SELECT KEY FROM PURGETABLE)

If your tables are in the same instance, skip all of the jobs and just issue the above SQL except instead of PURGETABLE use ARCHIVETABLE.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply