Upsert Vs Insert/Update

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Upsert Vs Insert/Update

Post by pravin1581 »

Hi All,

We are developing jobs that insert/update DB2 tables. In terms of performance, is it better to use the upsert method, or to handle the inserts and updates in two separate stages?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Upsert is all DML (update else insert), whereas separating inserts from updates allows you to use enterprise/bulk loading, at least for the inserts, and then plain DML for just the updates.

If you're really after performance, then bulk loading the updates into an ETL work table and executing a parallel DML statement from the command line gives you the fastest loading scheme.
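A rough sketch of that scheme, with hypothetical table, column, and file names (the LOAD command is DB2's bulk loader; adapt the syntax to your release and environment):

```sql
-- Step 1, from the command line: bulk load the update rows into a work table,
-- e.g. with something along the lines of:
--   db2 "LOAD FROM updates.del OF DEL REPLACE INTO ETL_WORK_UPD NONRECOVERABLE"

-- Step 2: a single set-based statement applies every update from the work
-- table to the target, instead of one round trip per row.
UPDATE target_table t
   SET (col1, col2) = (SELECT w.col1, w.col2
                         FROM etl_work_upd w
                        WHERE w.pk = t.pk)
 WHERE EXISTS (SELECT 1
                 FROM etl_work_upd w
                WHERE w.pk = t.pk);
```

The point of the second step is that the database can parallelize one big set-based UPDATE, which a row-at-a-time cursor never can.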
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

Even if you don't want to use the bulk loading capabilities of an Enterprise stage, there are still advantages to be had from separating your inserts and updates.

If you use an UPSERT with an array size of (say) 500 with Insert set to try first, then DS will attempt to insert all 500 rows in one statement. If just ONE of them fails, the entire array fails, and DS has to pick it apart, running the inserts that would succeed followed by the updates.

I.e., the cost of the original INSERT and the ROLLBACK is wasted time.

If you know in advance which rows already exist, you can improve performance by bypassing the failed INSERT altogether.
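To make the failure path concrete, here is roughly what the insert-first upsert boils down to, with hypothetical table and column names:

```sql
-- Insert-first upsert: DS tries the whole array as one insert...
INSERT INTO target_table (pk, col1) VALUES (?, ?);
-- ...and if any row hits a duplicate key, the array is rolled back and
-- re-driven, with the already-existing rows falling through to:
UPDATE target_table SET col1 = ? WHERE pk = ?;

-- Pre-splitting avoids the wasted attempt: a lookup against the target keys,
-- e.g.  SELECT pk FROM target_table;
-- lets the job send existing keys straight down an update link and new keys
-- straight down a (bulk-loadable) insert link.
```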
Ross Leishman
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

kcbland wrote: Upsert is all DML (update else insert), whereas separating inserts from updates allows you to use enterprise/bulk loading, at least for the inserts, and then plain DML for just the updates.

If you're really after performance, then bulk loading the updates into an ETL work table and executing a parallel DML statement from the command line gives you the fastest loading scheme.

A simple insert gives me the bulk loading facility. So, according to your suggestion, it is better if we keep the inserts and updates in two separate stages?
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

rleishman wrote: Even if you don't want to use the bulk loading capabilities of an Enterprise stage, there are still advantages to be had from separating your inserts and updates.

If you use an UPSERT with an array size of (say) 500 with Insert set to try first, then DS will attempt to insert all 500 rows in one statement. If just ONE of them fails, the entire array fails, and DS has to pick it apart, running the inserts that would succeed followed by the updates.

I.e., the cost of the original INSERT and the ROLLBACK is wasted time.

If you know which rows already exist, you can improve performance by bypassing the failed INSERT.

Can you please elaborate on your answer? Sorry, I didn't follow you.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If your dataset has rows that have to be handled as inserts and other rows that have to be handled as updates, you cannot BULK INSERT. You must separate the updates and use DML.

I assume you're trying to load as fast as possible. So, bulk loading inserts is the fastest method for inserts.

For updates, you can have a DS process read rows from a file and stream them to the database through a DML cursor, or you can take a faster approach: bulk loading the updates into a work table is faster than streaming them into a cursor, and parallel DML updating from the work table to the target table is faster than single-threaded DML updating from a cursor.
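On DB2 releases that support it, that work-table-to-target update can also be expressed as a single MERGE (hypothetical table and column names):

```sql
-- Apply the bulk-loaded work table to the target in one statement;
-- the database can parallelize this single statement across partitions.
MERGE INTO target_table t
USING etl_work_upd w
   ON t.pk = w.pk
WHEN MATCHED THEN
   UPDATE SET t.col1 = w.col1,
              t.col2 = w.col2;
```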
Kenneth Bland

rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

pravin1581 wrote: Can you please elaborate on your answer? Sorry, I didn't follow you.

There's more on upserts in this thread.
Ross Leishman
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I just added that thread to my Favorites. That was great information.
Kenneth Bland

pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

kcbland wrote:I just added that thread to my Favorites. That was great information.
Can anyone help me out on this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

They already have. This thread and the one referred to contain excellent, best-practice advice.

Separate your inserts from your updates.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.