Upsert Vs Insert/Update
Hi All,
We are developing jobs where we need to insert/update DB2 tables. Can you please tell me, in terms of performance, whether an upsert method is preferable, or inserts and updates in two different stages?
Upsert is all DML (update else insert), whereas separating inserts from updates allows you to use enterprise/bulk loading at least for the inserts, and then just DML for the updates.
If you're really into performance, then enterprise/bulk loading the updates into an ETL work table and executing a parallel DML statement from the command line for the updates puts you at the fastest loading scheme.
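The distinction between the two strategies can be sketched in a few lines. This is a minimal illustration using Python's sqlite3 as a stand-in for DB2 (the table and column names are invented; a real job would use DB2's bulk loader for the insert path, which sqlite3 does not model):

```python
import sqlite3

con = sqlite3.connect(":memory:")
for t in ("target_a", "target_b"):
    con.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, val TEXT)")
    con.execute(f"INSERT INTO {t} VALUES (1, 'old')")

rows = [(1, "new"), (2, "fresh")]  # id 1 exists -> update; id 2 is new -> insert

# Strategy A: upsert -- every row goes through DML (update else insert).
con.executemany(
    "INSERT INTO target_a VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET val = excluded.val", rows)

# Strategy B: split the stream first, so the inserts can take a bulk/load
# path and only the updates need row-by-row DML.
existing = {i for (i,) in con.execute("SELECT id FROM target_b")}
inserts = [r for r in rows if r[0] not in existing]
updates = [(v, i) for (i, v) in rows if i in existing]
con.executemany("INSERT INTO target_b VALUES (?, ?)", inserts)  # bulk path
con.executemany("UPDATE target_b SET val = ? WHERE id = ?", updates)
con.commit()
```

Both strategies end with the same table contents; the difference is purely in how much of the stream is eligible for the fast bulk path.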
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Even if you don't want to use the bulk loading capabilities of an Enterprise stage, there are still advantages to be had from separating your inserts and updates.
If you use an UPSERT with an array size of (say) 500 with Insert set to try first, then DS will attempt to insert all 500 rows in one statement. If just ONE of them fails, then the entire array fails, and DS has to pick it apart and run the inserts that would succeed followed by updates.
i.e. the cost of the original INSERT and the ROLLBACK is wasted time.
If you know which rows already exist, you can improve performance by bypassing this failed INSERT.
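The failure mode described above can be simulated directly. Below is a hypothetical sketch (again using sqlite3 as a stand-in for DB2; this mimics the fallback behaviour described, not actual DataStage internals): the array insert fails on one duplicate, the whole batch rolls back, and the rows must then be picked apart one at a time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
con.execute("INSERT INTO t VALUES (2, 'old')")

batch = [(1, "a"), (2, "b"), (3, "c")]  # one row (id 2) already exists

# Insert-first with an array: the whole batch is attempted in one go,
# fails on the duplicate, and everything is rolled back.
try:
    with con:
        con.executemany("INSERT INTO t VALUES (?, ?)", batch)
except sqlite3.IntegrityError:
    # The expensive fallback: retry row by row, updating the failures.
    for row in batch:
        try:
            with con:
                con.execute("INSERT INTO t VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            con.execute("UPDATE t SET val = ? WHERE id = ?", (row[1], row[0]))
con.commit()
```

Pre-splitting the batch against the set of existing keys, as suggested above, skips the doomed array INSERT and its ROLLBACK entirely.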
Ross Leishman
kcbland wrote:
Upsert is all DML (update else insert), whereas separating inserts from updates allows you to use enterprise/bulk loading at least for the inserts, and then just DML for the updates.
If you're really into performance, then enterprise/bulk loading the updates into an ETL work table and executing a parallel DML statement from the command line for the updates puts you at the fastest loading scheme.

A simple insert gives me the bulk-loading facility. So, according to your suggestion, it is better if we keep them in two separate stages?
rleishman wrote:
Even if you don't want to use the bulk loading capabilities of an Enterprise stage, there are still advantages to be had from separating your inserts and updates.
If you use an UPSERT with an array size of (say) 500 with Insert set to try first, then DS will attempt to insert all 500 rows in one statement. If just ONE of them fails, then the entire array fails, and DS has to pick it apart and run the inserts that would succeed followed by updates.
i.e. the cost of the original INSERT and the ROLLBACK is wasted time.
If you know which rows already exist, you can improve performance by bypassing this failed INSERT.

Can you please elaborate on your answer? Sorry, I didn't get you.
If your dataset has rows that have to be handled as inserts and other rows that have to be handled as updates, you cannot BULK INSERT. You must separate the updates and use DML.
I assume you're trying to load as fast as possible. So, bulk loading inserts is the fastest method for inserts.
For updates, you can have a DS process read rows from a file and stream them to the database through a DML cursor, or you can take a faster approach. Bulk loading the updates into a work table is faster than streaming them into a cursor, and parallel DML updating from the work table to the target table is faster than single-threaded DML updating from a cursor.
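The work-table pattern can be sketched as follows. This is a hypothetical illustration with sqlite3 standing in for DB2 (the `etl_work` table name is invented): the update rows are loaded into a work table, then applied with one set-based UPDATE instead of one DML statement per row through a cursor. In a real job the work table would be bulk-loaded and the UPDATE run as parallel DML.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
con.executemany("INSERT INTO target VALUES (?, ?)",
                [(1, "old1"), (2, "old2"), (3, "keep")])

# Step 1: load the update rows into a work table (executemany is the
# closest sqlite3 analogue to a bulk loader).
con.execute("CREATE TABLE etl_work (id INTEGER PRIMARY KEY, val TEXT)")
con.executemany("INSERT INTO etl_work VALUES (?, ?)",
                [(1, "new1"), (2, "new2")])

# Step 2: one set-based UPDATE from the work table to the target,
# instead of streaming each row through a cursor.
con.execute("""
    UPDATE target
       SET val = (SELECT w.val FROM etl_work w WHERE w.id = target.id)
     WHERE id IN (SELECT id FROM etl_work)
""")
con.commit()
```

Rows absent from the work table are left untouched, so only genuine updates pay any cost.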
Kenneth Bland