Upsert Vs Insert/Update

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Upsert Vs Insert/Update

Post by pravin1581 »

Hi All,

We are developing jobs that insert/update DB2 tables. In terms of performance, is it better to use the upsert method, or to handle the inserts and updates in two separate stages?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Upsert is all DML (update else insert), whereas separating inserts from updates allows you to use enterprise/bulk loading, at least for the inserts, and then plain DML for just the updates.

If you're really after performance, then bulk loading the updates into an ETL work table and executing a parallel DML statement from the command line gives you the fastest loading scheme.
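A rough sketch of that scheme, with hypothetical table, column, and file names (the LOAD command is DB2's bulk loader; adapt the syntax to your release and environment):

```sql
-- Step 1, from the command line: bulk load the update rows into a work table,
-- e.g. with something along the lines of:
--   db2 "LOAD FROM updates.del OF DEL REPLACE INTO ETL_WORK_UPD NONRECOVERABLE"

-- Step 2: a single set-based statement applies every update from the work
-- table to the target, instead of one round trip per row.
UPDATE target_table t
   SET (col1, col2) = (SELECT w.col1, w.col2
                         FROM etl_work_upd w
                        WHERE w.pk = t.pk)
 WHERE EXISTS (SELECT 1
                 FROM etl_work_upd w
                WHERE w.pk = t.pk);
```

The point of the second step is that the database can parallelize one big set-based UPDATE, which a row-at-a-time cursor never can.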
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

Even if you don't want to use the bulk loading capabilities of an Enterprise stage, there are still advantages to be had from separating your inserts and updates.

If you use an UPSERT with an array size of (say) 500 with Insert set to try first, then DS will attempt to insert all 500 rows in one statement. If just ONE of them fails, the entire array fails, and DS has to pick it apart, running the inserts that would succeed followed by the updates.

I.e., the cost of the original INSERT and the ROLLBACK is wasted time.

If you know in advance which rows already exist, you can improve performance by bypassing the failed INSERT altogether.
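To make the failure path concrete, here is roughly what the insert-first upsert boils down to, with hypothetical table and column names:

```sql
-- Insert-first upsert: DS tries the whole array as one insert...
INSERT INTO target_table (pk, col1) VALUES (?, ?);
-- ...and if any row hits a duplicate key, the array is rolled back and
-- re-driven, with the already-existing rows falling through to:
UPDATE target_table SET col1 = ? WHERE pk = ?;

-- Pre-splitting avoids the wasted attempt: a lookup against the target keys,
-- e.g.  SELECT pk FROM target_table;
-- lets the job send existing keys straight down an update link and new keys
-- straight down a (bulk-loadable) insert link.
```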
Ross Leishman
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

kcbland wrote: Upsert is all DML (update else insert), whereas separating inserts from updates allows you to use enterprise/bulk loading, at least for the inserts, and then plain DML for just the updates.

If you're really after performance, then bulk loading the updates into an ETL work table and executing a parallel DML statement from the command line gives you the fastest loading scheme.

A simple insert gives me the bulk loading facility. So, according to your suggestion, it is better if we keep the inserts and updates in two separate stages?
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

rleishman wrote: Even if you don't want to use the bulk loading capabilities of an Enterprise stage, there are still advantages to be had from separating your inserts and updates.

If you use an UPSERT with an array size of (say) 500 with Insert set to try first, then DS will attempt to insert all 500 rows in one statement. If just ONE of them fails, the entire array fails, and DS has to pick it apart, running the inserts that would succeed followed by the updates.

I.e., the cost of the original INSERT and the ROLLBACK is wasted time.

If you know which rows already exist, you can improve performance by bypassing the failed INSERT.

Can you please elaborate on your answer? Sorry, I didn't follow you.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If your dataset has rows that have to be handled as inserts and other rows that have to be handled as updates, you cannot BULK INSERT. You must separate the updates and use DML.

I assume you're trying to load as fast as possible. So, bulk loading inserts is the fastest method for inserts.

For updates, you can have a DS process read rows from a file and stream them to the database through a DML cursor, or you can take a faster approach: bulk loading the updates into a work table is faster than streaming them into a cursor, and parallel DML updating from the work table to the target table is faster than single-threaded DML updating from a cursor.
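On DB2 releases that support it, that work-table-to-target update can also be expressed as a single MERGE (hypothetical table and column names):

```sql
-- Apply the bulk-loaded work table to the target in one statement;
-- the database can parallelize this single statement across partitions.
MERGE INTO target_table t
USING etl_work_upd w
   ON t.pk = w.pk
WHEN MATCHED THEN
   UPDATE SET t.col1 = w.col1,
              t.col2 = w.col2;
```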
Kenneth Bland

rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

pravin1581 wrote: Can you please elaborate on your answer? Sorry, I didn't follow you.

There's more on upserts in this thread.
Ross Leishman
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I just added that thread to my Favorites. That was great information.
Kenneth Bland

pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata
Contact:

Post by pravin1581 »

kcbland wrote:I just added that thread to my Favorites. That was great information.
Can anyone help me out on this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

They already have. This thread and the one referred to contain excellent, best-practice advice.

Separate your inserts from your updates.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.