Insert then Update & Update then Inser

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
cchavez2007
Premium Member
Premium Member
Posts: 3
Joined: Fri Jun 01, 2007 2:50 pm

Insert then Update & Update then Inser

Post by cchavez2007 »

Hello All,

In the ORACLE OCI EE stage there are two Upsert Order modes: Insert then Update & Update then Insert.

My question is, how do we improve the performance of a Parallel job for which the Upsert order is set to "Insert then Update" on a given day when there are more or all updates as compared to inserts. Reason for choosing Insert then Update in the first place is that there are more records to be inserted on any given day
as compared to Updates.

My understanding is the stage always tries to insert, if that fails then do an update in an Insert then Update mode and vice versa for Update then Insert. The example below illustrates when Insert then Update gives us
poor performance.

Example:
Day #_of_Recs Inserts Updates Performance
1 10000 10000 0 Best
2 10000 8000 2000 Average
3 10000 2000 8000 Poor

Also would like to get some info on as to how DataStage PX handles Insert then Updates or vice versa i.e how the Oracle EE OCI stage interacts with the Database when it comes to inserting/updating recods. We have also noticed that the performance of a server job in a similar situation is better than a parallel job.

Thanks a lot in advance......
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Whichever is named first is tried first then, if it fails because of a primary key violation, the other is tried.

For best performance (fastest finish) split the data into two separate streams - the rows to be inserted and the rows to be updated. Use direct write to handle the former, use "Update only" as the upsert mode for the latter.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
cchavez2007
Premium Member
Premium Member
Posts: 3
Joined: Fri Jun 01, 2007 2:50 pm

Re: Insert then Update & Update then Inser

Post by cchavez2007 »

Hello Ray,

Thanks a lot for the prompt reply.

I undertand that splitting into two streams, insert and update, will give us best performance but since the target table is too large, I chose not to take that approach. Since also I am aware that the data contains more inserts at any given time, I would rather that the database handle it for me by selecting Upsert order, Insert then Update.
But I would like to know which performs better "Insert then Update" or "Update then Insert" to handle situations on certain days for some reason when I get more updates than inserts.

Thanks..
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I disagree about your assertion about "too large" - any lookup will be aided by the primary key index, or you can pre-build something like a Lookup File Set.

As to your actual question, you just have to wear the slight difference in efficiency on those days where there are more updates than inserts.

Of course, if you know in advance which days these are, you could have two versions of the job (one with each rule) and run the more appropriate one.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
cchavez2007
Premium Member
Premium Member
Posts: 3
Joined: Fri Jun 01, 2007 2:50 pm

Post by cchavez2007 »

Hello Ray,

Thanks a lof for the valuable information.
Post Reply