Oracle Performance Tuning - Bulk / Direct / OCI / Updates

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Oracle Performance Tuning - Bulk / Direct / OCI / Updates

Post by rleishman »

There are quite a few interesting posts in this forum that deal with performance issues writing data to Oracle. I've done some benchmarking lately, and thought I'd drag together all that old info and bring it up to date. Ideally, I'd like this to become the go-to thread for Oracle loading performance issues.

Apologies in advance to those who don't like new threads for old problems.

First a disclaimer: Benchmarks are notoriously system dependent; what is true for me may not be true for you. I have tried to avoid this in my conclusions, but ultimately it is inescapable.

My environment:
Datastage Server: DS 7.5.1a on SuSE Linux Enterprise Server 9 (i586)
Database Server: Oracle Enterprise Edition Release 10.1.0.4.0 on SuSE SLES 8 (Linux).

With my release of DS, there are 3 Oracle stages available:
- Oracle OCI (I'll call it OCI from here in) - can select/insert/update/delete from Oracle tables
- Oracle OCI Load (I'll call it OCI Load from here in) - can only insert (ie. load) Oracle tables
- Oracle 7 Load (I'll call it OraBulk from here in) - can only insert (ie. load) Oracle tables.

[28Sep05] Pursuant to responses made by ian_bennett below, I have edited the paragraphs below replacing references to "array size" with "transaction size", as the original text was misleading. Edits in blue.

When writing back to Oracle from DataStage, there are five important concepts to understand: Transaction Size, Direct/Conventional Path, Update Action, Load Mode, Local/Remote sessions.

Transaction Size
Transaction Size refers to the number of rows you group together to post to the database in one hit. It relates primarily to INSERTs on the OCI stage. A transaction size of 1 means that you are sending one row at a time to the database; 250 means 250 rows at a time.
Clearly, a larger transaction size means fewer round trips and a faster job; you are really only restricted by memory. Performance benefits tend to tail off at higher numbers, and there are some tricky issues with values over 255 relating to the MAXTRANS setting on Oracle tables and indexes. I have found 255 to be a fairly efficient setting.

Direct/Conventional Path
Direct Path load was introduced in Oracle 7 as a faster way to perform bulk inserts. Rather than looking for free space in blocks that already contain rows (or once contained rows), Oracle just goes straight to the first never-used block and starts inserting. There are lots of restrictions and considerations though - it is very important to read the Oracle Utilities manual so that you understand the consequences of Direct Path loads before you start using them. Common consequences are tables being left in an unusable state, slower (yes! slower) load in some cases, and huge increases in disk usage if you don't know what you are doing.
The alternative to Direct Path is Conventional Path - it looks for free space and uses it. The OCI stage always uses Conventional Path. It is very safe compared to Direct Path.
There has been some discussion over whether OCI Load uses Direct or Conventional path. I am here to tell you that the OCI Load stage on DS 7.5 DOES use Direct path. I have monitored the database whilst OCI Load was running and confirmed it as true (which is what the manual says, so that's encouraging).
I am not using the OraBulk stage, but from what I gather you have to write your own after-stage-subroutine to kick off SQL*Loader, which gives you the power to select direct or conventional path as you see fit. I can't see how Conventional Path would be any better than an OCI stage, but you might want to defer your load to avoid Snapshot Too Old errors and the like (see Oracle Forums).
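To make the distinction concrete, the same direct-versus-conventional trade-off can be seen in plain SQL - a minimal sketch with invented table names:

-- Conventional path: reuses free space in blocks the table already owns
INSERT INTO target_table SELECT * FROM staging_table;
COMMIT;

-- Direct path: the APPEND hint writes new blocks above the high-water mark,
-- bypassing the buffer cache. The segment is locked against other DML for the
-- duration, and this session cannot even query the table until it commits.
INSERT /*+ APPEND */ INTO target_table SELECT * FROM staging_table;
COMMIT;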

Update Action
Update Action is an OCI stage concept. The Update Action instructs DS what statements to send to Oracle. eg. Insert Rows Without Clearing sends only INSERT statements. Insert New Rows or Update Existing sends an Insert, and if that fails, an update. You get the idea.
Now this is really important: If you set the Transaction Size to (say) 255 and the Update Action to Insert Rows Without Clearing, then DataStage sends the database only one batch of INSERTs for every 255 rows. Efficient huh? However, if you set the Update Action to Insert New Rows or Update Existing, then DS has to attempt an INSERT for every single row individually and, if the row already exists, follow it with an UPDATE. The result is that the batching benefit of your Transaction Size setting is completely lost! If 10% of the incoming 255 rows already exist, then you will send 255+25=280 statements to the database compared to just one round trip for Insert Rows Without Clearing.
As a rule, NEVER use "Insert New Rows or Update Existing" or "Update Existing Rows or Insert New" for data volumes exceeding a few hundred. Every single one of my Jobs that use an OCI to insert/update compares the incoming row to the existing rows on the database. New rows go to an Insert New Rows Without Clearing OCI, updated rows go to an Update Existing Rows Only OCI, and unchanged data goes to a discard file (you could just not output unchanged rows, but I like completeness).
The discard file is actually important. You could just not bother checking and update every row anyway, right? I load a file that is a full dump of Product data every day - it has 50000 rows, of which only a handful are changes/inserts. Updating every row processes at up to 150 rows per second. Filtering out unchanged rows gets the job over 2000 rows per second.
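For illustration, these are roughly the statements the two links end up issuing (table and column names are invented, and this is a sketch of the pattern, not the exact SQL the stage generates):

-- Insert link (Insert Rows Without Clearing): array-bound INSERT, one batch per transaction
-- (:1, :2, :3 are bind variables supplied by the stage for each row)
INSERT INTO dim_product (product_id, product_name, price)
VALUES (:1, :2, :3);

-- Update link (Update Existing Rows Only): keyed UPDATE, sent only for rows that actually changed
UPDATE dim_product
SET product_name = :2,
    price        = :3
WHERE product_id = :1;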

Load Mode
Load Mode is a setting on the OCI Load stage. Set to Automatic, it uses Direct Path Load to load the data whilst it is being transformed. Set to Manual, it writes to a file and you have to load it yourself with SQL*Loader (or DataPump if you want to roll your own). Manual is the equivalent of the OraBulk stage without the convenience of an after-stage-subroutine - I don't know why you'd use it.
The Automatic setting is the big selling point for OCI Load. I mean, why wouldn't you want to load it asynchronously? Asynchronous loading has to be faster than synchronous, right?
Not necessarily. More below.

Local/Remote Sessions
When your database lives on your DataStage server, DataStage is able to connect Locally to the database. When Oracle is on another machine, DS must make a Remote connection across Oracle Net (formerly SQL*Net). This is most noticeable in bulk load. Local Direct Path loads are the absolute fastest way to get data into Oracle - no correspondence will be entered into. The performance difference between local and remote connections will depend on many factors, but orders of magnitude are not uncommon. Inexplicably, the time difference tends to be greater than the time it takes to FTP the same data file between the machines :?
Clearly, if you use a Load Mode of Automatic and your Oracle server is remote, then you will get a remote session. The good news is that if DS and Oracle are on the same box then Automatic will use a Local session to asynchronously load data as it is transformed. Chuck in some Inter-Process stages and you couldn't wipe the smile off my face with a shotgun.
Not all is lost though if you have a remote database. You can use Manual load (or OraBulk) to drop the output to a file, then a Routine to copy it over to the database server and issue a remote command to start the load with a Local connection. If you're really clever about it you could get this working asynchronously rather than waiting for the transformation to complete, but this would be much harder for the next guy (who's not quite so clever) to support.

OK on to the results.

I took a standard transformation from my system and adapted it to run in a variety of different ways. It is active from source to target, so there are no intermediate files to mess up the averages, and it is not so functionally complex that it is CPU bound (ie. It keeps up with the reads and writes reasonably well).
I fed it with approx 133000 rows, of which only a handful were UPDATEs; the rest were all new. Since I didn't want to use an Update Action of Insert New Rows Else Update Existing, I transformed the data to an empty copy of the real target table, and then issued an Oracle MERGE /*+ APPEND */ statement, which inserts new rows Direct Path and updates existing rows Conventional Path in a single statement. Incidentally, this is the standard for high volume upserts in this warehouse.
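For anyone who hasn't used it, the statement looks something like this (table and column names are invented for the sketch):

MERGE /*+ APPEND */ INTO fact_sales t
USING stg_fact_sales s               -- the empty copy the Job loaded
ON (t.sale_id = s.sale_id)
WHEN MATCHED THEN
  UPDATE SET t.product_id = s.product_id,
             t.sale_date  = s.sale_date,
             t.amount     = s.amount
WHEN NOT MATCHED THEN
  INSERT (sale_id, product_id, sale_date, amount)
  VALUES (s.sale_id, s.product_id, s.sale_date, s.amount);
COMMIT;

-- The APPEND hint only affects the insert half of the MERGE, which is why
-- new rows go in Direct Path while matched rows are updated Conventional Path.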

Going from worst to best:
  • OCI stage with Update Action of Update Else Insert processed 14 rows per second. I was so disgusted I didn't let it finish. Clearly I didn't need the MERGE at the end with this method, but it didn't get far enough for that to matter.
  • OCI stage with Update Action of Insert Else Update processed 38 rows per second. Faster because it does an INSERT first, and only does an UPDATE if the row exists. Most of my data was INSERTs.
  • OCI stage with Insert New Rows Only and a Transaction Size of 255 processed 100 rows per second. The MERGE takes 433 seconds, giving an average speed of 75 rows per second.
  • OCI Load with Automatic Load Mode and Max Record Number (equiv of Transaction Size for Loads) of 255 processed 127 rows/sec to a remote database. With the MERGE the average came down to 90 rows per second.
  • OCI Load with Manual Load Mode processed 195 :!: rows per second. The final file transferred to the database server in 8 seconds, loaded (Direct Path Local) in 83 seconds, and MERGEd in 433 seconds. Average: 110 rows per second.
I was unable to benchmark an OCI Load with Automatic Mode on a local database. Sorry.

My recommendations are as follows:
  • 1. For dimension (reference) data, always join the source to the target, discard unchanged rows, separate INSERTs and UPDATEs into two OCI stages. Direct Path load is unsuitable for dimensional data, because the % changes to the table is usually so small (see Oracle manuals for more).
    You can compare the source to target by dropping the target table down to a Hash table, or join it with SQL by using Oracle Externally Organised Tables to access your incoming file (see the sketch after this list).
    Never use an OCI stage as a Lookup Input to a Transformer - it performs a round trip to the database for each input row. PX users may ignore this tip, as they have a special stage that caches the lookup.
  • 2. For very low volume fact data (<100 rows), you can use an OCI stage with Insert Else Update. Direct Load is not appropriate for such low volumes, and it's just not worth the bother to do a MERGE (see above) or join source to target.
  • 3. For medium volume fact data (100-10000 rows), use an Insert Only OCI stage to achieve Conventional Path load. OCI Load will use Direct Path, which is inappropriate when the incoming volume is such a small percentage of the existing data. INSERT into an empty table, and MERGE (without APPEND) into the real target table.
    If you know your data to be INSERTs only, then ignore the intermediate table.
  • 4. For large volume Fact data on a local database, use OCI Load in Automatic mode to load an empty table, and MERGE /*+APPEND*/ into the real target.
    If you know your data to be INSERTs only, then ignore the intermediate table.
  • 5. For large volume Fact data on a remote database, use OCI Load in Manual mode or OraBulk to drop the data to a file and generate a SQL*Loader control file; move the control file and data to the Oracle server, and Direct Path load the file to an empty table and MERGE /*+ APPEND */ into the real target. Again, if you know your data to be INSERTs only, then ignore the intermediate table.
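As promised, a sketch of the SQL-based source-to-target comparison for #1, with the incoming file exposed as an external table (all names here are invented):

SELECT e.*,
       CASE
         WHEN d.product_id IS NULL              THEN 'INSERT'
         WHEN e.product_name <> d.product_name
           OR e.price        <> d.price         THEN 'UPDATE'
         ELSE 'UNCHANGED'                       -- goes to the discard file
       END AS load_action
FROM ext_product e                              -- external table over the incoming file
LEFT JOIN dim_product d ON d.product_id = e.product_id;

-- Nullable columns need NVL() or similar around the comparisons,
-- otherwise rows containing NULLs will never be flagged as UPDATEs.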

Perhaps this sounds like too much effort? How concerned about performance are you?

Personally, I have implemented a compromise.
* I always follow #1, #2, and #4
* I use solution #4 in place of #3, because I am prepared for small Fact transformations (#3) to take a little longer with Direct Path, and I don't want to create too many options for programmers (the more options, the more likely the wrong choice).
* I use solution #4 in place of #5, because I am lazy and don't want to go to the extra effort of copying and remote shells - Automatic Load is just sooooo easy. Having said that, I don't have any data volumes that would really punish me for this decision. If I was working with 500,000+ row files then I'd get with the program.

A better compromise would be to put in the infrastructure to automate #5, and use it in place of #3 and #4. With some smarts, you could get it to use Conventional Path for low volumes and Direct Path for high volumes. Then you have a one-size-fits-all solution for Fact data.


Happy tuning :D
Last edited by rleishman on Wed Sep 28, 2005 2:16 am, edited 1 time in total.
Ross Leishman
dzdiver
Participant
Posts: 36
Joined: Tue May 25, 2004 8:55 am
Location: global

Post by dzdiver »

There are two other techniques that may be appropriate to loading Oracle, depending on your warehouse design and the data you need to load.

1) External tables
Similar to sqlldr, but with additional ability to transform and perform queries on the data while loading. Additionally, a parallel option exists which does not require splitting staged files.

2) Partition Exchange loading
If you have very large partitioned tables, this can be used to load data very quickly into an empty table and then exchange it with a partition of the existing table. Because the exchange is only a data dictionary update, it is very fast.

Maybe you can do some benchmarking on your platform and let us see how you get on?

HTH,
B.
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

Great tips, thanks.

These techniques are a bit too distant from my current methodology to benchmark easily, but I can make some pretty accurate guesses based on the figures I already have.

Externally Organised Tables (EOTs)

Using EOTs involves transforming to a flat file, and then using what is effectively an internal SQL*Loader script stored on the database to read it into Oracle. The technique is comparable to the #5 technique described above.
* Transform to a flat file (195 rows/sec in my example)
* Move the flat file to the database server where it can be referenced as an EOT (8 seconds in my example)
* Run a MERGE /*+APPEND*/ or INSERT /*+APPEND*/ statement to move the data from the EOT to the real target table. This process will perform the equivalent of a Local Direct Path Load (83 seconds in my example) plus the MERGE (433 seconds), but will perform the two concurrently. Best case 433 sec, likely worst case 516 sec.

This technique has the potential to outperform method #5 for Upserts above, as it performs the load and merge concurrently. It would not outperform Insert-Only transformations which could be loaded directly to the target without need of an EOT or a temporary staging table.
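A minimal sketch of the EOT setup (directory, table and column names are all invented; the real DDL would come from your data model):

CREATE TABLE ext_fact_sales (
  sale_id     NUMBER,
  product_id  NUMBER,
  sale_date   VARCHAR2(10),
  amount      NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_land_dir       -- directory object over the landing area
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('fact_sales.dat')          -- the flat file produced by the Job
)
REJECT LIMIT UNLIMITED;

-- The load and the upsert then happen in one pass:
MERGE /*+ APPEND */ INTO fact_sales t
USING (SELECT sale_id, product_id,
              TO_DATE(sale_date, 'YYYY-MM-DD') AS sale_date, amount
       FROM ext_fact_sales) s
ON (t.sale_id = s.sale_id)
WHEN MATCHED THEN
  UPDATE SET t.product_id = s.product_id, t.sale_date = s.sale_date, t.amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (sale_id, product_id, sale_date, amount)
  VALUES (s.sale_id, s.product_id, s.sale_date, s.amount);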

Partition Exchange Load (PEL)

PEL can be used when the output of your Job is a complete replacement for a subset (usually a date range - day or month) of data in the target table. It is less commonly used for a few reasons:
* It takes some effort to set up cleanly
* Data does not always arrive in the form of a full-replacement for a single date range.
* When data does arrive as a replacement for a date-range, it is usually a single day, whereas most warehouses will partition monthly or yearly.

Having said that, when PEL is possible it is the fastest way to replace data in the database, so it is worth consideration.

Let's assume that the incoming data file is an entire month or entire day. This is how we could proceed:
* Transform the data to a flat file (195 rows/sec in my example)
* Move the file to the database server (8 seconds)
* Local Direct Path Load into an empty table (83 seconds)
* Build indexes and RI on the new table, and exchange it with the existing partition for the same date range in the target table (no benchmark, but certainly less than 433 seconds taken for MERGE /*+APPEND*/ in my example. Experience suggests a likely time for my data would be perhaps half to 3/4 of this number - I currently have RI switched on).
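The exchange step itself is a single DDL statement - something like this, with invented partition and table names:

ALTER TABLE fact_sales
  EXCHANGE PARTITION p_200509
  WITH TABLE new_month_sales
  INCLUDING INDEXES
  WITHOUT VALIDATION;

-- new_month_sales must match fact_sales column-for-column (and carry matching
-- index structures) for the exchange to succeed; only the dictionary is updated.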

In the right circumstances, PEL could outperform the lot - especially when transforming to very large tables. I have tried and failed to implement it as a standard in the past because it is so rare for all the conditions for its practical use to align.
Ross Leishman
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Ross,
Great tips. I think one of the areas you have missed, and which is critical to performance, is the holistic approach. No, I'm not going weird (I was born that way).

You have $100,000 of software, probably with a similar investment in man-hours, running on two machines of maybe $20K each. You have gone through most of the steps to ensure that your DataStage/Oracle setup is as tight as it can be. I hope you haven't sacrificed maintainability for speed, but that's another story.

One of the major slow points in any distributed system is the network. If you don't have a dedicated gigabit or faster network between your ETL and DB servers, you will take a hit. The other areas to look at are memory on each machine (is it as much as you can get?) and the hard drives. I set up a similar system to yours using 7.1 and Teradata, with RAIDed SATA drives - SATA was the compromise between really fast SCSI and space. The Teradata database did reside on RAIDed SCSI, as it was optimized for lots of smaller nodes (drives) on the box.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Excellent thread.

Did you create your bulk load dat files via an Oracle stage? You can usually get faster performance by sending zero rows to the Oracle load stage and sending all your data rows to a sequential file stage. This gives you a faster dat file and a complete control file.

Your stats show the importance of separating inserts and updates rather than using the insert/update or update/insert combination.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Tune the control file. Don't use the Bulk Loader stage type at all - use a Sequential File stage and only create the DAT file(s).
Add hints to the control file, for example for parallel load.
Add appropriate buffer sizes to each column, rather than use the default 256-byte buffers.

I have reduced a four hour load to 20 minutes using these techniques where the row comprised mainly integer values.
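Purely as an illustration (file, table and column names invented - your own metadata drives the real thing), a hand-tuned control file along these lines:

OPTIONS (DIRECT=TRUE, PARALLEL=TRUE)
LOAD DATA
INFILE 'fact_sales.dat'
APPEND
INTO TABLE fact_sales
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
  sale_id     INTEGER EXTERNAL,
  product_id  INTEGER EXTERNAL,
  sale_date   DATE "YYYY-MM-DD",
  amount      DECIMAL EXTERNAL,
  comments    CHAR(50)            -- explicit length instead of the default character buffer
)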
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

I was hoping for some spirited discussion, and I reckon I got it...

Firstly aartlett:
We haven't found maintainability to be a problem. The splitting of inserts/updates is all templated, so there is no development overhead. The only issues I've found are:
* If a table structure changes you have to refresh the columns for two OCI stages instead of 1.
* If you introduce new columns to a Dimension transformation, you have to update the expression that checks whether the row is changed. Coming from Informatica where this is automated was a rude shock.

We have a design standard that all derivations are performed in Stage Variables rather than in the output links, so you never have the same derivation coded twice. The update and insert links have exactly the same expressions (ie. refs to stage vars) so you just drag the expressions from the insert link to the update link.

The hardware/network thing is important, but also a topic for another thread. My intent here was to demonstrate a best practice assuming you cannot change the environment (which I can't).

vmcburney:
I didn't realize there was a performance hit in sending data to a file via Manual Load as opposed to SEQ File stage. My server is down, but I'll benchmark SEQ File ASAP.
Ross Leishman
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Ross,
I was just showing some other factors. Excellent job on quantifying the load times per method.

I am a stickler for maintainability ... too many years supporting systems that were tuned to too fine a level, I suppose :D

Ray's idea is what we are using here for the bulk loads. Dump to a flat file, have pre-tuned control files and call the bulk loader. Works like a charm and is Very fast. I did a similar thing at my last site as well for Teradata and Multiloads.

I think this has been a great thread so far, and I think it will improve as more people share their views and knowledge. Keep up the good work.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

Just benchmarked vmcburney and Ray's suggestion of writing to SEQ stage instead of OCI Load with Manual Load option.

I re-ran all options and found that today is a bit faster than yesterday, but that's benchmarking for you:
* OraBulk - 210 rows/sec
* SEQ File - 220 rows/sec
* OCI Load (Manual) - 210 rows/sec (although I had one outlier at 240/sec)

Conclusion: Sequential file stages (inexplicably) out-perform Oracle stages that do the same job. However I'd still consider OCI Load in Automatic mode on a Local database for sheer simplicity.

Ray's getting down into the nitty gritty writing his own Control file (I expect no less :) ). I'll give it a go if I can spare 30 mins. Datatyping each column will add another maintenance overhead over and above auto-generated Control files.
These suggestions apply equally to dzdiver's EOT option as well. My earlier estimate of EOT performance is probably a bit overstated as a result. If you can get your data modelling tool to generate the DDL for the EOTs automatically, then I'd say we've got a new winner.

Personally, I keep Parallel Query and Parallel Load in reserve for really big important stuff. I've had bad experiences on a database where too many people are using parallel.
Ross Leishman
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

How wide are these rows? That is slow. What about MB/sec?
Mamu Kim
ian_bennett
Participant
Posts: 15
Joined: Sun Sep 01, 2002 6:56 pm
Location: Australia
Contact:

Re: Oracle Performance Tuning - Bulk / Direct / OCI / Update

Post by ian_bennett »

Is this the case with Transaction Size set to 0?

I would have thought that with that setting only one transaction (rather than one per 255 rows) would be counted for the whole job.
rleishman wrote:Array Size
....
Performance benefits tend to tail off at higher numbers, and there are some tricky issues with values over 255 relating to the MAXTRANS setting on Oracle tables and indexes.
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

kduke wrote:How wide are these rows? That is slow. What about mb/sec?
Sorry, I misled you earlier with the environment. The DS server (our dev box) is actually a virtual machine running Linux under VMware. It makes benchmarking a bit more thought-provoking.
The critical speed is not the transformation, but the validation. We are performing around 30 validations serially in a routine, and sending failures down a reject link. Mercifully there are few failures, so almost every row is output to the primary load.
The final output is primarily integers: about 15 of them plus around 50 bytes of Varchar (I'm not at work, so cannot give you exact details).
When I remove the validation, it runs at 1-2000 rows/sec to a sequential file.
ian_bennett wrote:Is this the case with Transaction Size set to 0?
I haven't used a transaction size of 0. Correct me if I'm wrong, but it only commits at the end.

If so, then you could experience trouble with 0. It is a bit off-topic, but the guts of it is that if Oracle tries to insert more than 255 transactions in the same block between commits, then it can hang waiting for ITL Enqueue resources. I experienced this using 10g Release 1 on heavily indexed tables and a DB_BLOCK_SIZE of 16Kb. Increasing INITRANS to 255 mitigated the problem somewhat, but it still occurred. I have not established whether it is a bug or not, but the symptom is consistent with the literature.
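For reference, the mitigation mentioned above looks like this (object names invented):

ALTER TABLE fact_sales INITRANS 255;            -- 255 is the maximum
ALTER INDEX fact_sales_pk INITRANS 255;         -- or REBUILD ... INITRANS 255 to reformat existing blocks

-- Only newly formatted blocks pick up the higher INITRANS; existing blocks
-- keep whatever ITL slots they already have unless the segment is rebuilt.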
Ross Leishman
ian_bennett
Participant
Posts: 15
Joined: Sun Sep 01, 2002 6:56 pm
Location: Australia
Contact:

Post by ian_bennett »

If you set transaction size to 1 then every record is a transaction, hence your problem if you hit Oracle with more than 255 records at once.

If you set transaction size to say, equal to the array size, this should remove the limitation, while still performing a regular commit.

I haven't tested this, but I would guess that you would get better performance with fewer commits (up to a point).
rleishman wrote:
ian_bennett wrote:Is this the case with Transaction Size set to 0?
I haven't used transaction size of 0. Correct me if I'm wrong, but it only commits at the end.

If so, then you could experience trouble with 0. It is a bit off-topic, but the guts of it is that if Oracle tries to insert more than 255 transactions in the same block between commits, then it can hang waiting for ITL Enqueue resources. I experienced this using 10g Release 1 on heavily indexed tables and a DB_BLOCK_SIZE of 16Kb. Increasing INITRANS to 255 mitigated the problem somewhat, but it still occurred. I have not established whether it is a bug or not, but the symptom is consistent with the literature.
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

My original post was incorrectly worded. Where I used Array Size I should have been referring to Transaction Size. Apologies for the confusion.

I actually set both Array Size and Transaction Size to 255, although I see no reason why you couldn't set Array Size larger if you found performance gains. I haven't tested larger Array Sizes.
Ross Leishman
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you haven't done so already, Ross - go back and edit your original post. Many people here don't seem to realize you can do that. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers