
Array Size, Record Count, Commit Frequency, Bulk Insert

Posted: Wed May 09, 2018 11:54 am
by jreddy
Looking for some guidance on how we can set/use the transaction handling options in the ODBC Connector (or other SQL Server stages) to get the best performance when inserting/updating data in a SQL Server database.

For a job that inserts 50K rows, for example, I have set both Array Size and Record Count to 2000 in the SQL Server Enterprise stage for the INSERT, but on the DB side I still see 50K individual insert statements being issued.

I tried changing Array Size and Record Count to 20K each, but still see the same behavior on the DB side.

What option(s) could be set so we can reduce the number of insert statements? I was going to try the same test with the SQL Server Bulk stage and the ODBC Connector stage, but wanted to see if anyone has gained performance improvements by playing with these options, and with commit frequencies, for example.

Any guidance is appreciated, thanks

Posted: Wed May 09, 2018 12:01 pm
by chulett
None of those settings will affect the number of inserts issued. If you've got 50k rows to process, 50k insert statements will be issued. Array Size controls how many are "bundled together" to be sent to the database at any given time. And Record Count, doesn't that control how often a commit is performed?
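For what it's worth, the same two knobs can be sketched outside DataStage with a small client-side script. This is only a conceptual sketch using pyodbc, not what the connector actually does internally; the DSN, table name, and batch numbers below are assumptions.

Code:
# Conceptual sketch only: "array size" as rows per round trip, "record count"
# as rows per commit. The database still executes one INSERT per row.
import pyodbc

ARRAY_SIZE = 2000       # rows bundled into one round trip (assumed value)
RECORD_COUNT = 10000    # rows between commits (assumed value)

conn = pyodbc.connect("DSN=sqlserver_dsn;UID=etl;PWD=secret")   # assumed DSN
cur = conn.cursor()
cur.fast_executemany = True   # send parameter arrays instead of row-by-row calls

rows = [(i, "name_%d" % i) for i in range(50000)]   # stand-in for the job's 50K rows
sql = "INSERT INTO dbo.target_table (id, name) VALUES (?, ?)"

uncommitted = 0
for start in range(0, len(rows), ARRAY_SIZE):
    batch = rows[start:start + ARRAY_SIZE]
    cur.executemany(sql, batch)        # one network trip, ARRAY_SIZE inserts
    uncommitted += len(batch)
    if uncommitted >= RECORD_COUNT:
        conn.commit()                  # the "record count" commit interval
        uncommitted = 0
conn.commit()
conn.close()

Even batched this way, a trace on the server still shows 50K insert executions; the savings are in round trips and commit overhead, not in the statement count.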

Posted: Wed May 09, 2018 12:07 pm
by jreddy
Thank you Craig.

Our DBA has pointed out that these 50K inserts are being issued one by one and has asked me to see if there is a way to group them for better efficiency, so the database isn't hit that many times. Is that something we can do with a bulk load?

Posted: Wed May 09, 2018 1:03 pm
by chulett
Not sure why there would be an issue with that number of inserts. Sure, they get processed 'one by one', but that's not a volume where I would think switching to bulk loads is warranted. Are they taking an inordinate amount of time, and is that why the DBA is involved?

Bulk inserts are a different beast. I'm not familiar with SQL Server and all of its nuances, but I have dealt with this in Oracle. They use a different API, bypass a lot of the things a "conventional" insert needs to deal with, and there's no longer a concept of "commits". For Oracle, you cannot have any indexes on the table. And then it all processes successfully or it fails, and when there's a bulk load failure it can leave your table in a bit of a compromised state. That's why we save them for Loads of Unusual Size - say, when our normal daily load of 1 to 1.5 million rows (which we still load conventionally) expands to 100 to 150 million, as it does once a year.
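For the SQL Server side of the original question, the closest thing to a bulk path outside DataStage would be BULK INSERT or bcp. A rough sketch of the former, run through pyodbc here just for illustration; the file path, delimiters, and table name are assumptions, and the file has to be readable by the SQL Server service itself.

Code:
# Rough sketch of a SQL Server bulk path (BULK INSERT), not a DataStage stage.
# File path, delimiters, and table are assumptions; the file must be visible
# to the SQL Server machine, and failure handling is very different from
# conventional inserts, as described above.
import pyodbc

conn = pyodbc.connect("DSN=sqlserver_dsn;UID=etl;PWD=secret", autocommit=True)
conn.execute("""
    BULK INSERT dbo.target_table
    FROM 'C:\\loads\\target_table.csv'
    WITH (FIELDTERMINATOR = ',',
          ROWTERMINATOR = '\\n',
          BATCHSIZE = 50000,
          TABLOCK)
""")
conn.close()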

And from the Oracle side I could also write a PL/SQL procedure to generate and bulk collect the inserts and let the database handle them in large chunks that way. It's been a heck of a long time since I've had a situation where that looked like a viable solution, though. Never mind that it's not an ETL solution, more of a PITA solution.

Not sure how much of that helps. I'm just wondering if you are trying to solve something that really isn't a problem, per se. :(

Posted: Thu May 10, 2018 5:10 am
by qt_ky
Sounds quite normal.

Posted: Fri May 11, 2018 11:57 am
by jreddy
Thank you, Craig.
Yes, we are having performance issues with the load jobs, so we are looking into every possible way to tune them. The whole process breezes through up until the last stage, where the insert happens, and then the insert takes anywhere from 2 to 11 minutes, seemingly at random.

Posted: Fri May 11, 2018 4:25 pm
by chulett
I would hazard a guess that the load on the database, whatever other processes are running on it or just the server itself, is what is driving your variance. That, and probably volumes, since I don't imagine each run is always 50K records. Another factor could simply be the number of indexes on the table. Or network congestion. Lots of factors could be at play here.

Just from personal experience, I would caution against going down the bulk load path for such a small number of rows, considering the (mostly single-digit minute) load times you are seeing. I don't believe the level of complexity it would add would be worth whatever performance gains you may or may not see. All IMHO, of course.

Posted: Tue May 15, 2018 10:05 am
by UCDI
Have you checked stats and indexes on the target? Inserts can run long if the stats are a mess, or if the table isn't indexed well for the job.
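If it helps, the target's indexes and statistics can be checked with a couple of catalog queries; just a sketch, with the table name assumed.

Code:
# Sketch: list the indexes the inserts have to maintain and refresh statistics.
# The table name is an assumption.
import pyodbc

conn = pyodbc.connect("DSN=sqlserver_dsn;UID=etl;PWD=secret", autocommit=True)
cur = conn.cursor()

cur.execute("""
    SELECT i.name, i.type_desc
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID('dbo.target_table')
      AND i.name IS NOT NULL
""")
for name, type_desc in cur.fetchall():
    print(name, type_desc)

cur.execute("UPDATE STATISTICS dbo.target_table")
conn.close()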

Bulk works well too, if you go that route. Hardware and systems vary, so it's hard to say where the cutoff between inserts and bulk mode falls. What seems like few rows to a big system can be overwhelming to a tiny one. Try both and see what you get.

Posted: Sun May 20, 2018 3:26 pm
by rschirm
I would try the following.

Calculate the width of each row (add up the lengths of all the columns). Divide that value into 64,000 and round the result down to the nearest whole number. Use this number for the Array Size. Make the Record Count a multiple of the Array Size, so that it comes out at around 5000.
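A worked example of that rule, with the 250-byte row width purely as an assumption (use the sum of your own column lengths):

Code:
# Worked example of the sizing rule above; the 250-byte row width is assumed.
row_width = 250                    # sum of all column lengths, in bytes
array_size = 64000 // row_width    # 64,000 / width, rounded down -> 256
record_count = array_size * 20     # multiple of array_size near 5000 -> 5120
print(array_size, record_count)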



Please post back with the stats of this change.