JDBC Connector Performance Tuning
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 182
- Joined: Thu Jun 16, 2005 2:05 am
Hi All,
We are using the JDBC Connector to connect to a remote Oracle database through the thin driver (a Type 4 connection).
When we try to extract data from the database, the fetch rate is very low (800-1000 rows/sec).
Properties:
Fetch size: 2000
Heap size: 256
Kindly let me know your thoughts on how to increase the performance of the data extraction.
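As a rough illustration (plain arithmetic, not connector internals): over a remote link, the fetch size mainly controls how many rows the driver pulls per network round trip, so extraction time is often dominated by the number of trips rather than by any single setting:

```python
def round_trips(total_rows, fetch_size):
    """Network round trips needed to fetch all rows, assuming the
    driver pulls fetch_size rows per trip (ceiling division)."""
    return -(-total_rows // fetch_size)

# Illustrative numbers only: 1,000,000 rows at various fetch sizes.
for fs in (100, 2000, 10000):
    print(fs, round_trips(1_000_000, fs))
```

Raising the fetch size cuts round trips but costs heap, which is why fetch size and heap size are usually tuned together.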
Thanks,
Satheesh.R
You need to determine the bottleneck. Rather than a magic setting in the connector, it could be the source database, the complexity of the source SQL, network issues, or simply how long it takes to write to the target. Eliminate each possibility one by one.
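One way to eliminate the read itself as a suspect is to time the row stream in isolation and compute rows/sec before looking at the target side. A minimal sketch, with a plain Python iterable standing in for any row source:

```python
import time

def throughput(row_iter):
    """Consume a row stream and report (row_count, rows_per_second).
    Timing only the consumption isolates read speed from write speed."""
    start = time.perf_counter()
    count = sum(1 for _ in row_iter)
    elapsed = time.perf_counter() - start
    return count, (count / elapsed) if elapsed else float("inf")

rows, rate = throughput(iter(range(100_000)))
```

If the isolated read is fast, the bottleneck is downstream (transforms or target writes), not the connector.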
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 182
- Joined: Thu Jun 16, 2005 2:05 am
Hi Paul,
Can we use partitioned read on non-partitioned tables? When I read the data with partitioned read enabled, it fetches much faster (25k rows/sec), but the row count shows a huge variation. I am really worried about the count mismatch.
Kindly let me know your thoughts on the same.
Regards,
Satheesh.R
-
- Participant
- Posts: 44
- Joined: Wed Sep 02, 2015 7:19 am
- Location: Brasilia, Brazil
Hello Satheesh, you can certainly use partitioned reads on non-partitioned tables; there are partition methods such as round-robin which divide the table evenly.
You just need to be aware of your ETL design: some stages need specific partition types to work, so if you have a complex ETL design, your data may need to be repartitioned, leading to new bottlenecks.
But if it's a simple source -> target design, it will be fine. Just run a validation/count on the target, based on some key, after the operation to check.
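The round-robin method described above can be sketched like this, with the assertion at the end being the kind of count validation suggested (illustrative Python, not DataStage internals):

```python
def round_robin(rows, n_partitions):
    """Distribute rows one at a time across partitions, so partition
    sizes differ by at most one row."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts

rows = list(range(10))
parts = round_robin(rows, 4)
# Validation: partition counts must sum back to the source count.
assert sum(len(p) for p in parts) == len(rows)
```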
Leandro Vieira
Data Expert - Brasilia, Brazil
-
- Participant
- Posts: 182
- Joined: Thu Jun 16, 2005 2:05 am
Check with your support provider to determine if there are any known issues with partitioned read in the JDBC Connector.
Add $CC_MSG_LVL to your job parameters and set it to 1, 2, or 3 to enable connector tracing in the job log. I usually start with a setting of 2. You may have to add it as a user-defined project environment variable if you have not done so previously (leave the default empty so as not to generate excessive logging across your entire project).
In the job log you will see the actual query that was passed to the database. A partitioned read usually injects some additional logic in the WHERE clause. If you have anything more complex than a single table SELECT, the partitioned read may not work properly. You can determine that by evaluating the generated query.
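For illustration only, a modulus-style predicate of the kind a partitioned read typically injects might look like the following. This is a hypothetical sketch, not the connector's actual SQL generation; note how naively appending a WHERE clause would break any base SQL that already contains its own WHERE or joins, which is exactly why evaluating the generated query in the log matters:

```python
def partitioned_queries(base_sql, key, n_readers):
    """Generate one query per reader node, each selecting a disjoint
    slice of rows by a modulus on a numeric key column.
    Only valid when base_sql is a simple single-table SELECT."""
    return [f"{base_sql} WHERE MOD({key}, {n_readers}) = {i}"
            for i in range(n_readers)]

qs = partitioned_queries("SELECT * FROM orders", "order_id", 4)
```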
Mike
Start to debug in a phased approach.
Chop your job up into pieces:
Oracle Connector -> dataset
Oracle Connector -> sequential file
With 1000 rows in your source table, I would have expected maybe 4000 rows on your output if your partitioning was Entire, but 5000ish? That is odd.
When faced with an issue, I like to break it down into its smallest components. That way you remove "noise" from the equation. If it's your Oracle read that you suspect is wrong, then just do a read in your job, no other transforms, etc.
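The count inflation described is consistent with Entire partitioning, where every node receives a full copy of the data. A minimal sketch (illustrative Python, not DataStage code) of why 1000 source rows would show up as 4000 on a 4-node configuration:

```python
def entire(rows, n_partitions):
    """Entire partitioning: every partition gets a full copy of the
    input, so downstream counts multiply by the partition count."""
    return [list(rows) for _ in range(n_partitions)]

parts = entire(range(1000), 4)
total = sum(len(p) for p in parts)  # 4 * 1000 = 4000, not 1000
```

A count above that multiple, as reported here, points at something other than partitioning alone and is worth isolating stage by stage.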