slower ODBC Enterprise stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

slower ODBC Enterprise stage

Post by zaino22 »

There is an issue with one job that is reading from a DB2 table. The job is very simple, no fancy transformation, just straight mapping, but DS 8.0 is reading about 10 times faster than DS 8.7: on 30 million records DS 8.0 does 11,000 rows/sec, whereas DS 8.7 doesn't go above 1,600 rows/sec.
Just so you know, another job on DS 8.7 is running fine at around 10,000 rows/sec against a different DB2 table, but it is only selecting 4 columns from a half-million-row table.

The ODBC Enterprise stage is reading from the DB2 table, but I have also tested with the ODBC Connector stage and it makes no difference. Overall, I managed to increase throughput to a little over 1,600 rows/sec, but that is still far below DS 8.0 (around 10,000 rows/sec).

ODBC Stage (select * from db2 table)---->xfm (straight mapping)----->sequential file.

So far I have tried the following, to no avail:

1) Checked scratch space and disk space while the job is running; there is huge space available. The job runs on four nodes.
2) Named the columns in the select statement instead of using select *.
3) Switched to the ODBC Connector stage.
4) Reduced the number of columns to 1; speed gets a lot faster with fewer columns (over 30,000 rows/sec).
5) Changed the "fetch array size" to 2000 in the ODBC Enterprise stage.
6) Increased the default transport block size to 300,000 and the latency coefficient to 3.
7) Verified the odbc.ini file has the same entries and points to the exact same object.
8) ULIMIT is unlimited for everything except nofiles (descriptors), which is 131072.
9) Changed NLS to UTF-8.
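For reference, steps 5 and 6 correspond to standard DataStage PX environment variables. A minimal sketch of setting them at the session level, with the values tried above (the values are the ones from this experiment, not recommendations):

```shell
# Transport tuning from steps 5-6. Variable names are standard DataStage PX
# environment variables; the values are the ones tried in this thread.
export APT_DEFAULT_TRANSPORT_BLOCK_SIZE=300000   # bytes per transport block
export APT_LATENCY_COEFFICIENT=3                 # blocks buffered per connection
echo "block=${APT_DEFAULT_TRANSPORT_BLOCK_SIZE} latency=${APT_LATENCY_COEFFICIENT}"
```

These can also be set per-job or per-project in the Administrator client rather than in the shell.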

What can I do to figure out the bottleneck?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Compare all the properties in the two ODBC Connector stages, looking for ANY differences.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Post by zaino22 »

Hi Ray,
Correction to #7 [odbc.ini file has same entries, and pointing to exact same object]:
the driver is different; DS 8.0 had VMdb222.so and DS 8.7 has VMdb225.so.

I searched DSX and found a link discussing slowness in read/write, but since that poster was using the ODBC Connector, I did not have the same options (partitioned read etc.) in my stage. After I changed the ODBC Enterprise stage to the ODBC Connector, speed increased to 7,500 rows/sec, but that is still not close to the 10k rows/sec DS 8.0 was fetching.

It seems that since ODBC Enterprise is a sequential stage, one node cannot handle a large number of columns. It is faster with a one-column extract, and performance goes down as we add more columns.
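The partitioned-read option mentioned above works by splitting the SELECT across player nodes. A rough sketch of the idea, using a modulus split; the table name "orders" and integer key "order_id" are hypothetical, and the Connector generates the equivalent queries internally:

```shell
# Emit one modulus-partitioned query per node, the way a partitioned
# (modulus) read divides a table across N player processes.
# "orders" and its integer key "order_id" are hypothetical names.
nodes=4
p=0
while [ "$p" -lt "$nodes" ]; do
  echo "SELECT * FROM orders WHERE MOD(order_id, $nodes) = $p"
  p=$((p + 1))
done
```

Each node then fetches only its slice, which is why the Connector with a partitioned read can beat a sequential Enterprise-stage read on wide tables.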

Any suggestions?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

zaino22 wrote: It is faster with one column extract and performance goes down as we add more columns.
That makes complete sense, since elapsed time will be directly proportional to total volume of data needing to be processed.
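A back-of-the-envelope check of that proportionality: if per-row byte volume scales with column count, rows/sec scales inversely. The column count below (20) is an assumed illustration, not a figure from the job:

```shell
# If one column reads at ~30,000 rows/sec, N similar-width columns should
# read at roughly 30,000/N. The column count of 20 is hypothetical.
one_col_rate=30000   # rows/sec observed with a single column
cols=20              # assumed number of similar-width columns
echo "$(( one_col_rate / cols )) rows/sec expected"
```

With 20 columns that predicts about 1,500 rows/sec, the same order of magnitude as the 1,600 rows/sec observed.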
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Post by zaino22 »

I put the ODBC Connector on 8 nodes and it gives me over 11k rows/sec, but I have to play around with the Connector settings to get more read throughput.
Still, the ODBC Enterprise stage's inability to match the previous output puzzles me.
aartlett
Charter Member
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

G'day Zaino,
Comparing two jobs that read different tables with different criteria will only lead to frustration. The two tables are different sizes, with possibly different predicates, and you are transferring different amounts of data. So comparing the speed of your problem job to another isn't a fair comparison ... apples and pomegranates come to mind.

Onto your actual problem. Some things that may impact performance between different systems. These may or may not apply in your case.

Is the network path for the two jobs the same? Is the hardware the jobs are running on the same, with the same load? One site I was at was trialling 8.1 on a virtual box and was surprised it didn't run as fast as the other system.

If the 8.0 and 8.7 engines are not on the same box, are they on the same network path to the database? Same switches, jobs running at comparable times, DB2 caches cleared between runs? Are the NICs the same capacity? (You can't expect the same transfer rate from a 1 GB NIC that you would from a 10 GB NIC.)

Networks, I have found, are the bane of ETL. Supposedly identical networks can perform very differently because of external load (and I have found many a network tech deny there's an issue).

I am not ruling out a DataStage issue with the Enterprise stage. I think I remember reading or hearing that ODBC was revamped sometime between 8.0 and 8.7.

I note that you are running different versions of the ODBC driver.

It looks like you are possibly comparing apples and apples, but Granny Smiths vs Red Delicious: nearly the same, yet a world of difference in the taste.

Is it feasible to go to the Connector stage? My understanding is that connectors are preferred over Enterprise stages in the later releases.

Good luck with your tweaking.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Post by zaino22 »

Thank you, Aartlett, for taking the time to discuss the potential bottlenecks and putting that into words.

We noticed the job migrated from 8.0 to 8.7 was taking 6 hrs as opposed to 2 hrs to extract ~35 million records. After trying a few of the points from my first note, I posted the question here on the forum. When Ray suggested checking for differences between the two jobs, I looked again and found the driver is different.

I managed to pull a higher number of rows (11k/sec) with the ODBC Connector stage running on 8 nodes (previously the ODBC Enterprise stage was running on 4 nodes), but I wanted to know if I had left any stone unturned. It is supposed to run faster, or at least the same, not slower.
I guessed it is either the nodes or the driver, but since 8.7 has more nodes and better swap and scratch space than 8.0, it may be the driver? I never thought of the network card, since there is no comparable table with the same volume. I will find out about the network connection tomorrow and share the findings. The port to DB2 is the same.
The idea was to get off DS 8.0, so they won't be running at the same time at all.
I often get pointed in the right direction by DSX, so thank you, contributors.
    zaino22
    Premium Member
    Posts: 81
    Joined: Thu Mar 22, 2007 7:10 pm

    Post by zaino22 »

    I asked, and it seems the connections to the boxes are the same, though I could not verify it on my own.
    I tested the query on the command line (from the same unix box to DB2, without using DataStage) and the response time for ~35 million records was about 53 minutes. So I am guessing the connection/network is fine, and I can match that with 8 nodes + ODBC Connector, but not with the 4-node ODBC Connector or the 4-node ODBC Enterprise stage.
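That command-line baseline translates directly into a throughput ceiling, which lines up with the 8-node Connector result:

```shell
# ~35M rows in 53 minutes is about 11,000 rows/sec -- roughly the same rate
# the 8-node ODBC Connector run achieved, suggesting the network/database
# path, not DataStage, is the limit at that point.
rows=35000000
secs=$(( 53 * 60 ))
echo "$(( rows / secs )) rows/sec"
```

So matching ~11k rows/sec with 8 nodes likely means the job is already saturating whatever the standalone query path can deliver.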

    I wonder if there is another environment value (a buffer?) that needs to be changed to accommodate the number of columns. I will fiddle around, and if nothing turns up I will close this query.
    kwwilliams
    Participant
    Posts: 437
    Joined: Fri Oct 21, 2005 10:00 pm

    Post by kwwilliams »

    Post your .odbc.ini entry - mask your host name information.
    zaino22
    Premium Member
    Posts: 81
    Joined: Thu Mar 22, 2007 7:10 pm

    Post by zaino22 »

    I see two entries in the .odbc.ini file: one is [DB2 Wire Protocol] and the other, created by our admin, is called [CDXA].
    Is the first one an IBM template? I am providing both in case the first is relevant.


    Here is the info:

    uvodbc.config:
    ------------------
    [ODBC DATA SOURCES]
    <localuv>
    DBMSTYPE = UNIVERSE
    network = TCP/IP
    service = uvserver
    host = 127.0.0.1
    <CDXA>
    DBMSTYPE = ODBC

    .odbc.ini
    -----------
    [DB2 Wire Protocol]
    Driver=/IBM/InformationServer/Server/branded_odbc/lib/VMdb225.so
    Description=DataDirect DB2 Wire Protocol Driver
    AddStringToCreateTable=
    AlternateID=
    Collection=OS/390 and AS/400 (Remove for DB2 UDB)
    Database=DB2 UDB (Remove for OS/390 and AS/400)
    DynamicSections=100
    GrantAuthid=PUBLIC
    GrantExecute=1
    IpAddress=DB2 server host
    IsolationLevel=CURSOR_STABILITY
    Location=OS/390 and AS/400 (Remove for DB2 UDB)
    LogonID=
    Password=
    Package=DB2 package name
    PackageOwner=
    TcpPort=DB2 server port
    WithHold=1

    [CDXA]
    QEWSD=41102
    Driver=/IBM/InformationServer/Server/branded_odbc/lib/VMdb225.so
    Description=DataDirect DB2 Wire Protocol Driver
    AddStringToCreateTable=
    AlternateID=
    Collection=
    Database=CDXA
    DynamicSections=100
    GrantAuthid=PUBLIC
    GrantExecute=1
    IpAddress=<iCauseHavoc> *changed to protect innocent corp
    IsolationLevel=CURSOR_STABILITY
    Location=
    LogonID=
    Password=
    Package=CDXA
    PackageOwner=
    TcpPort=60098
    WithHold=1

    ----------------------------------
    ray.wurlod
    Participant
    Posts: 54607
    Joined: Wed Oct 23, 2002 10:52 pm
    Location: Sydney, Australia
    Contact:

    Post by ray.wurlod »

    Yes, the first is the prototype for DB2 Wire Protocol connections. This is in .odbc.ini when the product is installed. The other entry was added at, and is specific to, your site.

    The entry in uvodbc.config simply makes the DSN visible and identifies it as "true" ODBC or "UniVerse" using ODBC protocols.
    IBM Software Services Group
    Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
    zaino22
    Premium Member
    Posts: 81
    Joined: Thu Mar 22, 2007 7:10 pm

    Post by zaino22 »

    IBM suggested using a partitioned read with min and max values, which only hit around 6K rows/sec with 4 nodes and did not reach the legacy version's throughput of 10,000 rows/sec. Previously, I had hit 7,500 rows/sec with the Modulus partition option, and later, with 8 nodes, around 13k rows/sec. Recently IBM suggested the same thing, increasing the number of nodes, which I had already tried, so nothing new.

    I still couldn't figure out why the new version could not match (let alone beat) the legacy version on the same number of nodes.

    Thank you everyone who pitched in their ideas and tried to be helpful.
    Post Reply