slower ODBC Enterprise stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

slower ODBC Enterprise stage

Post by zaino22 »

There is an issue with one job that is reading from a DB2 table. The job is very simple, no fancy transformation, just straight mapping, but DS 8.0 is reading about 10 times faster than DS 8.7: on 30 million records DS 8.0 does 11,000 rows/sec, whereas DS 8.7 doesn't go above 1,600 rows/sec.
Just so you know, another job on DS 8.7 is running fine at around 10,000 rows/sec against a different DB2 table, but it is only selecting 4 columns from a half-million-row table.

The ODBC Enterprise stage is reading from the DB2 table, but I have also tested with the ODBC Connector stage and it makes no difference. Overall, I managed to increase throughput to a little over 1,600 rows/sec, but that is still far below DS 8.0 (around 10,000 rows/sec).

ODBC Stage (select * from db2 table)---->xfm (straight mapping)----->sequential file.

So far I have tried the following, to no avail:

1) Checked scratch space and disk space while the job is running; there is huge space available. The job runs on four nodes.
2) Named the columns in the select statement instead of using select *.
3) Switched to the ODBC Connector stage.
4) Reduced the number of columns to 1; speed gets a lot faster with fewer columns (over 30,000 rows/sec).
5) Changed the "fetch array size" to 2000 in the ODBC Enterprise stage.
6) Increased the default transport block size to 300,000 and the latency coefficient to 3.
7) Verified the odbc.ini file has the same entries and points to the exact same object.
8) ULIMIT is unlimited for everything except nofiles (descriptors), which is 131072.
9) Changed NLS to UTF-8.
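For reference, steps 5 and 6 correspond to standard DataStage PX environment variables. A minimal sketch of setting them at the session level, with the values tried above (the values are the ones from this experiment, not recommendations):

```shell
# Transport tuning from steps 5-6. Variable names are standard DataStage PX
# environment variables; the values are the ones tried in this thread.
export APT_DEFAULT_TRANSPORT_BLOCK_SIZE=300000   # bytes per transport block
export APT_LATENCY_COEFFICIENT=3                 # blocks buffered per connection
echo "block=${APT_DEFAULT_TRANSPORT_BLOCK_SIZE} latency=${APT_LATENCY_COEFFICIENT}"
```

These can also be set per-job or per-project in the Administrator client rather than in the shell.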

What can I do to figure out the bottleneck?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Compare all the properties in the two ODBC Connector stages, looking for ANY differences.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Post by zaino22 »

Hi Ray,
Correction to #7 [odbc.ini file has same entries, and pointing to exact same object]:
the driver is different; DS 8.0 had VMdb222.so and DS 8.7 has VMdb225.so.

I searched DSX and found a link discussing slowness in read/write, but since that poster was using the ODBC Connector, I did not have the same options (partitioned read etc.) in my stage. After I changed the ODBC Enterprise stage to the ODBC Connector, speed increased to 7,500 rows/sec, but that is still not close to the 10k rows/sec DS 8.0 was fetching.

It seems that since ODBC Enterprise is a sequential stage, one node cannot handle a large number of columns. It is faster with a one-column extract, and performance goes down as we add more columns.
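The partitioned-read option mentioned above works by splitting the SELECT across player nodes. A rough sketch of the idea, using a modulus split; the table name "orders" and integer key "order_id" are hypothetical, and the Connector generates the equivalent queries internally:

```shell
# Emit one modulus-partitioned query per node, the way a partitioned
# (modulus) read divides a table across N player processes.
# "orders" and its integer key "order_id" are hypothetical names.
nodes=4
p=0
while [ "$p" -lt "$nodes" ]; do
  echo "SELECT * FROM orders WHERE MOD(order_id, $nodes) = $p"
  p=$((p + 1))
done
```

Each node then fetches only its slice, which is why the Connector with a partitioned read can beat a sequential Enterprise-stage read on wide tables.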

Any suggestions?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

zaino22 wrote: It is faster with one column extract and performance goes down as we add more columns.
That makes complete sense, since elapsed time will be directly proportional to total volume of data needing to be processed.
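A back-of-the-envelope check of that proportionality: if per-row byte volume scales with column count, rows/sec scales inversely. The column count below (20) is an assumed illustration, not a figure from the job:

```shell
# If one column reads at ~30,000 rows/sec, N similar-width columns should
# read at roughly 30,000/N. The column count of 20 is hypothetical.
one_col_rate=30000   # rows/sec observed with a single column
cols=20              # assumed number of similar-width columns
echo "$(( one_col_rate / cols )) rows/sec expected"
```

With 20 columns that predicts about 1,500 rows/sec, the same order of magnitude as the 1,600 rows/sec observed.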
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Post by zaino22 »

I put the ODBC Connector on 8 nodes and it gives me over 11k rows/sec, but I have to play around with the Connector settings to get more read throughput.
Still, the ODBC Enterprise stage's inability to match the previous output puzzles me.
aartlett
Charter Member
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

G'day Zaino,
Comparing two jobs that read different tables with different criteria will only lead to frustration. The two tables are different sizes, with possibly different predicates, and you are transferring different amounts of data. So comparing the speed of your problem job to another isn't a fair comparison ... apples and pomegranates come to mind.

Onto your actual problem. Some things that may impact performance between different systems. These may or may not apply in your case.

Is the network path for the two jobs the same? Is the hardware the jobs are running on the same, with the same load? One site I was at was trialling 8.1 on a virtual box and was surprised it didn't run as fast as the other system.

If the 8.0 and 8.7 engines are not on the same box, are they on the same network path to the database? Same switches, jobs running at comparable times, DB2 caches cleared between runs? Are the NICs the same capacity? (You can't expect the same transfer rate from a 1 GB NIC that you would from a 10 GB NIC.)

Networks, I have found, are the bane of ETL. Supposedly identical networks can perform very differently because of external load (and I have found many a network tech deny there's an issue).

I am not ruling out a DataStage issue with the Enterprise stage. I think I remember reading or hearing that ODBC was revamped sometime between 8.0 and 8.7.

I note that you are running different versions of the ODBC driver.

It looks like you are possibly comparing apples and apples, but Granny Smiths vs Red Delicious: nearly the same, yet a world of difference in the taste.

Is it feasible to go to the Connector stage? My understanding is that connectors are preferred over Enterprise stages in the later releases.

Good luck with your tweaking.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Post by zaino22 »

Thank you, Aartlett, for taking the time to discuss the potential bottlenecks and putting that into words.

We noticed the job migrated from 8.0 to 8.7 was taking 6 hrs as opposed to 2 hrs to extract ~35 million records. After trying a few of the points from my first note, I posted the question here on the forum. When Ray suggested checking for differences between the two jobs, I looked again and found the driver is different.

I managed to pull a higher number of rows (11k/sec) with the ODBC Connector stage running on 8 nodes (previously the ODBC Enterprise stage was running on 4 nodes), but I wanted to know if I had left any stone unturned. It is supposed to run faster, or at least the same, not slower.
I guessed it is either the nodes or the driver, but since 8.7 has more nodes and better swap and scratch space than 8.0, it may be the driver? I never thought of the network card, since there is no comparable table with the same volume. I will find out about the network connection tomorrow and share the findings. The port to DB2 is the same.
The idea was to get off DS 8.0, so they won't be running at the same time at all.
I often get pointed in the right direction by DSX, so thank you, contributors.
    zaino22
    Premium Member
    Posts: 81
    Joined: Thu Mar 22, 2007 7:10 pm

    Post by zaino22 »

    I asked, and it seems the connections to the boxes are the same, though I could not verify it on my own.
    I tested the query on the command line (from the same unix box to DB2, without using DataStage) and the response time for ~35 million records was about 53 minutes. So I am guessing the connection/network is fine, and I can match that with 8 nodes + ODBC Connector, but not with the 4-node ODBC Connector or the 4-node ODBC Enterprise stage.
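That command-line baseline translates directly into a throughput ceiling, which lines up with the 8-node Connector result:

```shell
# ~35M rows in 53 minutes is about 11,000 rows/sec -- roughly the same rate
# the 8-node ODBC Connector run achieved, suggesting the network/database
# path, not DataStage, is the limit at that point.
rows=35000000
secs=$(( 53 * 60 ))
echo "$(( rows / secs )) rows/sec"
```

So matching ~11k rows/sec with 8 nodes likely means the job is already saturating whatever the standalone query path can deliver.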

    I wonder if there is another environment value (a buffer?) that needs to be changed to accommodate the number of columns. I will fiddle around, and if nothing turns up I will close this query.
    kwwilliams
    Participant
    Posts: 437
    Joined: Fri Oct 21, 2005 10:00 pm

    Post by kwwilliams »

    Post your .odbc.ini entry - mask your host name information.
    zaino22
    Premium Member
    Posts: 81
    Joined: Thu Mar 22, 2007 7:10 pm

    Post by zaino22 »

    I see two entries in the .odbc.ini file: one is [DB2 Wire Protocol] and the other, created by our admin, is called [CDXA].
    Is the first one an IBM template? I am providing both in case the first is relevant.


    Here is the info:

    uvodbc.config:
    ------------------
    [ODBC DATA SOURCES]
    <localuv>
    DBMSTYPE = UNIVERSE
    network = TCP/IP
    service = uvserver
    host = 127.0.0.1
    <CDXA>
    DBMSTYPE = ODBC

    .odbc.ini
    -----------
    [DB2 Wire Protocol]
    Driver=/IBM/InformationServer/Server/branded_odbc/lib/VMdb225.so
    Description=DataDirect DB2 Wire Protocol Driver
    AddStringToCreateTable=
    AlternateID=
    Collection=OS/390 and AS/400 (Remove for DB2 UDB)
    Database=DB2 UDB (Remove for OS/390 and AS/400)
    DynamicSections=100
    GrantAuthid=PUBLIC
    GrantExecute=1
    IpAddress=DB2 server host
    IsolationLevel=CURSOR_STABILITY
    Location=OS/390 and AS/400 (Remove for DB2 UDB)
    LogonID=
    Password=
    Package=DB2 package name
    PackageOwner=
    TcpPort=DB2 server port
    WithHold=1

    [CDXA]
    QEWSD=41102
    Driver=/IBM/InformationServer/Server/branded_odbc/lib/VMdb225.so
    Description=DataDirect DB2 Wire Protocol Driver
    AddStringToCreateTable=
    AlternateID=
    Collection=
    Database=CDXA
    DynamicSections=100
    GrantAuthid=PUBLIC
    GrantExecute=1
    IpAddress=<iCauseHavoc> *changed to protect innocent corp
    IsolationLevel=CURSOR_STABILITY
    Location=
    LogonID=
    Password=
    Package=CDXA
    PackageOwner=
    TcpPort=60098
    WithHold=1

    ----------------------------------
    ray.wurlod
    Participant
    Posts: 54607
    Joined: Wed Oct 23, 2002 10:52 pm
    Location: Sydney, Australia
    Contact:

    Post by ray.wurlod »

    Yes, the first is the prototype for DB2 Wire Protocol connections. This is in .odbc.ini when the product is installed. The other entry was added at, and is specific to, your site.

    The entry in uvodbc.config simply makes the DSN visible and identifies it as "true" ODBC or "UniVerse" using ODBC protocols.
    IBM Software Services Group
    Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
    zaino22
    Premium Member
    Posts: 81
    Joined: Thu Mar 22, 2007 7:10 pm

    Post by zaino22 »

    IBM suggested using a partitioned read with min and max values, which only hit around 6K rows/sec with 4 nodes and did not reach the legacy version's throughput of 10,000 rows/sec. Previously, I had hit 7,500 rows/sec with the Modulus partition option, and later, with 8 nodes, around 13k rows/sec. Recently IBM suggested the same thing, increasing the number of nodes, which I had already tried, so nothing new.

    I still couldn't figure out why the new version could not match (let alone beat) the legacy version on the same number of nodes.

    Thank you everyone who pitched in their ideas and tried to be helpful.
    Post Reply