ODBC connection problem in "Always On" WS

Dedicated to DataStage and DataStage TX editions featuring IBM<sup>®</sup> Service-Oriented Architectures.

Moderators: chulett, rschirm

Post Reply
MT
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

ODBC connection problem in "Always On" WS

Post by MT »

Hi,

I have a always on WebService running which selects result rows from a DB2 table via an ODBC stage. The ODBC stage is used because I need a multi-row lookup.
So far so good.

If the ODBC connection terminates unexpectantly - i.e. through a force command or database shutdown etc. the WebService does not receive an error. The WebService logs a warning - like following:

Code: Select all

...SQLSTATE=40003, DBMS.CODE=-30081
[DataStage][SQL Client][ODBC][IBM][CLI Driver] SQL30081N  A communication error has been detected.  Communication protocol being used: "TCP/IP".  Communication API being used: "SOCKETS".  Location where the error was detected: "10.131.131.105".  Communication function detecting the error: "recv".  Protocol specific error code(s): "*", "*", "0".  SQLSTATE=08001
 
or

...
SQLSTATE=40003, DBMS.CODE=-99999
[DataStage][SQL Client][ODBC][IBM][CLI Driver] CLI0108E  Communication link failure. SQLSTATE=40003
The job calling (consuming) the WebService via a WS Transformer does not get any error or warning but does not receive any result data - so an empty row is returned.

My goal is to make this process a little more "bullet proof". I want this process to run successfully (after DB2 comes up again) without manual interaction.

I am looking for ideas how to do it.
One might be to set the "Time to live" for the WebService but I still get an empty result set until the service is restarted and I would - at least - return some kind of error message to the caller. So the second question is how can I track this error scenario?

I have experimented a little bit with DSGetLinkInfo or DSGetStageInfo without success.

Any ideas are appreciated.

kind reagrds
Michael
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I've seen this before. Too bad the job doesn't abort. One solution is to lengthen the timeout from your rdbms, and another is to make adjustments to the maximum runtime, so that the job recycles more often.

A solution that I've thought about but have never tested is to have a "heartbeat" type lookup earlier in the job that always returns a row....and if for some reason it does not (value is null, etc.) then return a special message to the calling web service client, or simply find a way to force the job to abort, or both.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
MT
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Post by MT »

Hi Ernie,
eostic wrote:I've seen this before. Too bad the job doesn't abort. One solution is to lengthen the timeout from your rdbms, and another is to make adjustments to the maximum runtime, so that the job recycles more often.
I do not see the point what any timeout could do. Waiting is no solution in my eyes - nothing will change so.
I guess with "maximum runtime" you refer to the "Time to live option" of WS don't you? This will in my eyes impose quite a lot of overhead due to starting several WebServices - let us say every 5 minutes - just for the (hopefully) relatively unlikely case of a db shutdown.....

eostic wrote: A solution that I've thought about but have never tested is to have a "heartbeat" type lookup earlier in the job that always returns a row....and if for some reason it does not (value is null, etc.) then return a special message to the calling web service client, or simply find a way to force the job to abort, or both.
Ernie
This might be an idea ...
The additional lokkup is a overhead but I could check wether it returns a row and if not call something like UtilityAborttoLog which will cause the WS to abort and resstart itself. I know this utility was not made for this kind of scenario but I lack of a better solution so far....

kind regards
Michael
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I was referring to the "programmatic" reason for losing the connection....ie...the dba of the database where you are doing the lookup establishes a timeout rule that disconnects all threads without activity after 15 minutes (I see this all the time).... in that case, set the max time to live to (say) 14 minutes, so that there won't be a problem.

In your case, it sounds like the database is not reliable, or could go down unexpectedly. In that case, you need your "own" detection mechanism like in the second scenario (or have your database folks implement an automated HA scenario... ; ) ).

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I think the lookup is the simplest, and also the most consistent (it's doing exactly what your "real" lookup is doing), but perhaps there are other ways to more easily detect that the database is up and available.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
MT
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Post by MT »

eostic wrote:I think the lookup is the simplest, and also the most consistent (it's doing exactly what your "real" lookup is doing), but perhaps there are other ways to more easily detect that the database is up and available.

Ernie
Hi Ernie,

that is what I tried to implement - until I stepped over another strange behaviour....

Having two lookups - one to test the ODBC connection and the real one - I got warning messages saying the connection is not available etc. (like the one described in my earlier post) - but somehow the connection test lookup DID deliver data which made my logic say that everything is ok.
By monitoring my DB2 database I saw that the connection was really gone.
I wrote the lookup details into a file - just for debug reasons - and it had some of the data (from an earlier successful run) !
My always on WS (SERVER job) seems to cache data - while I thought it will do a sparse lookup.

I think this is really strange.
Any ideas how to switch this caching off?

kind regards
Michael
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

No idea. Bummer. You may have to resort to something else, like trying an update to that server, or a custom sql that issues a command that will provide at least some kind of return code.....of course, checking that return code could be difficult (this would be much simpler in Server).

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
MT
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Post by MT »

eostic wrote:No idea. Bummer. You may have to resort to something else, like trying an update to that server, or a custom sql that issues a command that will provide at least some kind of return code.....of course, checking that return code could be difficult (this would be much simpler in Server).

Ernie
Hi,
well it is Server!
I have already thought about an other solution but I hoped that someone could explain this behaviour to me that I can understand what is going on.....

regards
Michael
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you involved your official support provider yet?
-craig

"You can never have too many knives" -- Logan Nine Fingers
MT
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Post by MT »

chulett wrote:Have you involved your official support provider yet?
Hi chulett,

not yet - I have another support case running for 2 month now...
It is sometimes quicker to find an alternative solution :)

I thought the best way to get some background is here....

kind regards
Michael
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sometimes, yes, but not always. Good to check here first, but when The Man says "bummer, no idea" you fall squarely back into the lap of support. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

On this one, I tend to agree with Michael. There could be 1000 variables that affect something like this, not all of them on the Information Server side of things. Best to find a cool way that nails it and works in 99.99% of all situations.
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Post Reply