Rest 50% Success Rate

Dedicated to DataStage and DataStage TX editions featuring IBM® Service-Oriented Architectures.

Moderators: chulett, rschirm

pbttbis
Premium Member
Posts: 36
Joined: Thu Dec 11, 2014 3:30 am

Post by pbttbis »

Hi,

We are having an issue in the REST step (Hierarchical Stage) of an ISD job
where the first REST call is always successful, the next call fails, the
call after that succeeds, the next one fails, and so on. The failure seems
to "reset" whatever the issue is, so that the next call is successful.

We have confirmed that the failing calls (every other one) are not received at all by the system being called in the REST step.

We added TRACE logging to the Hierarchical Stage. The first lines that differ in XMLStage_REST_0.log when comparing a success and a failure are:

2016-01-26 18:40:02,959 Debug [REST] [] # of XML Parser transitions in document: 0
2016-01-26 18:40:02,959 Debug [REST] [] Number of XLXP Events consumed in document: 0

Has anyone experienced something like this, or does anyone have some ideas?

We have logged a PMR with IBM.

Thanks,

Shaun
PBT TBIS Consultant
eostic
Premium Member
Posts: 3835
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hard to say....can you share with us the topology of the overall ISD Job? Is there a possibility that your Hierarchical Stage in the Job has more than one input link?

Are there any Stages in the Job that have multiple input links? Any Joins? Are there any additional passive sources in the Job?

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/07/29/open-igc-is-here/">Open IGC is Here!</a>
pbttbis
Premium Member
Posts: 36
Joined: Thu Dec 11, 2014 3:30 am

Post by pbttbis »

ISD_INPUT
-> Transformer_1 (had split to ODBC load but currently removed for testing)
-> CopyStage_1
-> CopyStage_2
-> Transformer_2
-> Transformer_3 (in place to work around Hierarchical Stage microsecond truncation bug)
-> Hierarchical Stage (has the REST Step)
-> Transformer_3 (in place to work around Hierarchical Stage microsecond truncation bug)
-> CopyStage_3
-> Transformer_4 (had split to ODBC load but currently removed for testing)
-> Funnel (Inputs: Transformer_2 and Transformer_4)
-> CopyStage_4
ISD_OUTPUT

Input Message Flow:

apache httpd -> Datastage -> apache tomcat loadbalancer -> apache tomcat worker

With more testing we have found that when we submit requests to the DataStage URL via the WizTools RESTClient, we get success every time.

For the failures, we have confirmed that the connection is not even received by the load balancer.
PBT TBIS Consultant
pbttbis
Premium Member
Posts: 36
Joined: Thu Dec 11, 2014 3:30 am

Post by pbttbis »

Found this link that seems to be what I am experiencing:

http://www-01.ibm.com/support/knowledge ... match.html
PBT TBIS Consultant
eostic
Premium Member
Posts: 3835
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

...well, that's where I was going with my questions above, but it doesn't sound like you have that topology...... the topology in that link that is problematic, and which I was hinting at, is one that has more than just ISD as the "driving" link in the Job. Meaning --- you can't have a static source (like an ODBC Connector) within an ISD Job. If there is a Join, it would read the Join source once...and do the join, and then never know if it should read it again --- it creates a sort of "real time quandary".

Anyway...it doesn't sound like you are doing that.

The key must be in the tooling for your client. What is different about the test tool from your real system? Why does it work with that client all the time? It isn't likely that ISD has any idea of how/who is communicating with it...perhaps the two do something different in their payloads or calling mechanism?

Ernie
Ernie Ostic

pbttbis
Premium Member
Posts: 36
Joined: Thu Dec 11, 2014 3:30 am

Post by pbttbis »

So the REST step performing the POST would not be considered a reference lookup and create a "real time quandary" situation?

Yeah, I am working with the Apache admin to figure out how the connection config differs between the test tool and the actual system.

Is there any way I can:

1) check that the DataStage connection set up in the REST step is in fact closed after a response is received
2) from the DataStage server, check that the connection received from the Apache server is closed after the ISD_Output responds
PBT TBIS Consultant
pbttbis
Premium Member
Posts: 36
Joined: Thu Dec 11, 2014 3:30 am

Post by pbttbis »

After some more investigating with Wireshark and tcpdump, we see the following:

Tomcat (8080) sends a FIN, ACK and DataStage (51759) sends an ACK, which closes Tomcat's side of the TCP connection.

DataStage does not send a FIN along with its ACK, so when DataStage sends another REST request from port 51759, which Tomcat considers closed, Tomcat sends two RST (reset) packets in response.

Is there any way in DataStage to call a connection.close(), or some setting to force a connection closed after it has been made in the REST step?
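
For anyone wanting to see this pattern outside DataStage, here is a minimal sketch in plain Python sockets. It is a toy model only: the thread does not confirm how the Hierarchical Stage manages its connections internally, and every name below (`serve`, `ReusingClient`, `post_fresh`, `demo`) is made up for illustration. The toy server closes each connection after one response, the way Tomcat sent its FIN, while the client keeps reusing its socket and only drops it when a call fails.

```python
import socket
import threading

REQUEST = b"POST / HTTP/1.1\r\nHost: demo\r\nContent-Length: 0\r\n\r\n"
RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"


def serve(listener):
    """Toy server: answer one request per connection, then close it."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:  # listener is gone, stop serving
            return
        with conn:
            if conn.recv(65536):
                conn.sendall(RESPONSE)


class ReusingClient:
    """Keeps its socket after a response and ignores the server's close."""

    def __init__(self, addr):
        self.addr = addr
        self.sock = None

    def post(self):
        try:
            if self.sock is None:
                self.sock = socket.create_connection(self.addr, timeout=5)
            self.sock.sendall(REQUEST)
            if not self.sock.recv(65536):
                raise ConnectionError("peer had already closed the connection")
            return True
        except OSError:
            # The failure is what finally discards the dead socket, so the
            # next call reconnects and succeeds (the "reset" effect).
            if self.sock is not None:
                self.sock.close()
            self.sock = None
            return False


def post_fresh(addr):
    """Open a new connection per request; close it after the response."""
    with socket.create_connection(addr, timeout=5) as sock:
        sock.sendall(REQUEST)
        return bool(sock.recv(65536))


def demo(calls=4):
    """Return (results with a reused socket, results with fresh sockets)."""
    listener = socket.create_server(("127.0.0.1", 0))
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    addr = listener.getsockname()
    client = ReusingClient(addr)
    reused = [client.post() for _ in range(calls)]
    fresh = [post_fresh(addr) for _ in range(calls)]
    listener.close()
    return reused, fresh


if __name__ == "__main__":
    print(demo())
```

In this model the reused socket should alternate between success and failure, while the client that closes its connection after every response should succeed each time. That matches the symptom described here, but it is only one plausible reading of the packet capture.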
PBT TBIS Consultant
eostic
Premium Member
Posts: 3835
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Not that I've ever seen, but anything is possible.

Question --- does the same set of REST calls, using the Hierarchical Stage, work if you have a regular batch Job, with a RowGen or other type of source feeding in the rows?

If so, then it may not be ISD per se, that is causing the problem, but the end-of-wave markers that are sent.

Of course, if the regular batch Job also dies, then something else is up and you can eliminate ISD from the equation.

Ernie
Ernie Ostic

pbttbis
Premium Member
Posts: 36
Joined: Thu Dec 11, 2014 3:30 am

Post by pbttbis »

Yeah, as a batch job reading the URL-encoded data, I can submit the request successfully over and over again.
PBT TBIS Consultant
eostic
Premium Member
Posts: 3835
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

It would be interesting, if you have the time, to put in an end-of-wave Operator upstream in your batch Job.... set a counter and just have the end of wave take action for every row. It might die the same, proving that it is end of wave [not that this solves your problem].

That's my only guess. ISD implies an end-of-wave between requests...this is one of the ways it isolates one request from another, even if both are flowing thru the same Job.

I don't know if there are any settings in the Stage that would help.... You "might" find that it works in a Server Job, as some of the underpinnings are different there, but the Hierarchical Stage has limited support in Server Jobs, allowing only one input link.... and depending on the complexity of your Job or the inclusion of QualityStage functions, that could be difficult or impossible to build in Server.

Has your support provider had any thoughts?

The only other solution I can think of, without knowing what is happening in the internal HTTP plumbing, would be to write your own REST client using the Java Integration Stage and see if that alleviates the problem --- when you are entirely controlling the connection.
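
As a rough illustration of what "entirely controlling the connection" could look like, here is a small sketch of a client that opens a new connection per call, asks the server not to keep it alive, and always closes its own side. It uses Python's standard `http.client` purely to show the connection handling. It is not Java Integration Stage code, and the function name `post_once`, its parameters, and the form-encoded content type are assumptions made for the example.

```python
import http.client


def post_once(host, port, path, body, timeout=10.0):
    """POST `body` on a brand-new TCP connection and always close it."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(
            "POST",
            path,
            body=body,
            headers={
                # Assumed content type; adjust to whatever the service expects.
                "Content-Type": "application/x-www-form-urlencoded",
                # Ask the server not to keep the connection alive.
                "Connection": "close",
            },
        )
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        # Close our side whether or not the request succeeded.
        conn.close()
```

A call would look something like `post_once("tomcat-host", 8080, "/service", b"key=value")`, where the host name and path are placeholders. Paying for a new TCP handshake on every request costs some latency, but no request can then land on a connection the server has already closed.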

Ernie
Ernie Ostic

pbttbis
Premium Member
Posts: 36
Joined: Thu Dec 11, 2014 3:30 am

Post by pbttbis »

Okay we have changed the plumbing a bit and the architecture now is:

apache httpd -> Datastage -> HTTP Forwarder -> apache tomcat loadbalancer -> apache tomcat worker

It seems that in this configuration Apache is now handling the keep-alives and the re-opening of connections to Tomcat. I do not understand it 100%, but it's working and the client is happy with the architecture.

Going to mark this thread as resolved. Thanks for the help Ernie.
PBT TBIS Consultant