DB2 Extract into Web service

Dedicated to DataStage and DataStage TX editions featuring IBM<sup>®</sup> Service-Oriented Architectures.

Moderators: chulett, rschirm

Post Reply
Nicole
Premium Member
Premium Member
Posts: 93
Joined: Mon Oct 22, 2007 12:55 pm
Location: Albany
Contact:

DB2 Extract into Web service

Post by Nicole »

Hello everyone - just trying to figure out where the bottleneck is in a Datastage job - when using the DB2 API stage to extract data (cannot use enterprise at this time) - and then passing the data to the web service. As the data is extracted is it passed to the web service immediately or does it extract all data and then pass to web service? Is there a way to speed this type of job up? Any advice appreciated.

Thanks,

Nicole
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

...it depends.....how are you calling your Web Service? Can you tell us a bit more about the topology of the Job? I assume that DB2 API Stage is the source.......what is it doing for retrieval (complex join, simple SQL, Stored Procedure....Aggregation? ...one row retrieved or thousands of rows, etc.) what happens downstream? Exactly what stage types? WSTransformer, or WSClient? What is the Web Service at the other end expecting?

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

hmm.. ...for further qualification, your note says you are "passing data to a web service", but the note is in the RTI Services topic --- do you mean that this job "is" a web service and you are passing data to RTIOutput Stage? ....then it's a slightly different discussion.....

...and if that's the case, tell us more about the Job....does it have RTIInput, or just DB2 as the sole source? What are you passing into the service? DataStage Job Parameters?

....if so, there might be several things we can do to speed things up.....and in most cases, dramatically (100:1 or better in many cases).

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Nicole
Premium Member
Premium Member
Posts: 93
Joined: Mon Oct 22, 2007 12:55 pm
Location: Albany
Contact:

Post by Nicole »

The DB2 stage extracts data - pretty simple join - select with some where criteria on relational columns - and pretty much just passing the data to the web service (web service transformer stage). Trying to optimize as the job is taking a while - so we have been trying to figure out if its in the extract stage or is it extracting fast enough but takes time passing the data to the web service. Did I miss anything?
eostic wrote:hmm.. ...for further qualification, your note says you are "passing data to a web service", but the note is in the RTI Services topic --- do you mean that this job "is" a web service and you are passing data to RTIOutput Stage? ....then it's a slightly different discussion.....

...and if that's the case, tell us more about the Job....does it have RTIInput, or just DB2 as the sole source? What are you passing into the service? DataStage Job Parameters?

....if so, there might be several things we can do to speed things up.....and in most cases, dramatically (100:1 or better in many cases).

Ernie
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Take the service out of the picture. Copy the job and use a regular transformer as a stand-in, then check the performance. I'll wager the service itself is the bottle-neck.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

.....now that we know it's WSTransformer, which is a row by row process, how many rows are we talking about?

....where do they go after they pass thru the WSTransformer?

Web Services are definitely not screamers. I certainly wouldn't use them for large scale quantities of data. A really good SOAP based service (for example, written entirely in Java, not complex, and native to the app server) might have a round trip time (request, process, response) of say....10 milliseconds. While that's great for an across-the-network-xml-based-protocol that might also be doing lots of other things, it's horribly slow when we think about things we get used to in the batch oriented DataStage and regular ETL world. That's only 100 rows per second.....pitiful for a batch job, but screaming for a single client of a Web Service...

The host itself may be able to handle multiple clients (imagine a web services app that support a portal with hundreds of users smashing away at the enter key), and then drive that number (100 per second) far higher. (round trip is a measure for one client, not for capacity of the whole server)..but that will take more creative construction of your job, and also introduce caveats around row and invocation order, which may or may not be critical.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Nicole
Premium Member
Premium Member
Posts: 93
Joined: Mon Oct 22, 2007 12:55 pm
Location: Albany
Contact:

Post by Nicole »

Sorry for the delayed response - we are still trying to optimize and have made some improvements - but still need some guidance. To answer your questions:

- The # of rows varies, but during peak processing the highest # was approx. 653K - I see that we shouldn't use WSTransformer for high volumes. It was becoming an issue because the job was long-running - and if something tripped on the Database side, or the WAS server - then the job failed - and it was frustrating having to re-run it. We are looking into clustering the server that the web services are on, so that should help, right? At least, as far as the web service being unavailable and having to re-run the job??

- After the WSTransformer the results pass through a transformer to filter some of the data and then output to a sequential file stage.

- We are considering doing an initial extract of the data and writinig it to a file and then creating a job to read from the file into the web service, instead of directly reading from the DB to the web service. The table is used a lot and they thought it might be best if we did it that way.

- Now the question - is there a way to pass the data to the WSTransformer in parallel mode somehow? Or should we consider splitting the file into 4 separate files and run multiple instances of the job? What about reading from multiple nodes vs. multiple nodes per reader? What would be the best option in this case?

Thanks for your help!!

Nicole
eostic wrote:.....now that we know it's WSTransformer, which is a row by row process, how many rows are we talking about?

....where do they go after they pass thru the WSTransformer?

Web Services are definitely not screamers. I certainly wouldn't use them for large scale quantities of data. A really good SOAP based service (for example, written entirely in Java, not complex, and native to the app server) might have a round trip time (request, process, response) of say....10 milliseconds. While that's great for an across-the-network-xml-based-protocol that might also be doing lots of other things, it's horribly slow when we think about things we get used to in the batch oriented DataStage and regular ETL world. That's only 100 rows per second.....pitiful for a batch job, but screaming for a single client of a Web Service...

The host itself may be able to handle multiple clients (imagine a web services app that support a portal with hundreds of users smashing away at the enter key), and then drive that number (100 per second) far higher. (round trip is a measure for one client, not for capacity of the whole server)..but that will take more creative construction of your job, and also introduce caveats around row and invocation order, which may or may not be critical.

Ernie
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

....Exactly... you are now on the right track......"if" your Web service can handle it (that's a whole different discussion), you can create "additional" clients........

a) add more nodes. Each node process for that WSTransformer will be its own SOAP client, issuing calls with whatever rows it receives.

b) add more instances. Each Job instance will be its own SOAP clent, issuing calls with whatever rows it receives.

...these both have other implications of course, from the rest of your job, to the rest of the machine, and the logic everywhere else (is it ok for the calls and the rows to end up in a different order, etc.).

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Nicole
Premium Member
Premium Member
Posts: 93
Joined: Mon Oct 22, 2007 12:55 pm
Location: Albany
Contact:

Post by Nicole »

Ok, I think I am close to fully understanding - we create a new config file with additional nodes.

And also, on the advanced tab of the WSTransformer - should we select 'Parallel'?

Is running a configuration with 4 nodes the same as running 4 instances of your job?
eostic wrote:....Exactly... you are now on the right track......"if" your Web service can handle it (that's a whole different discussion), you can create "additional" clients........

a) add more nodes. Each node process for that WSTransformer will be its own SOAP client, issuing calls with whatever rows it receives.

b) add more instances. Each Job instance will be its own SOAP clent, issuing calls with whatever rows it receives.

...these both have other implications of course, from the rest of your job, to the rest of the machine, and the logic everywhere else (is it ok for the calls and the rows to end up in a different order, etc.).

Ernie
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

...Sort of....but now you are on a bigger topic...I can't comment on the rest of your job, and it's impact on having multiple nodes.... in general, they will all get multiples, although there are ways to control that --- but that is beyond this thread, discussed elsewhere in many topics in the forum. Don't take this effort lightly...if you haven't worked with multi-node config files you have a separate learning curve and effort above and beyond their (multiple nodes) impact on the WSTransformer.

Multi-instance is also a complex concept, though simpler than multi-node configs. You simply get the opportunity to spawn multiple jobs...giving each one a name, and then you'll see "n" jobs in the Director as they are running.

I can't recall whether the WSTransformer defaults to sequential operation or not, but it should be able to support the ability to run in parallel without a problem.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Post Reply