Wrapper for ETL Webservices

Dedicated to DataStage and DataStage TX editions featuring IBM<sup>®</sup> Service-Oriented Architectures.

Moderators: chulett, rschirm

Sudhindra_ps
Participant
Posts: 45
Joined: Thu Aug 31, 2006 3:13 am
Location: Bangalore

Post by Sudhindra_ps »

Hi all,

1) Can anybody tell me which technology would be best to use as a wrapper for ETL jobs exposed as web services?
2) Does core Java provide better performance when used as a wrapper for ETL jobs exposed as web services?
3) Lastly, how can I retrieve output from an ETL job that is exposed as a web service?

Regarding my last question, I need to collect a few outputs from the ETL job (such as the source extract record count and the target load record count). I need to access these aggregated values through the web service, since the ETL jobs are exposed as SOAP over HTTP (i.e. as web services).
Your help with this issue will be highly appreciated.

Thanks & regards
Sudhindra P S
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1) DataStage jobs exposed as web services don't need any wrapper.
2) See previous answer.
3) Typically you set up the DataStage job to receive and return small XML documents when exposing it as a web service.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You might want to clarify why you think you need to 'wrapper' them... or do you simply mean 'call'? That being said, I don't believe there is a 'best' or 'better performing' language to use on the calling end.

As to your last question, you can only 'retrieve' whatever the job has been programmed to return - and as noted, this is typically a little squirt of XML back to you.
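
To illustrate retrieving the record counts on the calling side, here is a minimal Java sketch. It assumes the job has been designed to return a small XML document carrying the two counts; the element names (jobResult, sourceExtractCount, targetLoadCount) are hypothetical and would have to match whatever your job actually returns.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical response layout; the element names are assumptions for illustration.
class JobResponseParser {

    /** Parses the XML text returned by the service call into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }

    /** Reads the numeric text of the first element with the given tag name. */
    static long readCount(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("Missing element: " + tagName);
        }
        return Long.parseLong(nodes.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        // Stand-in for the XML payload that the job would send back.
        String xml = "<jobResult>"
                + "<sourceExtractCount>1200</sourceExtractCount>"
                + "<targetLoadCount>1187</targetLoadCount>"
                + "</jobResult>";
        Document doc = parse(xml);
        System.out.println("extracted=" + readCount(doc, "sourceExtractCount"));
        System.out.println("loaded=" + readCount(doc, "targetLoadCount"));
    }
}
```

The caller can only read what the job puts into the response, so the counts have to be computed inside the job and written to its output XML first.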
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sudhindra_ps
Participant
Posts: 45
Joined: Thu Aug 31, 2006 3:13 am
Location: Bangalore

Bundle Stored Procedures with ETL jobs as WebServices

Post by Sudhindra_ps »

Hi Chulett,

We use an organization-wide, in-house scheduler to schedule ETL jobs as web services. The scheduler application is built on .NET and Oracle. I need to update the scheduler's Oracle database before, during, and after execution of each ETL job, using stored procedures.
So I was trying to see whether we can bundle these stored procedures and the ETL jobs exposed as web services into one component and schedule that on the scheduler. Can I use Java to bundle these components together and schedule the result on the scheduler? The scheduler has the flexibility to run Java programs too.

Thanks & regards
Sudhindra P S
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Then your Java would be the caller, rather than the wrapper. You can do the same thing with shell scripts.
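
As a rough idea of what "Java as the caller" could look like, here is a minimal sketch that builds a SOAP 1.1 request over HTTP with the JDK's own HTTP client (Java 11 or later). The endpoint URL, namespace, operation name and SOAPAction value are all hypothetical; the real ones come from the WSDL of the published job.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// All endpoint, namespace and operation names used here are placeholders.
class SoapJobCaller {

    /** Builds a SOAP 1.1 envelope with an empty operation element in the body. */
    static String envelope(String operation, String namespace) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<soapenv:Envelope"
                + " xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:job=\"" + namespace + "\">"
                + "<soapenv:Body><job:" + operation + "/></soapenv:Body>"
                + "</soapenv:Envelope>";
    }

    /** Prepares the HTTP POST that carries the envelope; nothing is sent yet. */
    static HttpRequest buildRequest(String endpoint, String soapAction, String body) {
        return HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .timeout(Duration.ofMinutes(5))
                .header("Content-Type", "text/xml; charset=utf-8")
                .header("SOAPAction", "\"" + soapAction + "\"")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the request and returns the raw response body (the job's XML reply). */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        String body = envelope("runJob", "http://example.com/etl");
        HttpRequest request = buildRequest("http://localhost:9080/etl/job1", "runJob", body);
        // Only prints what would be sent; call send(request) against a real endpoint.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

The same POST can be issued from a shell script with any command-line HTTP client, which is why the choice of calling language matters little here.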
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... you don't have the SOA Edition / RTI?

I'm a little lost on how you can 'schedule ETL jobs as Web services', too.
Sudhindra_ps
Participant
Posts: 45
Joined: Thu Aug 31, 2006 3:13 am
Location: Bangalore

Post by Sudhindra_ps »

Hi Ray/Chulett,

We have IBM Information Server, which we will use to expose DataStage jobs as web services. I have a few PL/SQL stored procedures that I need to invoke at each stage, from job initialization to job completion, to update the scheduler database. The components would be invoked in the following order:
1) Invoke PL/SQL Stored Procedure 1 (Scheduler Database Update)
2) Kick Start ETL job 1 which is exposed as Webservice
3) Invoke PL/SQL Stored Procedure 2 (Scheduler Database Update)
4) Kick Start ETL job 2 which is exposed as Webservice
5) Invoke PL/SQL Stored Procedure 3 (Scheduler Database Update)

As mentioned above, all these components should be executed one after the other, in sequence. The PL/SQL stored procedure will be a common service for all the ETL jobs in my project.
So I was trying to figure out whether we can bundle these components using a wrapper program (in Java, for example) and schedule it on a Windows-based scheduler tool (which supports Java and web service calls). Your architecture suggestions would be a great help to me.
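
The sequence described above could be sketched in Java roughly as follows. This is only an outline under assumptions: the class and step names are made up, and each step is a placeholder where a real caller would invoke the stored procedure through JDBC (a CallableStatement) or call the web service.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sequential caller: runs named steps in order, stops at the first failure.
class JobChain {

    /** One unit of work, e.g. a stored procedure call or a web service call. */
    interface Step {
        void run() throws Exception;
    }

    private final List<String> names = new ArrayList<>();
    private final List<Step> steps = new ArrayList<>();

    /** Appends a step to the end of the chain. */
    JobChain add(String name, Step step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    /** Runs the steps in the order added and returns the names that completed. */
    List<String> runAll() {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run();
                completed.add(names.get(i));
            } catch (Exception e) {
                // Later steps are skipped so the scheduler database is not
                // updated for work that never ran.
                System.err.println("Step failed: " + names.get(i) + " (" + e.getMessage() + ")");
                break;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // Placeholder steps; replace the bodies with the real calls.
        List<String> done = new JobChain()
                .add("stored procedure 1", () -> System.out.println("update scheduler: starting"))
                .add("ETL job 1", () -> System.out.println("call web service for job 1"))
                .add("stored procedure 2", () -> System.out.println("update scheduler: job 1 done"))
                .add("ETL job 2", () -> System.out.println("call web service for job 2"))
                .add("stored procedure 3", () -> System.out.println("update scheduler: finished"))
                .runAll();
        System.out.println("completed " + done.size() + " steps");
    }
}
```

Stopping at the first failed step is a design choice of this sketch; a real caller might instead run a dedicated "failed" stored procedure before exiting.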

Ray, I could do the same quite easily using shell scripts, but I guess I would not be able to invoke web services from shell scripts. This is why I am struggling to evaluate a wrapper program for this mechanism.

Thanks & regards
Sudhindra P S
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You DO NOT use wrappers. The DataStage job itself is exposed as the Web service. Design is something like:

RTI Input ---> XML Input ---> anything ---> XML Output ---> RTI Output

The RTI Input stage listens for a web client. The XML Input stage "translates" the incoming XML document into data that the job can process. The XML Output stage writes an XML document, and the RTI Output stage sends it back to the web client. The XML stages are, of course, optional - though they are usually there.
The job itself may be single or multi-instance, auto-start or always running.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

You will also need to be sure you can call your stored procedures successfully from DS, probably using the SP Stage... and then publish those jobs as services. If you are on v8, that is via WISD; if 7.x, via RTI (same underlying technology). Get your SPs to work PERFECTLY in a normal batch job, using sequential stages as sources and targets and with whatever XML processing you need, and then work on exposing them.

Ernie
Sudhindra_ps
Participant
Posts: 45
Joined: Thu Aug 31, 2006 3:13 am
Location: Bangalore

Post by Sudhindra_ps »

Hi Eostic,

Thanks for the resolution you provided. I was able to wrap both the Oracle stored procedures and the ETL web services into one component, using Java as the wrapper program. This gives good performance when invoking both the Oracle stored procedures and the ETL jobs as web services. The only thing I am missing at the moment is capturing the specific error thrown by an ETL job when it aborts. The error the web service returns to the caller is very generic: it just says "Remote Method EJB exception" on ETL job failure. Is there any way I can dig in to find the specific error that caused an ETL job to fail when it was invoked as a web service?
The second difficulty I am having is that whenever an ETL job invoked as a web service aborts due to a DataStage job failure, I am unable to "reset" the job. Do you have any suggestions on this?

Thanks & regards
Sudhindra P S
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Well, first of all, I'm still confused as to what method you used... did you simply use the C API for invoking DS jobs and then wrapper that yourself as a web service? Or did you use RTI?

Using RTI is far more elegant, takes way less coding, and needs no deep web services expertise. Most important, it provides easily managed load balancing, multiple-instance support, and a security and failover mechanism.

Assuming you are using RTI, the best method for capturing errors on failure is to work on various techniques to capture those errors "inside" the DataStage job (assuming these are logical failures such as rejects from an rdbms and not job aborts) and then return those strings as part of your response payload. For instance, use Server side Reject links from a transformer with link variables to capture rdbms state codes, etc. (I'm sure there are 100's of ideas on this already posted to this forum by the DataStage experts!) If the job is actually "aborting," then it probably needs more testing and development as a non-RTI job to find out why and capture all the ugly things that could happen or fix the logic.

As for reset, it's not necessary if you are using RTI... RTI will automatically restart the instances to get back to the minimum setting. If you are doing all of this home-grown, then you would need to use the reset API call for job control.

Ernie