Wrapper for ETL Webservices

Posted: Thu Aug 02, 2007 1:18 am
by Sudhindra_ps
hi All,

1) Can anybody please tell me which technology would be best to use as a wrapper for ETL jobs exposed as web services?
2) Does core Java provide better performance when used as a wrapper for ETL jobs that are exposed as web services?
3) Lastly, can anybody please tell me how I can retrieve output from an ETL job that is exposed as a web service?

Regarding my last question: I need to collect a few outputs from the ETL job (such as the source extract record count and the target load record count). I need to access these aggregated values through the web service, since the ETL jobs are exposed as SOAP over HTTP (i.e., as web services).
Your help in this issue will be highly appreciated.

Thanks & regards
Sudhindra P S

Posted: Thu Aug 02, 2007 5:57 am
by ray.wurlod
1) DataStage jobs exposed as web services don't need any wrapper.
2) See previous answer.
3) Typically you set up the DataStage job to receive and return small XML documents when exposing it as a web service.

Posted: Thu Aug 02, 2007 6:40 am
by chulett
You might want to clarify why you think you need to 'wrapper' them... or do you simply mean 'call'? That being said, I don't believe there is a 'best' or 'better performing' language to use on the calling end. :?

As to your last question, you can only 'retrieve' whatever the job has been programmed to return - and as noted, this is typically a little squirt of XML back to you.
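On the calling side, reading those values back is ordinary XML parsing. A minimal sketch, assuming the job has been designed to return elements named `SourceRecordCount` and `TargetRecordCount` (both names are hypothetical; use whatever your job's XML Output stage actually writes), using only the JDK's DOM parser:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Pulls numeric fields out of a job's XML response; element names are hypothetical. */
final class JobResponseReader {

    private JobResponseReader() {}

    /** Returns elementName -> value for each requested element (first match wins). */
    static Map<String, Long> readCounts(String xml, String... elementNames) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // SOAP envelopes use namespaces
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String name : elementNames) {
            // "*" matches any namespace, so the lookup works inside or outside a SOAP envelope
            NodeList nodes = doc.getElementsByTagNameNS("*", name);
            if (nodes.getLength() == 0) {
                throw new IllegalArgumentException("No <" + name + "> element in response");
            }
            counts.put(name, Long.parseLong(nodes.item(0).getTextContent().trim()));
        }
        return counts;
    }
}
```

The caller can then compare the two counts, or log them to the scheduler database, without knowing anything about how the job produced them.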

Bundle Stored Procedures with ETL jobs as WebServices

Posted: Thu Aug 02, 2007 10:31 pm
by Sudhindra_ps
hi Chulett,

We are using an organization-wide, in-house scheduler to schedule ETL jobs exposed as web services. This scheduler application is built on .NET and Oracle. I need to update the scheduler's Oracle database before, during and after execution of each ETL job, using stored procedures.
So I was trying to see whether we can bundle these stored procedures and the ETL web services into one component and schedule that component on the scheduler. Can I use Java to bundle these components together and schedule the result? The scheduler can run Java programs too.
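From Java, each scheduler update is a plain JDBC stored procedure call. A minimal sketch, assuming a hypothetical procedure `SCHED_PKG.UPDATE_JOB_STATUS(job_name, phase)`; the procedure name, its parameters and the phase values are placeholders for whatever your scheduler schema actually defines:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Collections;

/** Sketch of calling a scheduler-update stored procedure over JDBC (names hypothetical). */
final class SchedulerUpdater {

    private SchedulerUpdater() {}

    /** Builds the JDBC escape syntax "{call proc(?, ?, ...)}" for paramCount IN parameters. */
    static String callSyntax(String procedureName, int paramCount) {
        String placeholders = String.join(", ", Collections.nCopies(paramCount, "?"));
        return "{call " + procedureName + "(" + placeholders + ")}";
    }

    /** Records one job phase; SCHED_PKG.UPDATE_JOB_STATUS is a hypothetical procedure. */
    static void updateStatus(Connection conn, String jobName, String phase) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(callSyntax("SCHED_PKG.UPDATE_JOB_STATUS", 2))) {
            cs.setString(1, jobName);
            cs.setString(2, phase); // e.g. "BEFORE", "DURING", "AFTER" (hypothetical values)
            cs.execute();
        }
    }
}
```

The `Connection` would come from `DriverManager.getConnection(...)` with your Oracle JDBC driver on the classpath. JDBC connections auto-commit by default, so each update is committed as soon as the call returns unless you turn that off.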

Thanks & regards
Sudhindra P S

Posted: Thu Aug 02, 2007 10:44 pm
by ray.wurlod
Then your Java would be the caller, rather than the wrapper. You can do the same thing with shell scripts.
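As a caller, the Java side only needs to POST a SOAP envelope to the job's endpoint. A minimal sketch using the JDK's `java.net.http.HttpClient` (Java 11 or later, so newer than the releases discussed here); the endpoint URL, target namespace, operation name and SOAPAction are all hypothetical and must be taken from the WSDL that RTI/WISD publishes for the job. The same request could also be sent from a shell script with a command-line HTTP client such as curl.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Minimal SOAP 1.1 caller; endpoint, namespace and operation names are hypothetical. */
final class SoapJobCaller {

    private final HttpClient client = HttpClient.newHttpClient();

    /** Wraps an operation element in a SOAP 1.1 envelope. innerXml is NOT escaped. */
    static String buildEnvelope(String opNamespace, String operation, String innerXml) {
        return "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:op=\"" + opNamespace + "\">"
                + "<soapenv:Body><op:" + operation + ">" + innerXml + "</op:" + operation + ">"
                + "</soapenv:Body></soapenv:Envelope>";
    }

    /** POSTs the envelope and returns the raw response body; throws on non-200 status. */
    String call(String endpoint, String soapAction, String envelope)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "text/xml; charset=utf-8")
                .header("SOAPAction", "\"" + soapAction + "\"")
                .POST(HttpRequest.BodyPublishers.ofString(envelope))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // SOAP 1.1 faults are normally returned as HTTP 500 with a <Fault> body
            throw new IOException("HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }
}
```

Whether the payload elements inside the operation must be namespace-qualified depends on the WSDL, so treat the envelope layout as a starting point rather than the exact format the service expects.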

Posted: Thu Aug 02, 2007 11:38 pm
by chulett
So... you don't have the SOA Edition / RTI? :?

I'm a little lost on how you can 'schedule ETL jobs as Web services', too. :(

Posted: Fri Aug 03, 2007 12:45 am
by Sudhindra_ps
hi Ray/Chulett,

We have IBM Information Server, which we will use to expose DataStage jobs as web services. I have a few PL/SQL stored procedures that I need to invoke at each stage, from job initialization through job completion, to update the scheduler database. The components would be invoked in the following order:
1) Invoke PL/SQL stored procedure 1 (scheduler database update)
2) Start ETL job 1, which is exposed as a web service
3) Invoke PL/SQL stored procedure 2 (scheduler database update)
4) Start ETL job 2, which is exposed as a web service
5) Invoke PL/SQL stored procedure 3 (scheduler database update)

As mentioned above, all these components must execute one after the other, in sequence. [b]The PL/SQL stored procedures will be a common service for all the ETL jobs in my project[/b].
So I was trying to figure out whether we can bundle these components using a wrapper program (such as Java) and schedule it on a Windows-based scheduler tool (which supports Java and web service calls). Your architecture suggestions would be a great help.

Ray, I could do the same quite easily using shell scripts, but I don't think I would be able to invoke web services from shell scripts. This is where I am struggling to choose a wrapper program for this mechanism.
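The five-step flow described in this post (stored procedure, job, stored procedure, job, stored procedure) is a strict sequence that should stop at the first failure. A minimal sketch of such a caller in Java; `JobChain` and `Step` are hypothetical names, and each step would wrap either a JDBC stored procedure call or a web service call:

```java
import java.util.ArrayList;
import java.util.List;

/** Runs named steps strictly in order and stops at the first failure. */
final class JobChain {

    /** One unit of work: a stored procedure call or a web service job call. */
    interface Step {
        void run() throws Exception;
    }

    private final List<String> names = new ArrayList<>();
    private final List<Step> steps = new ArrayList<>();

    /** Appends a step; returns this so calls can be chained. */
    JobChain then(String name, Step step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    /** Executes every step in order; returns the names of the steps that completed. */
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run();
            } catch (Exception e) {
                // Later steps are never started once one step fails
                throw new IllegalStateException(
                        "Step '" + names.get(i) + "' failed after " + completed, e);
            }
            completed.add(names.get(i));
        }
        return completed;
    }
}
```

Usage would look like `new JobChain().then("SP 1", () -> { /* call SP 1 */ }).then("ETL job 1", () -> { /* call web service */ })...run();`. Keeping the sequence in one place means the scheduler only has to launch a single Java program, which is the bundling asked about here.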

Thanks & regards
Sudhindra P S

Posted: Fri Aug 03, 2007 1:23 am
by ray.wurlod
You DO NOT use wrappers. The DataStage job itself is exposed as the Web service. Design is something like:

RTI Input ---> XML Input  --->  anything  ---> XML Output ---> RTI Output
The RTI Input stage listens for a web client. The XML Input stage "translates" the incoming XML document into data that the job can process. The XML Output stage writes an XML document, and the RTI Output stage sends it back to the web client. The XML stages are, of course, optional - though they are usually there.
The job itself may be single or multi-instance, auto-start or always running.

Posted: Thu Aug 09, 2007 7:53 pm
by eostic
You will also need to be sure you can call your stored procedures successfully from DS, probably using the SP stage, and then publish those jobs as services. If you are on v8, that's done via WISD; if 7.x, via RTI (same underlying technology). Get your SPs to work PERFECTLY in a normal batch job, using sequential stages as sources and targets and whatever XML processing you need, and only then work on exposing them.

Ernie

Posted: Thu Aug 16, 2007 12:59 am
by Sudhindra_ps
hi Eostic,

Thanks for the solution you provided. I was able to wrap both the Oracle stored procedures and the ETL web services into one component, using Java as the wrapper. This gives good performance when invoking both the Oracle stored procedures and the ETL jobs as web services. The only thing I am still missing is capturing the specific error thrown by an ETL job whenever it aborts. The error the web service returns to the caller is very generic: on ETL job failure it just says "Remote Method EJB exception". Is there any way I can dig in and find the specific error that caused an ETL job to fail when invoked as a web service?
Secondly, whenever an ETL job exposed as a web service aborts because of a DataStage job failure, I am unable to "reset" the job. Do you have any suggestions on this?

Thanks & regards
Sudhindra P S

Posted: Thu Aug 16, 2007 3:13 pm
by eostic
Well, first of all, I'm still confused about which method you used... did you simply use the C API for invoking DS jobs and then wrap that yourself as a web service? Or did you use RTI?

Using RTI is far more elegant: it takes far less coding, needs no deep web services expertise, and, most important, provides easily managed load balancing, multiple-instance support, and security and failover mechanisms.

Assuming you are using RTI, the best method for capturing errors on failure is to work on various techniques to capture those errors "inside" the DataStage job (assuming these are logical failures such as rejects from an RDBMS and not job aborts) and then return those strings as part of your response payload. For instance, use server-side reject links from a Transformer with link variables to capture RDBMS state codes, etc. (I'm sure there are hundreds of ideas on this already posted to this forum by the DataStage experts! ;) ). If the job is actually "aborting," then it probably needs more testing and development as a non-RTI job to find out why, and to capture all the ugly things that could happen or fix the logic.
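Once the job returns its own error text in the response payload, the Java caller can surface that text instead of the generic exception. A minimal sketch, assuming the job's XML Output stage writes hypothetical `Status` and `ErrorText` elements (both names are placeholders for your own response design), using the JDK's XPath API:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Checks a job response for an error reported by the job itself (element names hypothetical). */
final class JobResultChecker {

    private JobResultChecker() {}

    /** Text of the first element with this local name, or "" if absent (any namespace). */
    static String field(String xml, String localName) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate("//*[local-name()='" + localName + "']",
                            new InputSource(new StringReader(xml)))
                    .trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Unparseable response", e);
        }
    }

    /** Throws with the job's own error text unless Status is "OK". */
    static void requireSuccess(String responseXml) {
        String status = field(responseXml, "Status");
        if (!"OK".equals(status)) {
            String detail = field(responseXml, "ErrorText");
            throw new IllegalStateException("ETL job reported status '" + status + "': "
                    + (detail.isEmpty() ? "(no error text returned)" : detail));
        }
    }
}
```

This only helps with the logical failures described above; a genuine job abort still reaches the caller as the generic fault, which is why those jobs need the extra batch-mode testing.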

As for reset, it's not necessary if you are using RTI... RTI will automatically restart the instances to get back to the minimum setting. If you are doing all of this home-grown, then you would need to use the reset API call for job control.
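In the DataStage C API, a reset is a run request with the `DSJ_RUNRESET` run mode; from a Java caller, the simplest equivalent is usually the `dsjob` command-line client, whose `-run -mode RESET` option resets an aborted job. A minimal sketch, assuming `dsjob` is on the PATH of the calling host and is already authenticated to the engine; the project and job names are hypothetical, and the flags should be checked against the `dsjob` documentation for your release:

```java
import java.io.IOException;
import java.util.List;

/** Builds and runs dsjob commands to reset an aborted job and then rerun it. */
final class DsJobReset {

    private DsJobReset() {}

    /** dsjob -run -mode RESET: clears the aborted state; -wait blocks until it finishes. */
    static List<String> resetCommand(String project, String job) {
        return List.of("dsjob", "-run", "-mode", "RESET", "-wait", project, job);
    }

    /** Normal run; -jobstatus waits and makes the exit code reflect the job's final status. */
    static List<String> runCommand(String project, String job) {
        return List.of("dsjob", "-run", "-jobstatus", project, job);
    }

    /** Runs one command, streaming its output to this process's console; returns the exit code. */
    static int exec(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

A home-grown caller would typically run `exec(resetCommand(...))` after a failure and only then `exec(runCommand(...))`. How the exit codes map to job states differs between options and releases, so interpret them from your release's documentation rather than assuming zero means success. (`List.of` needs Java 9 or later.)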

Ernie