Improve REST service call performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

atul9806
Participant
Posts: 96
Joined: Tue Mar 06, 2012 6:12 am
Location: Pune

Improve REST service call performance

Post by atul9806 »

Hi Gurus,
I am calling a REST service from a Hierarchical stage using the POST method.

Design:

(row gen) --> Hierarchical stage --> Flat file (Response)

I am passing the payload as a file name, and the job is running on a 2-node configuration.

Performance Data:
Run 1 - start time: 12 sec, prod time: 13 sec
Run 2 - start time: 11 sec, prod time: 13 sec
Run 3 - start time: 11 sec, prod time: 12 sec

When I test the same data with SoapUI, it takes 9, 9, and 8 sec.
How can I improve the job's performance? This job runs in a loop to process multiple files, and because of the startup time it takes a long time to complete them all.
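
For reference, the raw call can be timed outside DataStage with something like this (a minimal Python sketch; the URL, header, and payload file name are placeholders, not my actual service):

Code:

import time
import requests  # third-party: pip install requests

URL = "https://example.com/api/endpoint"  # placeholder service URL

# Read the pre-formed payload from a file, as the Hierarchical stage does.
with open("payload_001.xml", "rb") as f:  # placeholder file name
    payload = f.read()

start = time.perf_counter()
resp = requests.post(URL, data=payload,
                     headers={"Content-Type": "application/xml"})
elapsed = time.perf_counter() - start
print(f"HTTP {resp.status_code} in {elapsed:.1f} s")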
~Atul Singh
DataGenX (http://www.datagenx.net) | LinkedIn (https://www.linkedin.com/in/atulsinghds)
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

There are a few things you can do to reduce initialization time, although with a REST-based service call you probably have a very small schema, so there may not be much to optimize... the Hierarchical Stage itself takes time to load... but try looking at the feature in the Schema Library called "Schema Views". If you can set up a schema view for only the "tree" that you need for the input payload and the output payload, that may help, especially if those models come from a really big schema. It will also help performance during maintenance of the Job.

The other item that might cut out some time is to consider using a Server Job. Server Jobs have far less initialization overhead than EE Jobs, and for a short-running Job like this you certainly don't need high-powered parallelism. The Hierarchical Stage is available in Server Jobs but is more limited (it can only take one input link, for example) and isn't as widely used there, so test extensively if you choose to go that way...

Ernie
Ernie Ostic

blogit! Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I would second that second part - test it using a Server job; that should nearly eliminate the job "startup time" component.
Choose a job you love, and you will never have to work a day in your life. - Confucius
atul9806
Participant
Posts: 96
Joined: Tue Mar 06, 2012 6:12 am
Location: Pune

Post by atul9806 »

Thanks Ernie and qt_ky for your suggestions.

Here is what I have done in my Server job -

Txf (generating a single dummy row to pass the payload file) --> Hierarchical Stage --> Output Seq file

The job keeps aborting after a few seconds with the warning "Abnormal termination of stage transformer". I have rechecked my transformer but did not find any issue.

After checking the output, it seems the job actually finished successfully, as my data is loaded into the target system (via the REST service), but the job aborts because the transformer did not generate a second row.

How can I make my job finish instead of aborting after generating one row? The abort is confusing.
~Atul Singh
DataGenX (http://www.datagenx.net) | LinkedIn (https://www.linkedin.com/in/atulsinghds)
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hard to say...I use this technique all the time, like this:

Add a Stage variable... and just have StageVar + 1 in its Derivation...

Then add a Constraint to the output link: StageVar <= 1.
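
To see why that yields exactly one row: the stage variable is evaluated for each input row before the constraint, so roughly (an illustrative Python sketch of the semantics, not actual DataStage code):

Code:

# Illustrative sketch of the transformer semantics (not DataStage code).
stage_var = 0  # initial value of the Stage Variable

def transformer(rows):
    global stage_var
    out = []
    for row in rows:
        stage_var = stage_var + 1   # Derivation: StageVar + 1
        if stage_var <= 1:          # Constraint: StageVar <= 1
            out.append(row)         # only the first row passes
    return out

print(transformer(["dummy1", "dummy2", "dummy3"]))  # -> ['dummy1']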

If that still aborts in this scenario, you may simply need a dummy sequential file or something to drive your Job.

More importantly, though: did using a Server Job reduce the extra 4 seconds of overhead?

Ernie
Ernie Ostic

blogit! Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

....and by the way, what exactly do you mean by "running in a loop to process multiple files"?

...is there a reason you need to run the "whole job" in a loop? What are you picking up from the file(s) that you aren't able to pick up in a "single job" (i.e., a job that continually reads all the files in a particular subdirectory)?

I don't know what your Job is doing, so it may not be possible, but it would be great if you could eliminate all the start-up time between files and treat each new file as just another "row". Then your Server Job could run even faster overall.
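
As a back-of-envelope illustration using the timings above (roughly 9 sec per call from the SoapUI runs and ~4 sec of job overhead per run; the file count is hypothetical):

Code:

# Rough arithmetic: ~9 s per service call (the SoapUI floor) plus
# ~4 s of job overhead per run, for a hypothetical 100 files.
CALL_SEC = 9
OVERHEAD_SEC = 4
FILES = 100

loop_design = FILES * (CALL_SEC + OVERHEAD_SEC)  # one job run per file
single_job = OVERHEAD_SEC + FILES * CALL_SEC     # one long-running job

print(loop_design, "s vs", single_job, "s")      # 1300 s vs 904 s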

Ernie
Ernie Ostic

blogit! Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
atul9806
Participant
Posts: 96
Joined: Tue Mar 06, 2012 6:12 am
Location: Pune

Post by atul9806 »

I am splitting the payload file into multiple files because it is too big to send over the web service in one call, and due to business constraints on sending the data this is the only way. That is why I split the payload file and call the web service in a loop.
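
The split script itself is nothing special; roughly like this sketch (the chunk size and file names are placeholders, and wrapping each chunk into a well-formed request payload is service-specific and omitted):

Code:

# Hypothetical sketch of the external split script, assuming the
# payload is a large file of records split into fixed-size chunks.
CHUNK = 1000  # placeholder: records per output file

with open("big_payload.txt") as src:  # placeholder input file
    lines = src.readlines()

for i in range(0, len(lines), CHUNK):
    with open(f"payload_{i // CHUNK:03d}.txt", "w") as out:
        out.writelines(lines[i:i + CHUNK])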

I have used the same logic for generating the single row in the transformer, with the constraint sv <= 1.
~Atul Singh
DataGenX (http://www.datagenx.net) | LinkedIn (https://www.linkedin.com/in/atulsinghds)
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Is the payload split inside the Job, or do you split it externally into multiple files and then call the Job repeatedly for each split?

...and then, how are you reading the file? Is it a pre-formed xml chunk ready for your service call, or is it just a bunch of rows that you then craft into a request payload?

Ernie
Ernie Ostic

blogit! Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
atul9806
Participant
Posts: 96
Joined: Tue Mar 06, 2012 6:12 am
Location: Pune

Post by atul9806 »

I am splitting the payload file outside the DataStage box using a script and then passing the file name to the Hierarchical stage, which reads and POSTs the data to the web service.
~Atul Singh
DataGenX (http://www.datagenx.net) | LinkedIn (https://www.linkedin.com/in/atulsinghds)
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

You should be able to eliminate 99.99% of the overhead that you are experiencing.

If in a Server Job, get familiar with the Folder Stage: send the url and/or content into the Hierarchical Stage directly... and never stop the Job until all of your files are exhausted.

Ernie
Ernie Ostic

blogit! Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
atul9806
Participant
Posts: 96
Joined: Tue Mar 06, 2012 6:12 am
Location: Pune

Post by atul9806 »

Ernie,
I have gone through the Folder stage and developed a Server job for my task.

Design:
Folder --> Hierarchical Stage --> Flat File

In the Folder stage I am passing the source folder name and a regular expression for the files, but my job still keeps aborting.

Warning: Abnormal termination of stage detected.

The weird thing is, the warning comes from the stage that feeds the Hierarchical stage. I checked this with the designs below:

Folder --> Copy --> Hierarchical Stage --> Flat File (warning for Copy)
Folder --> Xfm --> Hierarchical Stage --> Flat File (warning for Transformer)

I tried this design as well, and it works fine:
Folder --> Seq File

Any suggestions on what to look into in the Hierarchical stage? It seems the problem is there. Or is there another way I can use to POST to the REST web service?
~Atul Singh
DataGenX (http://www.datagenx.net) | LinkedIn (https://www.linkedin.com/in/atulsinghds)
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

It's hard to say without knowing more about the actual error, but there are some possibilities.... make ABSOLUTELY certain that you are using the correct option in your XML Parser step.

The Folder Stage (which should then be passed into a Transformer, if anything) has two options in its built-in table def: "Filename" and "Record". I would usually pick just one to send on past the Transformer. If you choose "Filename", then you need to use the "file set" option in the XML Parser, then pull down and select that column from your input link. If you choose to send "Record", then you need to use the "string set" option in the XML Parser. (One sends the fully qualified file name for each file; the other eats the whole file and sends the entire "content" inside the column.) A wrong choice, and I would expect the Job to abort in a fairly ugly fashion.
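
In other words, roughly (a Python analogy of the two parser options, not the stage's actual implementation):

Code:

import xml.etree.ElementTree as ET

def parse_file_set(filename):
    # "file set": the input column holds a path; the parser opens it
    return ET.parse(filename).getroot()

def parse_string_set(record):
    # "string set": the input column holds the XML text itself
    return ET.fromstring(record)

# Mixing them up - handing whole XML content to parse_file_set, or a
# bare file name to parse_string_set - raises an error, much like the
# ugly abort described above.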

You could also go back to an EE Job now, because your overhead will still only be paid "one time" for all your files. In that case, use an External Source Stage with a Unix "list" command (ls), something like "ls /tmp/myxmlfiles/*.xml". Give it VarChar(255), and then again use the "file set" option in the Hierarchical Stage.

Ernie
Ernie Ostic

blogit! Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)