Feasibility of using Real Time Stages for huge chunk of data

Dedicated to DataStage and DataStage TX editions featuring IBM<sup>®</sup> Service-Oriented Architectures.

Moderators: chulett, rschirm

machudas
Participant
Posts: 5
Joined: Sat Jun 16, 2012 6:53 am
Location: Bangalore


Post by machudas »

Hi,

Currently we are working on an MDM project. As part of the project, we have to perform an address standardization operation, which is done through a web service.

Address standardization also has to be done for the already existing data (around 800,000 records). Please let us know whether it is feasible to use a real-time stage (in our case, the WSDL transformer) to standardize the existing data.

Das
Known is a drop unknown is an ocean
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... does the service accept any kind of 'bulk' processing or is it strictly singleton calls?
-craig

"You can never have too many knives" -- Logan Nine Fingers
machudas
Participant
Posts: 5
Joined: Sat Jun 16, 2012 6:53 am
Location: Bangalore

Post by machudas »

For testing purposes we are using singleton calls, but we need to check with the web service team whether we can pass an array of inputs and receive the results as an array of outputs.

For address standardization we pass the postal code and building number as input and receive the standardized result as output. However, we are not sure how complex it would be to pass an array of inputs, or how the output array would be parsed.

Moreover, is there any way in the XML stage to control the number of rows in an array chunk?
machudas
Participant
Posts: 5
Joined: Sat Jun 16, 2012 6:53 am
Location: Bangalore

Post by machudas »

We checked whether the web service can accept arrays or only singleton calls.
It accepts only singleton calls.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Singleton calls to a web service are certainly not ideal for bulk data processing.

But since you only have 800,000 records and no stated time constraint, it is certainly feasible.

Depending on the complexity of your web service, I'd guess 800,000 records might go through a single web services transformer in an hour or two.
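To put that guess in numbers: finishing 800,000 singleton calls in one to two hours through a single transformer means each call (round trip plus processing) has to average roughly 4.5 to 9 ms. The sketch below is a back-of-the-envelope model only. It assumes a constant per-call latency, no connection or parsing overhead, and parallel streams that never slow each other down; the 50 ms latency and 8 streams in the usage line are hypothetical values, not measurements.

```python
def estimated_hours(records: int, seconds_per_call: float, parallel_streams: int = 1) -> float:
    """Rough elapsed time for singleton web service calls.

    Assumes (hypothetically) that every call takes the same time and that
    parallel streams scale perfectly with no contention on the service.
    """
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    total_seconds = records * seconds_per_call / parallel_streams
    return total_seconds / 3600.0


# Per-call budget a single transformer would need to finish 800,000 calls
# in 1 hour or in 2 hours.
for hours in (1, 2):
    print(f"{hours} h -> {hours * 3600 / 800_000 * 1000:.1f} ms per call")

# Hypothetical: 50 ms per call, 8 parallel streams -> about 1.4 hours.
print(f"{estimated_hours(800_000, 0.05, parallel_streams=8):.2f} h")
```

Measuring the real per-call latency during the singleton test runs and plugging it in gives a more useful estimate than any guess.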

You can split your data and use multiple web services transformers to reduce overall elapsed time.
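The splitting idea can be sketched outside DataStage as a generic analog: partition the input, then let several workers each make their own singleton calls at the same time. This is not how the Web Services Transformer is configured. `standardize_address` is a hypothetical stub standing in for one call to the real service, and the round-robin split and thread pool are illustrative choices. Threads fit here because each call mostly waits on the network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (postal_code, building_number)


def standardize_address(postal_code: str, building_number: str) -> Dict[str, str]:
    """Hypothetical stand-in for ONE singleton web service call.

    Replace with the real client; this stub only trims and uppercases its input.
    """
    return {"postal_code": postal_code.strip().upper(),
            "building_number": building_number.strip()}


def split(records: List[Record], n: int) -> List[List[Record]]:
    """Round-robin split into n partitions, one per worker (transformer)."""
    return [records[i::n] for i in range(n)]


def process_partition(partition: List[Record]) -> List[Dict[str, str]]:
    # Each worker still makes singleton calls, one record at a time.
    return [standardize_address(pc, bn) for pc, bn in partition]


def standardize_all(records: List[Record], workers: int = 4) -> List[Dict[str, str]]:
    partitions = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Flatten per-partition results; output order follows the partitions,
        # not the original input order.
        return [row for part in pool.map(process_partition, partitions) for row in part]


rows = [("560001 ", "12"), ("560002", " 7A"), ("560003", "3")]
print(standardize_all(rows, workers=2))
```

The useful number of parallel streams is limited by how many concurrent calls the web service can absorb, which is worth confirming with the service team before scaling up.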

Mike