Feasibility of using Real Time Stages for huge chunk of data

Posted: Thu Jan 15, 2015 9:37 am
by machudas
Hi,

We are currently working on an MDM project. As part of the project, we have to perform address standardization, which is done through a web service.

Address standardization also has to be done for the existing data (around 800,000 records). Please let us know whether it is feasible to use a real-time stage (in our case, the WSDL transformer) to standardize the existing data.

Das

Posted: Thu Jan 15, 2015 10:22 am
by chulett
Hmmm... does the service accept any kind of 'bulk' processing or is it strictly singleton calls?

Posted: Thu Jan 15, 2015 11:29 am
by machudas
For testing purposes we are using singleton calls, but we need to check with the web service team whether we can pass an array of inputs and receive the results as an array of outputs.

For address standardization we pass the postal code and building number as input and receive the result as output. However, we are not sure how complex it would be to pass an array of inputs, or how the output array would be parsed.

Moreover, is there any way in the XML stage to control the number of rows in an array chunk?
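To illustrate what "a fixed number of rows per array chunk" means, here is a small Python sketch. It is not the XML stage's own mechanism; `chunk_rows`, the `(postal_code, building_number)` row shape, and the chunk size are hypothetical, chosen only to show the batching idea.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical input row: (postal_code, building_number)
Row = Tuple[str, str]

def chunk_rows(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows; the last chunk may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))  # take up to `size` rows from the stream
        if not chunk:
            return
        yield chunk

if __name__ == "__main__":
    rows = [(f"PC{i:03d}", str(i)) for i in range(7)]
    for chunk in chunk_rows(rows, 3):
        print(len(chunk))
```

Each chunk would become one array request, and the response array would be matched back to the chunk positionally. That only helps if the service accepts arrays.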

Posted: Fri Jan 16, 2015 6:47 am
by machudas
We checked with the web service team: the service accepts only singleton calls, not arrays.

Posted: Fri Jan 16, 2015 7:40 am
by Mike
Singleton calls to a web service are certainly not ideal for bulk data processing.

But since you only have 800,000 records and no stated time constraint, it is certainly feasible.

Depending on the complexity of your web service, I'd guess 800,000 records might go through a single web services transformer in an hour or two.
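As a rough check on that estimate: 800,000 records in one to two hours works out to about 111 to 222 calls per second, which for a single serial stream means an average round trip of roughly 4.5 to 9 ms per call. The Python sketch below does that arithmetic; the function names are hypothetical, and `streams` assumes the load is spread evenly across parallel, independent call streams.

```python
def required_rate(records: int, hours: float) -> float:
    """Calls per second needed to finish `records` singleton calls in `hours`."""
    return records / (hours * 3600)

def latency_budget_ms(records: int, hours: float, streams: int = 1) -> float:
    """Largest average per-call latency (ms) each stream can afford.

    Assumes the records are split evenly across `streams` parallel streams,
    each making one call at a time.
    """
    per_stream_rate = required_rate(records, hours) / streams
    return 1000 / per_stream_rate

for hours in (1, 2):
    print(hours, round(required_rate(800_000, hours), 1),
          round(latency_budget_ms(800_000, hours), 2))
```

If measured latency per call is well above that budget, the serial estimate will not hold and splitting the load becomes the main lever.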

You can split your data and use multiple web services transformers to reduce overall elapsed time.
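The splitting idea can be sketched outside DataStage in Python: divide the input into contiguous partitions and let each worker make one singleton call per row, much like several transformer instances running side by side. `standardize_address` is a hypothetical stand-in for the real service call, and threads are used only because the work is I/O-bound; how many concurrent calls the service tolerates is something to confirm with the web service team.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # hypothetical (postal_code, building_number)

def standardize_address(row: Row) -> Dict[str, str]:
    """Hypothetical stand-in for one singleton web service call.

    A real version would send a request to the address service and parse
    its response; this one only trims and upper-cases the inputs.
    """
    postal_code, building_number = row
    return {"postal_code": postal_code.strip().upper(),
            "building_number": building_number.strip()}

def partition(rows: List[Row], n: int) -> List[List[Row]]:
    """Split rows into n contiguous, nearly equal partitions (keeps input order)."""
    k, m = divmod(len(rows), n)
    return [rows[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]

def process_partition(part: List[Row]) -> List[Dict[str, str]]:
    # One worker: a single call per row, processed sequentially.
    return [standardize_address(r) for r in part]

def standardize_all(rows: List[Row], workers: int = 4) -> List[Dict[str, str]]:
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_partition, parts))  # map preserves partition order
    # Flatten back into one list in the original input order.
    return [rec for part in results for rec in part]
```

Because the partitions are contiguous and `pool.map` returns results in submission order, the output lines up with the input rows without any re-sorting.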

Mike