Real time data processing using DataStage

Dedicated to DataStage and DataStage TX editions featuring IBM® Service-Oriented Architectures.

Moderators: chulett, rschirm

abhinavsuri
Premium Member
Posts: 62
Joined: Thu Dec 28, 2006 11:54 pm

Post by abhinavsuri »

Hi,

I have a project requirement to process information sent from the source system in real time. For example, when a new customer registers, the new customer's data is sent immediately to ETL. ETL should then trigger the job and process the file immediately. More than one file may arrive per minute.

One approach to achieve this could be by using a shell script to check for the arrival of a file. This script will run 24x7 and will invoke an instance of the job as soon as a file arrives.
Is this approach advisable?

However, I have also read about some plugin stages like MQ connector and Webservices stages.
Will these stages provide me with any additional functionality?
What are the advantages of these stages?
How exactly do these stages work?
abhinavsuri
Premium Member
Posts: 62
Joined: Thu Dec 28, 2006 11:54 pm

Post by abhinavsuri »

Also, please tell me what else is required to use these stages. Do we need to install anything besides the additional stages?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Triggering the job is not real time. To get real time you need an always-running job that listens - maybe to an MQSeries queue, maybe because it's been exposed as a web service using WISD components.

What you propose is certainly feasible, but will incur job startup time, which may not be acceptable if the requirement truly is "real time".
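
For reference, the file-watcher approach from the original question could look roughly like the sketch below. It is written in Python rather than shell, and several things in it are assumptions rather than anything stated in this thread: the job is compiled as multi-instance, it takes a job parameter named `SourceFile` (a hypothetical name), the file stem is a valid invocation ID, and `dsjob` is on the PATH. Each new file costs one full job startup, which is the overhead mentioned above.

```python
import subprocess
import time
from pathlib import Path


def new_files(landing_dir, seen):
    """Return files in landing_dir not yet recorded in `seen`, oldest first."""
    candidates = sorted(
        (p for p in Path(landing_dir).iterdir()
         if p.is_file() and p.name not in seen),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    for p in candidates:
        seen.add(p.name)  # remember the file so it is only handed over once
    return candidates


def build_command(project, job, path):
    """Build a dsjob call for one file.

    Assumptions: a multi-instance job, a hypothetical job parameter
    called SourceFile, and the file stem used as the invocation ID.
    """
    return ["dsjob", "-run", "-jobstatus",
            "-param", f"SourceFile={path}",
            project, f"{job}.{path.stem}"]


def watch(landing_dir, project, job, run=subprocess.run,
          interval=5.0, max_polls=None):
    """Poll landing_dir and start one job instance per new file."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_files(landing_dir, seen):
            run(build_command(project, job, path))  # pays job startup each time
        polls += 1
        time.sleep(interval)
```

A real watcher would also need to avoid picking up a file that is still being written (for example, by having the sender write under a temporary name and rename when done) and to decide what happens when a job run fails.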
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

You need to post more details about "real-time" and about the front-end that provides the data. Is real-time near-instantaneous? Or is within a few minutes good enough?

I've worked on systems where a web-based front-end created MQ messages that were picked up by an always-running DataStage job using MQ connectors. It then updated the database within seconds and sent confirmation back to the web application. This isn't trivial to set up, and it has maintenance implications as well (e.g., what do you do if the database is down?).
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I've done a couple of implementations that people called "real-time", but I much prefer to use "near real-time" or just to specify that the application needs to be synchronous. At present I am on an implementation using MQSeries for data transfer, and there are host systems involved, PC clients, as well as the UNIX servers for the transformation and repositories.

As mentioned before, if a DataStage job is constantly running and listening for data you will have near-real-time. If you have a job that gets started every 10 minutes and processes data then, for some, that is also near-real-time, while other sites would consider that 10-minute delay unacceptable.
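
To illustrate the "constantly running and listening" pattern in general terms, here is a small Python sketch. It is not DataStage or MQ code: the standard-library `queue.Queue` stands in for the message queue, and `listener`, `STOP` and `demo` are made-up names. The point is only that one long-lived worker blocks on the queue and handles each message as it arrives, so nothing is started per message.

```python
import queue
import threading

STOP = object()  # sentinel that tells the listener to shut down


def listener(inbox, handle):
    """Block on the queue and process each message as soon as it arrives."""
    while True:
        msg = inbox.get()  # blocks until a message is available
        if msg is STOP:
            break
        handle(msg)


def demo():
    """Start one listener, send it two messages, then stop it."""
    inbox = queue.Queue()  # stand-in for an MQ queue
    results = []
    worker = threading.Thread(target=listener, args=(inbox, results.append))
    worker.start()
    for customer in ("alice", "bob"):
        inbox.put(customer)  # the "front end" publishes a message
    inbox.put(STOP)
    worker.join()
    return results
```

Calling `demo()` returns the messages in the order they were sent. In a scheduled design the same messages would instead wait until the next run starts.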