Real time ETL in datastage using CDC transaction stage

dbdecoy
Premium Member
Posts: 17
Joined: Tue Jul 15, 2008 1:17 pm
Location: Hyderabad

Post by dbdecoy »

Hi,

I am looking to achieve a real-time ETL scenario using the CDC Transaction stage. I have read about this stage but need a few clarifications. When this stage is used, does the DataStage job stay online all the time? Also, does the stage trigger the DataStage job whenever there is an update in the source DB?

If we do not use the CDC Transaction stage, can you please suggest other ways to achieve real-time ETL in DataStage?

Note: we currently use the IBM Change Data Capture tool to get real-time updates from the source. At present we load that data into our staging area and then run our ETL jobs in daily batches. We are trying to remove the staging area by using the CDC Transaction stage, so that the ETL process runs in real time.

Please let me know if you need any further details

Thanks in advance
dbdecoy
Premium Member
Posts: 17
Joined: Tue Jul 15, 2008 1:17 pm
Location: Hyderabad

Re: Real time ETL in datastage using CDC transaction stage

Post by dbdecoy »

Hi,

Could anyone please help me with this?

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Generally, regardless of stage, 'real time' in this context would mean publishing your job as an 'always on' job, a web service. You might get more information by taking a look through the SOA Editions forum here. I would be curious about what kind of volume you'll need to be processing.
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

IBM Change data capture tool / IBM InfoSphere Data Replication software comes with a feature that you can use from its user interface to export or generate a DSX file, based on your replication subscription(s). You can import that into a DataStage project by using Designer, Import, and you will find it includes sequence jobs, routines, and various other jobs that will help you get a jump start on replicating the data.

As far as the always-on job question, I got the impression from documentation that it is an option, but I didn't get a chance to test it out.
Choose a job you love, and you will never have to work a day in your life. - Confucius
dbdecoy
Premium Member
Posts: 17
Joined: Tue Jul 15, 2008 1:17 pm
Location: Hyderabad

Post by dbdecoy »

Thanks Craig, I will look into the SOA Editions forum and will post an update here. Regarding the volume, it should be 3 to 7 million records.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi. I haven't used this stage extensively since some very early tests when it came out, so I cannot vouch for its behavior. I do recall, quite a few years ago, that there was work done to ensure that an end-of-wave was inserted specifically, but I don't know exactly which release or which edition of the integration.

I am not an expert on CDC, but I recall that the stage was fairly smart and knew about subscriptions and bookmarks, and made it easier to integrate DataStage with CDC than using user exits in CDC to go to something like MQ. If it (still) behaves similarly to what I recall, then it uses CDC's own API to basically "listen" on a connection and receive changes as soon as they are known and sent by CDC. That makes it "always on". Again, this understanding could be outdated.

There was another pattern that was also popular with CDC, as someone mentioned above, where a .dsx and a script were provided that did a "smart cycling", I think by reading a sort of "checkpointed" flat file that was cut by CDC. A bit old fashioned, but very reliable, as I recall.
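To make the "listen, end-of-wave, bookmark" idea above a bit more concrete, here is a minimal Python sketch of the general pattern. It is not the CDC Transaction stage or any IBM API: `Change`, `END_OF_WAVE`, `simulated_cdc_feed` and `run_listener` are all hypothetical names, and the feed is a hard-coded stand-in for a real subscription. It only illustrates collecting changes until a transaction boundary, applying them as one unit, and then advancing the bookmark so a restart resumes from the last committed position.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# Hypothetical marker standing in for an end-of-wave (transaction boundary).
END_OF_WAVE = object()


@dataclass(frozen=True)
class Change:
    bookmark: int  # hypothetical position of this change in the stream
    table: str
    op: str        # "I", "U" or "D"
    row: dict


def simulated_cdc_feed(start_after: int) -> Iterator[object]:
    """Stand-in for a subscription: yields changes whose bookmark is
    greater than start_after, with END_OF_WAVE after each transaction."""
    transactions = [
        [Change(1, "orders", "I", {"id": 10}),
         Change(2, "orders", "U", {"id": 10})],
        [Change(3, "orders", "D", {"id": 7})],
    ]
    for txn in transactions:
        pending = [c for c in txn if c.bookmark > start_after]
        if pending:
            yield from pending
            yield END_OF_WAVE


def run_listener(feed: Iterable[object],
                 apply_wave: Callable[[List[Change]], None],
                 bookmark: int = 0) -> int:
    """Consume the feed, apply each complete wave as one unit, and return
    the bookmark of the last change that was applied."""
    wave: List[Change] = []
    for item in feed:
        if item is END_OF_WAVE:
            if wave:
                apply_wave(wave)              # e.g. write to the target and commit
                bookmark = wave[-1].bookmark  # advance only after the apply
                wave = []
        else:
            wave.append(item)
    # Changes without a closing end-of-wave are not applied, and the
    # bookmark is not advanced past them.
    return bookmark


if __name__ == "__main__":
    applied: List[List[Change]] = []
    last = run_listener(simulated_cdc_feed(0), applied.append)
    print("waves applied:", len(applied), "last bookmark:", last)
```

In a real always-on job the feed would block waiting for new changes instead of ending, and the bookmark would be stored somewhere durable. The point of advancing it only after a wave is applied is that a restart replays at most the wave that was still in flight.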

Ernie
Ernie Ostic

blogit!
Open IGC is Here!: https://dsrealtime.wordpress.com/2015/0 ... ere/
dbdecoy
Premium Member
Posts: 17
Joined: Tue Jul 15, 2008 1:17 pm
Location: Hyderabad

Post by dbdecoy »

Hi, we are currently testing the always-on job option by creating a datastore in CDC that connects to DataStage, but we are facing some problems due to a version mismatch between the CDC tool and DataStage. I will update this thread once we make some progress.