Near Real time ETL

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

I am currently using the infinite-loop approach to produce real-time data for three separate batches. I modified the batch properties to also allow for a sleep period overnight during our scheduled maintenance window. I have had a few problems with jobs aborting; however, most of them were due to circumstances outside DataStage, and the jobs would have aborted regardless of whether they were running in a loop or scheduled individually. I do suggest putting in notification messages for every point of failure in the systems that you will be running against.
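
For readers of this archive, the loop described above could look roughly like the following. This is only a sketch in Python, not the actual DataStage batch code: the window times, `run_batches`, and `notify` are hypothetical placeholders, since the post does not give them.

```python
import time
from datetime import datetime, time as dtime

# Hypothetical maintenance window; the post does not state the real times.
WINDOW_START = dtime(23, 0)
WINDOW_END = dtime(2, 0)


def in_window(now, start=WINDOW_START, end=WINDOW_END):
    """Return True if `now` is inside the window, which may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def run_loop(run_batches, notify, clock=datetime.now, sleep=time.sleep,
             poll_seconds=60, max_cycles=None):
    """Run the batches repeatedly, idling while the maintenance window is open."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        if in_window(clock().time()):
            sleep(poll_seconds)  # stay idle until the window closes
            continue
        try:
            run_batches()
        except Exception as exc:
            # Send a notification at the point of failure, then stop the loop.
            notify(f"batch cycle failed: {exc}")
            raise
        sleep(poll_seconds)
```

`max_cycles` exists only so the loop can be bounded when trying it out; leaving it as `None` gives the infinite loop.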

Lisa

"Vivek Pandey." on 08/01/2001 07:00:47 AM

Please respond to datastage-users@oliver.com

To: datastage-users@oliver.com
cc: (bcc: Lisa Hamilton/COL/OH/NCS_HealthCare)
Subject: Near Real time ETL

Hi all,

I am devising a mechanism by which my ETL job polls for the availability of data on the source system. There are two options that I can think of to achieve this:

1) I call this job from a control job in an infinite loop. If data is available, the ETL begins; otherwise nothing happens and I iterate again. (I'm not comfortable with this infinite loop.)

2) The other option is to schedule the same job at half-hour intervals. I don't expect one round to take more than 20 minutes. However, if I run out of luck and the first scheduled job is still running when the next is called, I'm in a soup.

Any suggestions from the group as to what might be the best way to extract data in real time?

Cheers
Vivek
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Use option 2. Create a "Flag" for the job and check for its existence. If it is there, the job is still processing, so you abort the new run. If it does not exist, create it with a before-job subroutine, and remove it with an after-job subroutine when the job exits.
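
As an illustration for readers of this archive, the flag idea might be sketched as below. It is written in Python rather than as DataStage before-job and after-job subroutines, and it assumes the flag is a file on disk; the function names are hypothetical. One deliberate difference from the description above: instead of checking and then creating, it creates the file atomically with `O_EXCL`, so two runs starting at the same moment cannot both believe they own the flag.

```python
import os
from pathlib import Path


def acquire_flag(flag_path):
    """Atomically create the flag file; return False if it already exists."""
    try:
        fd = os.open(flag_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # a previous run is still processing
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))  # record the owner for troubleshooting
    return True


def release_flag(flag_path):
    """Remove the flag file (the after-job step). Requires Python 3.8+."""
    Path(flag_path).unlink(missing_ok=True)


def run_guarded(flag_path, job):
    """Run `job` only if no other run holds the flag; return True if it ran."""
    if not acquire_flag(flag_path):
        return False
    try:
        job()
    finally:
        release_flag(flag_path)
    return True
```

If the process is killed outright, the flag stays behind and has to be removed by hand before the next scheduled run can proceed.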

Cheers

Ray Daignault.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Vivek, you can put logic into the batch job that polls for files so that it also polls an ETL-specific directory for flag files. If you're on UNIX, you can simply use touch to create a file in this directory that has meaning to your batch job. In your cycling logic, check this directory for the existence of files such as ETL_STOP.flg, ETL_PAUSE.flg, or ETL_RESUME.flg. The stop flag file is an easy way for your logic to exit gracefully. The pause flag can be coded to keep your process active but paused while you perform any operations that may come up, and the resume flag tells the process to continue. I successfully use such logic myself for all my clients, as a long series of jobs can be easily stopped, paused, and resumed mid-run without having to kill jobs.
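
For readers of this archive, a minimal Python sketch of the flag-file check follows. The file names come from the post; everything else is an assumption, in particular that the resume flag clears both itself and the pause flag, and that the stop flag is left in place for the operator to remove.

```python
import time
from pathlib import Path


def check_flags(flag_dir, sleep=time.sleep, poll_seconds=5):
    """Return "stop" or "go"; block while a pause flag is present."""
    flag_dir = Path(flag_dir)
    stop = flag_dir / "ETL_STOP.flg"
    pause = flag_dir / "ETL_PAUSE.flg"
    resume = flag_dir / "ETL_RESUME.flg"
    while True:
        if stop.exists():
            return "stop"
        if resume.exists():
            # Assumed behavior: resuming clears both control files.
            pause.unlink(missing_ok=True)
            resume.unlink(missing_ok=True)
            return "go"
        if not pause.exists():
            return "go"
        sleep(poll_seconds)  # paused: stay alive and check again


def run_series(jobs, flag_dir, **flag_kwargs):
    """Run jobs in order, checking the flags before each; return jobs run."""
    completed = 0
    for job in jobs:
        if check_flags(flag_dir, **flag_kwargs) == "stop":
            break
        job()
        completed += 1
    return completed
```

On UNIX, `touch ETL_PAUSE.flg` in the flag directory would hold the series before its next job, and `touch ETL_RESUME.flg` would release it.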

Good luck!
-Kenneth Bland
Principal Consultant
Ascential Software
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Hi Vivek

You can try the following. Create a control job, or better a script if you are on UNIX (or an app in VB or something similar), that is started with parameters specifying how long to wait for the data to appear before aborting and how often to check, say every 60 seconds. If the data arrives, fire the job with dsjob; otherwise log a fatal error in the log stating that the data did not arrive within the given number of hours. You can write this in DataStage BASIC as well.
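
For readers of this archive, such a wrapper script might be sketched in Python as follows. It assumes that "data has arrived" means a file exists at a given path, which the thread does not specify, and the function names are hypothetical. The `dsjob -run -jobstatus` invocation should be checked against the options and return codes of the installed DataStage version.

```python
import logging
import subprocess
import time
from pathlib import Path


def wait_for_data(path, timeout_seconds, interval_seconds=60,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll for `path` every interval; return False once the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        if Path(path).exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)


def poll_and_run(path, project, job, timeout_seconds, interval_seconds=60,
                 run=subprocess.run, **wait_kwargs):
    """Start the job once data arrives; log a fatal error if it never does."""
    if wait_for_data(path, timeout_seconds, interval_seconds, **wait_kwargs):
        # Assumed invocation; verify against your dsjob documentation.
        result = run(["dsjob", "-run", "-jobstatus", project, job])
        return result.returncode
    logging.critical("data did not arrive at %s within %s seconds",
                     path, timeout_seconds)
    return None
```

The `run`, `sleep`, and `clock` parameters are there only so the logic can be exercised without a DataStage server or real waiting.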

Hendrik Kotze
Systems Engineer
Ascential Software South Africa
tel: +27 11 807-0313
fax: +27 11 807-2594
mobile: +27 83 326-6439