division into jobs

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

and
Participant
Posts: 14
Joined: Thu Oct 05, 2017 5:36 am

division into jobs

Post by and »

Hi all,

I need to create two files (file sets) for two lookups based on DBMS data.

I can either create two jobs, one to fill each file, or a single job that fills both files.

What is the best practice? One job per unit of work, or maybe one job per task?


Thanks
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

My personal preference is generally to use separate jobs for the sake of restartability. When something goes wrong and some jobs complete while another aborts, it can be easier and faster to troubleshoot the problem in a simpler job, and only the aborted job needs to be restarted. There's no extra or repeat processing of logic that has already completed successfully.
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

+1

We always try to build jobs as atomic, restartable units of work for precisely the reasons mentioned above. And we wrap them in a "framework" that knows how to back out any partially completed loads (where applicable) so that restarts can be as "hands off" as possible.
-craig

"You can never have too many knives" -- Logan Nine Fingers
and
Participant
Posts: 14
Joined: Thu Oct 05, 2017 5:36 am

Post by and »

qt_ky, chulett,

Thanks for the notes.

As for:
And we wrap them in a "framework" that knows how to back out any partially completed loads (where applicable)
can you give some more hints on this, please?

I don't have much experience yet, and this would be interesting to me.


Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... a girl's got to keep some secrets. :wink: But high level:

It's basically a set of tables that record which jobs have run and assign a unique id to each run of each job. All records inserted or updated by the run are tagged with that number. We also have control tables that document which tables each job targets and which 'rollback mechanism' to use for each. When a failed job is restarted, a stored procedure is called that looks up the mechanism and the id of the failed run and resets the table back to its pre-run condition. For example, type 2 updates have their new record deleted and the previous entry set back to 'current'.
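To make the idea concrete, here is a rough sketch in Python with SQLite standing in for the warehouse. It is not the actual framework described above; every table, column, and function name (job_run, dim_customer, rollback_type2, and so on) is invented for illustration, and a real implementation would likely live in database DDL plus a stored procedure rather than a script.

Code: Select all

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Control table: one row per run of each job, with a unique run id.
cur.execute("""
    CREATE TABLE job_run (
        run_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        job_name TEXT NOT NULL,
        status   TEXT NOT NULL          -- 'RUNNING', 'FINISHED', 'FAILED'
    )""")

# Example target: a type 2 dimension whose rows are tagged with the run
# that inserted them and the run that expired them.
cur.execute("""
    CREATE TABLE dim_customer (
        cust_key   INTEGER,
        name       TEXT,
        is_current INTEGER,             -- 1 = current version of the row
        insert_run INTEGER,             -- run that created this row
        expire_run INTEGER              -- run that closed this row (NULL if open)
    )""")

def start_run(job_name):
    """Register a new run and hand its unique id back to the job."""
    cur.execute("INSERT INTO job_run (job_name, status) VALUES (?, 'RUNNING')",
                (job_name,))
    return cur.lastrowid

def rollback_type2(run_id):
    """Reset a type 2 target to pre-run conditions after a failed run:
    delete the rows the run inserted and re-open the rows it expired."""
    cur.execute("DELETE FROM dim_customer WHERE insert_run = ?", (run_id,))
    cur.execute("""UPDATE dim_customer
                   SET is_current = 1, expire_run = NULL
                   WHERE expire_run = ?""", (run_id,))
    cur.execute("UPDATE job_run SET status = 'FAILED' WHERE run_id = ?", (run_id,))

# Simulate a run that applies a type 2 update and then aborts.
cur.execute("INSERT INTO dim_customer VALUES (1, 'Old name', 1, 0, NULL)")
run_id = start_run("load_dim_customer")
cur.execute("""UPDATE dim_customer SET is_current = 0, expire_run = ?
               WHERE cust_key = 1 AND is_current = 1""", (run_id,))
cur.execute("INSERT INTO dim_customer VALUES (1, 'New name', 1, ?, NULL)", (run_id,))

# ...the job aborts here, so the restart first backs the failed run out:
rollback_type2(run_id)
print(cur.execute("SELECT * FROM dim_customer").fetchall())
# -> [(1, 'Old name', 1, 0, None)], i.e. the table is back to pre-run state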

All jobs have these pipelines incorporated into them, something we call their job control framework, set to run in the proper order:

1. Check for and perform rollback if needed
2. Initialize a new run in the control tables
3. <actual work goes here>
4. Finalize the run in the control tables

Note that we're currently using Informatica for this, which has a "Target Load Plan" setting where you specify the order the pipelines run in, one after the other, for any given mapping. It's been long enough that I'm not quite sure how you would accomplish something equivalent in DataStage. I'd be curious if others are doing something similar.
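For what it's worth, one tool-agnostic way to force that ordering is a small driver script. The sketch below only illustrates the sequencing of the four steps above; every function name in it is a placeholder rather than part of the framework, and in DataStage the same ordering could presumably be expressed with a Sequence job instead.

Code: Select all

def rollback_if_needed(job_name: str) -> None:
    # 1. Look at the control tables; if the last run of this job failed,
    #    back it out before doing anything else.
    print(f"checking for a failed previous run of {job_name}")

def init_run(job_name: str) -> int:
    # 2. Register a new run in the control tables and return its unique id.
    print(f"initializing a new run of {job_name}")
    return 1  # placeholder run id

def do_work(run_id: int) -> None:
    # 3. <actual work goes here>, tagging inserts/updates with run_id.
    print(f"loading data for run {run_id}")

def finalize_run(run_id: int) -> None:
    # 4. Mark the run as finished in the control tables.
    print(f"finalizing run {run_id}")

def run_job(job_name: str) -> None:
    """Run the four phases strictly one after the other. If phase 3 raises,
    the run is never finalized, so the next invocation starts by rolling
    the failed run back."""
    rollback_if_needed(job_name)
    run_id = init_run(job_name)
    do_work(run_id)
    finalize_run(run_id)

if __name__ == "__main__":
    run_job("load_dim_customer")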
-craig

"You can never have too many knives" -- Logan Nine Fingers