
Is there a way to restart a job from point of failure?

Posted: Wed Aug 08, 2018 10:27 am
by vbr_03
Hi,

Is there any way to restart a parallel job so that it resumes loading data from the last point of failure?

Posted: Wed Aug 08, 2018 1:39 pm
by leandrohmvieira
Sequence jobs do have some checkpoint functionality, which allows a sequence to restart from the checkpoint.

Parallel jobs and server jobs do not have any feature like this. Can you provide some details of your problem?

Posted: Thu Aug 09, 2018 12:22 am
by ray.wurlod
Short answer, no.

You may be able to design jobs with a certain degree of restartability but, in general, the amount of effort required would make it not worthwhile.

Posted: Thu Aug 09, 2018 6:37 am
by chulett
Right, restartable jobs are certainly possible. I've always striven for atomic-level job designs ('single units of work') so that they can be restarted with little or no human intervention. I've posted high-level notes here in the past describing the 'framework' we're using now to support that.

Restarting from the point of failure? That's a whole 'nuther kettle of fish, especially if there's any kind of complexity in the job design, and would generally require some kind of... let's say "compromises"... with regard to job speed.

(Technically, the tool I'm using now has a magical checkbox to enable that functionality, but I've yet to try/play with/trust any such feature.)

Posted: Tue Aug 14, 2018 3:12 pm
by Joel in KC
Please let me know where I can find your framework and "single unit of work" notes, as we are trying to move to this type of usage rather than huge, complex systems that need restarting. I appreciate your time. New to the board. Thanks again.

Posted: Tue Aug 14, 2018 7:24 pm
by chulett
Both are mentioned here, with some high-level details for the framework. Hope it helps. As noted there, I would really be interested to see if anyone has done anything like that in DataStage; mine is an Informatica implementation, which makes it a tad easier.

Posted: Tue Aug 14, 2018 8:22 pm
by ray.wurlod
Where I need this functionality I, like Craig, create small atomic units of work as DataStage jobs, and make use of the restartability capability of sequence jobs to handle that. No point in re-inventing the wheel.

Posted: Wed Aug 15, 2018 8:52 am
by FranklinE
High-level error handling design is where restartability is identified. Error handling is a part of the definition of the unit of work.

Example:

1. Download file. If that fails, fix problem and rerun.
2. Process file. If there are no intermediate points of failure (such as commits) and the process fails, fix the problem and rerun.
3. Etc.

DataStage permits a single parallel job to perform both functions. If your design does that, your next step is to rewrite the job as separate units of work.

Job Sequence design covers the how and where.
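
To make the "separate units of work plus a restartable sequence" idea concrete, here is a minimal Python sketch. It is not DataStage code and not anyone's actual framework; run_sequence, the JSON checkpoint file, and the two step functions are all hypothetical names chosen for illustration. It only shows the general pattern: record each unit as it completes, and on a rerun skip whatever already finished.

```python
import json
import os
import tempfile


def run_sequence(steps, checkpoint_path):
    """Run (name, func) steps in order, skipping any that already completed."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)

    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run; do not repeat it
        func()  # an exception here leaves the checkpoint file in place
        done.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)

    # Every unit succeeded, so clear the checkpoint for the next full run.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)


# Hypothetical units of work; process_file fails on its first attempt only.
calls = []
attempts = {"process": 0}


def download_file():
    calls.append("download")


def process_file():
    attempts["process"] += 1
    if attempts["process"] == 1:
        raise RuntimeError("simulated failure")
    calls.append("process")


steps = [("download_file", download_file), ("process_file", process_file)]
checkpoint = os.path.join(tempfile.mkdtemp(), "sequence.ckpt")

try:
    run_sequence(steps, checkpoint)
except RuntimeError:
    pass  # fix the problem, then rerun

run_sequence(steps, checkpoint)  # download is skipped, process runs again
print(calls)
```

Note that the restart happens at the unit level: the failed unit reruns from its own beginning, not from the row where it died, which is why each unit has to be safe to rerun as a whole.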