Time out error encountered while running a job sequence

RohanSharma · Post by **RohanSharma** » Tue May 29, 2007 10:44 pm

Hello,

I had 3 jobs sequenced in a job sequencer, and while running the jobs I encountered a problem:

PROBLEM
Sequencer name: SeqInit1000_3001_3002
Job name: Extract_030_Card1000_StartFile
While this job was running, the job sequencer failed with the following error message:
"SeqInit1000_3001_3002..JobControl (@Coordinator): Summary of sequence run
08:13:33: Sequence started (checkpointing on)
08:13:33: Job_Activity_0 (JOB Extract_030_Card1000_StartFile) started
08:14:35: Exception raised: @Job_Activity_0, Error calling DSRunJob(Extract_030_Card1000_StartFile), code=-14 [Timed out while waiting for an event]
08:14:35: Sequence failed (restartable)"

SOLUTION

I simply recompiled the jobs and the sequencer, and the run then went fine.

CAUSE OF PROBLEM

MY EXPLANATION: Every job in DataStage has some OSH code in it.
I suppose the OSH code of the sequencer was not generated properly.
When I recompiled, the OSH code for the sequencer was regenerated, and after that it ran fine.

Explanation of the error message:

Please help me with this.

Also, which category do sequencers fall into: parallel jobs or server jobs?

ArndW · Post by **ArndW** » Wed May 30, 2007 12:16 am

That is a good hypothesis, but would only hold true if it failed with a -14 every time. Usually this error is produced when a machine is overloaded, i.e. when it is so busy that new tasks are not started within DS limits.
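One way to check whether a sequence really fails with -14 every time is to pull the error code out of each saved run summary and compare. The following is only a rough sketch: it assumes the summaries have been saved as plain text in the same format as the log quoted above, and the function name and pattern are illustrative, not part of DataStage.

```python
import re
from typing import Optional, Tuple

# Pattern modelled on the "Error calling DSRunJob(...), code=..." line in the
# quoted log; other log layouts are not handled (assumption).
ERROR_PATTERN = re.compile(
    r"Error calling DSRunJob\((?P<job>[^)]+)\), code=(?P<code>-?\d+)"
)


def dsrunjob_error(summary: str) -> Optional[Tuple[str, int]]:
    """Return (job name, error code) from a sequence summary, or None if absent."""
    match = ERROR_PATTERN.search(summary)
    if match is None:
        return None
    return match.group("job"), int(match.group("code"))
```

Running this over the summaries of several failed runs would show whether the code is always -14 or only occasionally, which is the distinction that matters here.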

vijayrc · Post by **vijayrc** » Wed May 30, 2007 2:34 pm

ArndW wrote: That is a good hypothesis, but would only hold true if it failed with a -14 every time. Usually this error is produced when a machine is overloaded, i.e. when it is so busy that new tasks are not started within DS limits.

-14 indicates resource overloading: either too many jobs are running concurrently or the resources are over-utilized, which causes certain jobs to end with -14 after waiting for resources for a certain period of time. Resubmitting is usually enough to make it run again.
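The resubmit idea can be sketched as a small retry loop. This is a minimal illustration, not a DataStage feature: `run_job` is a hypothetical callable standing in for however the job is actually launched, and it is assumed to return the numeric status code (with -14 meaning the timeout above). The attempt count and delay are arbitrary placeholders.

```python
import time
from typing import Callable

TIMEOUT_CODE = -14  # code shown in the sequence log for the timed-out start


def run_with_retry(
    run_job: Callable[[], int],   # hypothetical launcher returning a status code
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-run the job while it reports TIMEOUT_CODE; return the last code seen."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    code = TIMEOUT_CODE
    for attempt in range(1, attempts + 1):
        code = run_job()
        if code != TIMEOUT_CODE:
            return code            # any non-timeout result is handed back as-is
        if attempt < attempts:
            sleep(delay_seconds)   # give the busy machine time to settle
    return code
```

Waiting between attempts matters more than the retry itself: if the machine is still overloaded, an immediate resubmit is likely to time out again.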

ArndW · Post by **ArndW** » Wed May 30, 2007 5:43 pm

I would be surprised if a sequence that starts 3 jobs concurrently will overload a system - unless there are other things going on that use a lot of resources. In this case we don't know if the system was just less busy after the recompile (which is my guess) or if the object code was corrupted in such a way that the recompile fixed it.

chulett · Post by **chulett** » Wed May 30, 2007 6:32 pm

I'd wager it's all about the timing. Too much going on when the jobs were first run, but after taking the time to compile them after the failure, they run fine because other processes have completed and they were able to start within the timeout period. :D

I really doubt the recompile 'fixed' anything...