Time out error encountered while running a job sequence

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

RohanSharma
Participant
Posts: 20
Joined: Sun Jan 28, 2007 10:06 pm
Location: Gurgaon

Post by RohanSharma »

Hello,

I had 3 jobs in a job sequence, and while running them I encountered a problem:

PROBLEM
Sequencer name: SeqInit1000_3001_3002
Job name: Extract_030_Card1000_StartFile
While this job was being run, the job sequencer failed with the following error message:
"SeqInit1000_3001_3002..JobControl (@Coordinator): Summary of sequence run
08:13:33: Sequence started (checkpointing on)
08:13:33: Job_Activity_0 (JOB Extract_030_Card1000_StartFile) started
08:14:35: Exception raised: @Job_Activity_0, Error calling DSRunJob(Extract_030_Card1000_StartFile), code=-14 [Timed out while waiting for an event]
08:14:35: Sequence failed (restartable)"
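
For anyone scanning many such summaries, here is a minimal Python sketch that pulls the job name, error code, and reason out of a line like the one above. The regular expression is modelled only on the single message quoted in this post, and the names `ERROR_PATTERN` and `find_run_job_error` are made up for illustration; other DataStage messages may be worded differently.

```python
import re
from typing import Optional, Tuple

# Assumption: the pattern is based only on the one log line quoted in this
# thread; it is not taken from any DataStage documentation.
ERROR_PATTERN = re.compile(
    r"Error calling DSRunJob\((?P<job>[^)]+)\), "
    r"code=(?P<code>-?\d+) \[(?P<reason>[^\]]*)\]"
)


def find_run_job_error(log_text: str) -> Optional[Tuple[str, int, str]]:
    """Return (job name, error code, reason) for the first DSRunJob error found.

    Returns None when the text contains no matching error line.
    """
    match = ERROR_PATTERN.search(log_text)
    if match is None:
        return None
    return match.group("job"), int(match.group("code")), match.group("reason")
```

Fed the 08:14:35 line from the summary, this would give the job name `Extract_030_Card1000_StartFile`, the code `-14`, and the reason text in brackets.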


SOLUTION

I simply recompiled the jobs and the sequencer, and the next run completed fine.

CAUSE OF PROBLEM

My explanation: every job in DataStage has some OSH code in it. I suppose the OSH code of the sequencer was not correct, so when I recompiled, the OSH code for the sequencer was regenerated and the run was fine.

Explanation of the error message: please help me with this. :P

Also, which category do sequencers fall into: parallel jobs or server jobs?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

That is a good hypothesis, but would only hold true if it failed with a -14 every time. Usually this error is produced when a machine is overloaded, i.e. when it is so busy that new tasks are not started within DS limits.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

ArndW wrote:That is a good hypothesis, but would only hold true if it failed with a -14 every time. Usually this error is produced when a machine is overloaded, i.e. when it is so busy that new tasks are not started within DS limits.
-14 indicates resource overloading: either too many jobs are running concurrently or the resources are over-utilized, which causes certain jobs to end with -14 after waiting for resources for a certain period of time. Resubmitting the job is usually enough to make it run again.
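
If the resubmit is to be automated, a minimal Python sketch of the idea could look like the following. Everything here is an assumption for illustration: `run_with_retry` and its parameters are hypothetical names, and `run_job` stands for whatever callable starts the job in your environment and returns its error code. Only the value -14 comes from the log in this thread.

```python
import time
from typing import Callable

TIMEOUT_CODE = -14  # the code reported in the sequence log in this thread


def run_with_retry(
    run_job: Callable[[], int],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_job until it returns something other than TIMEOUT_CODE.

    run_job is a hypothetical callable that starts the job and returns its
    error code. The last code seen is returned, so the caller can still tell
    a persistent timeout apart from a success or a different failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    code = TIMEOUT_CODE
    for attempt in range(1, attempts + 1):
        code = run_job()
        if code != TIMEOUT_CODE:
            return code
        if attempt < attempts:
            # Give the machine time to finish other work before resubmitting.
            sleep(delay_seconds)
    return code
```

The pause between attempts matters more than the retry itself: if the machine is still busy, an immediate resubmit is likely to time out again.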
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I would be surprised if a sequence that starts 3 jobs concurrently would overload a system, unless there are other things going on that use a lot of resources. In this case we don't know if the system was just less busy after the recompile (which is my guess) or if the object code was corrupted in such a way that the recompile fixed it.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'd wager it's all about the timing. Too much going on when the jobs were first run, but after taking the time to compile them after the failure, they run fine because other processes have completed and they were able to start within the timeout period. :D

I really doubt the recompile 'fixed' anything...
-craig

"You can never have too many knives" -- Logan Nine Fingers