Hello,
I had 3 jobs sequenced in a job sequencer, and while running the jobs I encountered a problem:
PROBLEM
Sequencer name: SeqInit1000_3001_3002
Job name: Extract_030_Card1000_StartFile
While this job was running, the job sequencer failed with the following error message:
"SeqInit1000_3001_3002..JobControl (@Coordinator): Summary of sequence run
08:13:33: Sequence started (checkpointing on)
08:13:33: Job_Activity_0 (JOB Extract_030_Card1000_StartFile) started
08:14:35: Exception raised: @Job_Activity_0, Error calling DSRunJob(Extract_030_Card1000_StartFile), code=-14 [Timed out while waiting for an event]
08:14:35: Sequence failed (restartable)"
SOLUTION
I simply recompiled the jobs and the sequencer, and the next run went fine.
CAUSE OF PROBLEM
MY EXPLANATION: Every job in DataStage has some OSH code in it.
I suppose the OSH code of the sequencer was not proper.
So when I recompiled, the OSH code for the sequencer was regenerated and it ran fine.
Explanation of error message:
Please help me with this.
Also, which category do sequencers fall into:
parallel jobs or server jobs?
Time out error encountered while running a job sequence
That is a good hypothesis, but would only hold true if it failed with a -14 every time. Usually this error is produced when a machine is overloaded, i.e. when it is so busy that new tasks are not started within DS limits.
ArndW wrote: That is a good hypothesis, but would only hold true if it failed with a -14 every time. Usually this error is produced when a machine is overloaded, i.e. when it is so busy that new tasks are not started within DS limits.
-14 indicates resource overloading: either too many jobs are running concurrently or the resources are over-utilized, so certain jobs end with -14 after waiting for resources for a certain period of time. Resubmitting the job is usually enough to make it run again.
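If resubmitting is the usual remedy, the resubmit can be scripted. The following is only a minimal sketch in Python, not anything DataStage provides: `run_job` is a hypothetical callable that you would write yourself to start the job and return its status code, and the attempt count and delay are arbitrary assumptions. The only value taken from this thread is the -14 code.

```python
import time
from typing import Callable

# Code shown in the sequence summary: "Timed out while waiting for an event".
TIMED_OUT = -14


def run_with_retry(
    run_job: Callable[[], int],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_job until it returns something other than -14.

    run_job is a hypothetical wrapper (an assumption, not a DataStage API)
    that starts the job and returns its status code. Any code other than
    -14 is returned immediately, so genuine failures are not retried.
    """
    code = TIMED_OUT
    for attempt in range(1, attempts + 1):
        code = run_job()
        if code != TIMED_OUT:
            return code
        # Give the busy machine time to finish other work before resubmitting.
        if attempt < attempts:
            sleep(delay_seconds)
    return code
```

Retrying only on -14 keeps the wrapper from hiding real job errors, and the pause between attempts matters more than the retry itself if the machine is simply overloaded.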
I would be surprised if a sequence that starts 3 jobs concurrently will overload a system - unless there are other things going on that use a lot of resources. In this case we don't know if the system was just less busy after the recompile (which is my guess) or if the object code was corrupted in such a way that the recompile fixed it.
I'd wager it's all about the timing. Too much going on when the jobs were first run, but after taking the time to compile them after the failure, they run fine because other processes have completed and they were able to start within the timeout period. :D
I really doubt the recompile 'fixed' anything...
-craig
"You can never have too many knives" -- Logan Nine Fingers