The infamous -14 DSJE_TIMEOUT issue
Posted: Mon Aug 15, 2011 10:59 am
Hi, I'm a new member on this site and just wanted to say how useful this forum is.
I'm seeing Status code = -14 DSJE_TIMEOUT in virtually every cycle of my batch run (Information Server 8.1.0.0 Fix Pack 1 and Roll Up Patch 1).
It happens when running a common reusable job that updates a DB2 table with the contents of a dataset.
The reusable update job is executed from a Unix script: we pass the required parameters to it and the table is updated.
I continually see the infamous -14 DSJE_TIMEOUT, but a different DataStage project on the same Unix server running the same job does not.
I don't believe my system is overloaded, as a large set of jobs has been running as normal for monthly processing.
Other DataStage projects with copies of this reusable job are also working normally and do not time out.
It just seems to be a problem in my project.
I have tried:
Adding DSWaitResetTimeout = 10
Adding DSWaitResetStartup = 120
Adding DSWaitStartup = 120
Updating the batch execution to run only one instance of the update job at a time (serial execution), in case of issues with the invocation
Monitoring the processes generated during a successful update and a failed update
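For reference, a minimal sketch of setting those values in the calling script. It assumes dsjob picks them up from the environment of the process that starts it; they could equally go in $DSHOME/dsenv, and it is not confirmed here that every one of them is honoured by 8.1.

```shell
#!/bin/sh
# Timeout settings from the list above, exported so that every dsjob
# started from this script inherits them. Assumption: dsjob reads
# these from its environment (they could also be set in $DSHOME/dsenv).
DSWaitResetTimeout=10
DSWaitResetStartup=120
DSWaitStartup=120
export DSWaitResetTimeout DSWaitResetStartup DSWaitStartup
```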
In our Unix script we parameterise $dsjob with a custom version of the dsjob command that pauses after the reset: it specifically ignores -wait and instead adds a sleep 5, to prevent the job being left in a bad state.
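Our actual wrapper isn't shown here, but a minimal sketch of the behaviour described (drop -wait, call the real dsjob, sleep 5 after a reset) might look like this. The names dsjob_wrapper, REAL_DSJOB and DSJOB_PAUSE are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a dsjob wrapper like the one described above.
# It removes any -wait argument, calls the real dsjob, and after a
# RESET call sleeps so the job can leave its reset state.
REAL_DSJOB=${REAL_DSJOB:-dsjob}   # assumption: real dsjob is on PATH

dsjob_wrapper() {
    n=$#
    i=0
    is_reset=0
    # Rotate through the original arguments, keeping all but -wait.
    while [ "$i" -lt "$n" ]; do
        arg=$1
        shift
        [ "$arg" = "RESET" ] && is_reset=1
        if [ "$arg" != "-wait" ]; then
            set -- "$@" "$arg"
        fi
        i=$((i + 1))
    done
    "$REAL_DSJOB" "$@"
    rc=$?
    if [ "$is_reset" -eq 1 ]; then
        sleep "${DSJOB_PAUSE:-5}"   # the "sleep 5" pause after a reset
    fi
    return "$rc"
}
```

Because the sleep replaces -wait, nothing confirms the reset has actually finished before the next call; the 5 seconds is simply assumed to be long enough.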
So reverting to the original dsjob is the current thinking, but this change has been in our production system for 3 years now and this issue has not been seen this frequently before.
This has made me investigate other, less common causes. The job reports:
Status code = -14 DSJE_TIMEOUT
Common_Load_DB2_J9010.CDM_DRR_EXCH_RATE_DRR_DSIF_Load_J0130_02_DS aborted
But seconds later the job reports success in Director. These lines are executed:
1) $dsjob -run -jobstatus -mode RESET -wait ${DSRep_Name} ${RUNJOBNAME}
2) $dsjob -run -mode NORMAL -jobstatus $DSJobParams $DSRep_Name $RUNJOBNAME
So it's as if the reset reports back a timeout, then execution continues and the job completes. This makes me think -wait should be put back in.
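One way to stop the run carrying on after a timed-out reset is to check the reset's output before starting the NORMAL run. A minimal sketch, assuming the -14 DSJE_TIMEOUT text appears in dsjob's combined stdout/stderr as in the log above; the function name reset_with_retry and the DSJOB, RESET_RETRIES and RESET_PAUSE variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the reset step and retry it a few times if
# dsjob reports DSJE_TIMEOUT, instead of going straight on to the
# NORMAL run. DSJOB, RESET_RETRIES and RESET_PAUSE are assumptions.
DSJOB=${DSJOB:-dsjob}

reset_with_retry() {
    project=$1
    job=$2
    tries=0
    while [ "$tries" -lt "${RESET_RETRIES:-3}" ]; do
        tries=$((tries + 1))
        out=$("$DSJOB" -run -jobstatus -mode RESET -wait "$project" "$job" 2>&1)
        printf '%s\n' "$out"
        case $out in
            *DSJE_TIMEOUT*) sleep "${RESET_PAUSE:-10}" ;;  # timed out: pause, then retry
            *) return 0 ;;                                 # no timeout reported
        esac
    done
    return 1   # still timing out after all retries
}
```

It matches on the output text rather than the exit status because, with -jobstatus, dsjob's exit status reflects the job's status rather than plain success or failure. In the script it would replace line 1), e.g. reset_with_retry "${DSRep_Name}" "${RUNJOBNAME}" || exit 1, before the NORMAL run.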
I wanted to ask for some background info on the quote below about the "job's status file".
Could this be an issue? Could clearing down the logs help at all?
Can you recommend any other project settings I could try?
Thanks
I've found this entry from 2005, which seems to describe a similar issue to the one I am seeing, and would like some background info if possible.
ray.wurlod wrote:
You might also try clearing the job's status file and re-compiling. The status file (RT_STATUSnnn) is where the resource records are stored.