The infamous -14 DSJE_TIMEOUT issue

Post by DDR_dev »

Hi, I'm a new member on this site and just wanted to say how useful this forum is.
ray.wurlod wrote:You might also try clearing the job's status file and re-compiling. The status file (RT_STATUSnnn) is where the resource records are stored.
I found this entry from 2005. It seems to describe a similar issue to the one I am seeing, and I would like some background info if possible.

I'm seeing Status code = -14 DSJE_TIMEOUT in virtually every cycle of my batch run (Information Server 8.1.0.0 Fix Pack 1 and Roll Up Patch 1).

This happens when running a common reusable job that updates a DB2 table with the contents of a dataset. The job is executed from a Unix script, which passes the required parameters to it.

I continually see the infamous -14 DSJE_TIMEOUT, but a different DataStage project on the same Unix server running the same job does not.

I don't believe my system is overloaded, as a large set of jobs has been running as normal for monthly processing.

Other DataStage projects with copies of this reusable job are also working normally and do not time out.

It just seems to be a problem in my project.

I have tried:

Adding DSWaitResetTimeout = 10
Adding DSWaitResetStartup = 120
Adding DSWaitStartup = 120

Changing the batch execution so that only one instance of the update job runs at a time (serial execution), in case the issue is with the invocation.

Monitoring the processes generated during a successful update and a failed update.

In our Unix script, $dsjob is parameterised to point to a custom version of the dsjob command that pauses after the reset: it specifically ignores -wait and instead adds a sleep 5 to prevent the job being left in a bad state.
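As an illustration of the alternative to a fixed sleep 5, here is a minimal sketch that polls the job's status after the reset instead. It is not what our custom dsjob does. wait_for_status is a hypothetical helper name, and the sketch assumes that `dsjob -jobinfo` prints a "Job Status" line; the status text to match is passed in as a pattern because the exact wording may differ between versions.

```shell
#!/bin/sh
# Sketch only. wait_for_status is a hypothetical helper, not part of dsjob.
# Assumption: "$dsjob -jobinfo <project> <job>" prints a line containing "Job Status".
wait_for_status() {
    project=$1
    job=$2
    pattern=$3          # text expected on the "Job Status" line (assumed by the caller)
    attempts=${4:-12}   # how many times to poll before giving up
    delay=${5:-5}       # seconds to sleep between polls

    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Succeed as soon as the status line matches the pattern.
        if $dsjob -jobinfo "$project" "$job" 2>/dev/null \
            | grep "Job Status" | grep -q "$pattern"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1            # status never matched within the allowed attempts
}
```

With the defaults this gives up after about a minute; something like `wait_for_status "$DSRep_Name" "$RUNJOBNAME" "RESET"` would sit between the reset and the normal run.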

Reverting to the original dsjob is the current thinking, but this change has been in our production system for 3 years now and the issue has not been seen this frequently before.

This has made me investigate other, less common causes. The job reports:

Status code = -14 DSJE_TIMEOUT
Common_Load_DB2_J9010.CDM_DRR_EXCH_RATE_DRR_DSIF_Load_J0130_02_DS aborted

But seconds later the job is reporting success in Director. These lines are executed:

1) $dsjob -run -jobstatus -mode RESET -wait ${DSRep_Name} ${RUNJOBNAME}

2) $dsjob -run -mode NORMAL -jobstatus $DSJobParams $DSRep_Name $RUNJOBNAME

So it's as if the reset is reporting back a timeout, then execution continues and the job completes. This makes me think -wait should be put back in.
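To confirm which of the two calls is actually returning the timeout, the exit code of each step can be captured and logged separately. This is a minimal sketch, not our production script: reset_then_run is a hypothetical helper name, and the dsjob flags are exactly those in the two lines above.

```shell
#!/bin/sh
# Sketch only. reset_then_run is a hypothetical helper; $dsjob is whatever
# the calling script has set it to (the stock dsjob or a custom version).
reset_then_run() {
    project=$1
    job=$2
    shift 2   # any remaining arguments go to the NORMAL run (e.g. -param name=value)

    # Step 1: reset the job, using the same flags as line 1) above.
    $dsjob -run -jobstatus -mode RESET -wait "$project" "$job"
    reset_rc=$?
    echo "reset exit code: $reset_rc" >&2

    # Step 2: normal run, using the same flags as line 2) above.
    $dsjob -run -mode NORMAL -jobstatus "$@" "$project" "$job"
    run_rc=$?
    echo "run exit code: $run_rc" >&2

    # Report the result of the real run, not of the reset.
    return "$run_rc"
}
```

Called as `reset_then_run "$DSRep_Name" "$RUNJOBNAME" $DSJobParams`, this would show in the script log whether the -14 comes back from the reset or from the run itself.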

I wanted to ask for some background info on the quote above about the "job's status file".

Could this be the issue? Could clearing down the logs help at all?

Can you recommend any other project settings I could try?

Thanks

Post by chulett »

Just wanted to let you know I split this out into your own topic - so you have control over its fate - and linked back to the 2005 topic for you.
-craig

"You can never have too many knives" -- Logan Nine Fingers