Sporadic Problems with DSDetachJob() call
Posted: Tue Jul 19, 2005 1:41 am
by ArndW
We have a complex mechanism to control job starts and logging here, essentially a wrapper around each call from one sequence to another or to a job. I am getting odd behaviour now, which I think is related to the number of instances (up to 200 for some jobs) that we have.
Part of the mechanism opens up the instances to find out which of them is the caller's parent - this needs to be done with multi-instance jobs due to some internal DS restrictions. I get a string of all 150+ instances and loop through each one of them:
FOR Index = 1 TO NumberOfInstances
   JobHandle = DSAttachJob(JobList<Index>, DSJ.ERRFATAL)
   * {processing}: check whether this instance is the caller's parent
   Dummy = DSDetachJob(JobHandle)
NEXT Index
and then I continue processing. Later on, another DSAttachJob() call is used to set the parameters and issue a DSRunJob(), followed by a DSDetachJob(), and that DSDetachJob() is causing a FATAL error:
JdDSSJOBCheckJobControl..AfterJob (fatal error from DSDetachJob): Job control fatal error (-1)
(DSDetachJob) Invalid job handle 4
This is quite sporadic. I am also being told that this DSRunJob() occasionally acts on a different handle than the one it should, so something is definitely getting mixed up with the job handles. I am awaiting information from support, but they are not being forthcoming [I traced the message using VLIST but need them to decode a COMMON variable name for me]. The documentation states that DSDetachJob() should only return an error when trying to close DSJ.ME, but in this case it is aborting the jobs!
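For reference, the attach/run/detach sequence described above usually has roughly the shape sketched below. This is a sketch only, assuming JOBCONTROL.H is in scope; the instance name 'MyJob.Inst01', the parameter 'RunDate' and its value are hypothetical placeholders, and the DSWaitForJob() call is an assumption (drop it if the controller should not block).

```
* Hypothetical instance name, parameter and value, for illustration only
RunHandle = DSAttachJob('MyJob.Inst01', DSJ.ERRFATAL)
ErrCode = DSSetParam(RunHandle, 'RunDate', '2005-07-19')
ErrCode = DSRunJob(RunHandle, DSJ.RUNNORMAL)
* Assumption: the controller waits for the run to finish
ErrCode = DSWaitForJob(RunHandle)
JobStatus = DSGetJobInfo(RunHandle, DSJ.JOBSTATUS)
* This is the detach that intermittently fails with "Invalid job handle"
ErrCode = DSDetachJob(RunHandle)
```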
Has anyone seen anything like this before or have any suggestions?
Posted: Tue Jul 19, 2005 1:59 am
by Sainath.Srinivasan
Was anyone else changing (e.g. compiling) the job while you were trying to execute your script?
Did you change the DS configuration for files and locks? That would be worth checking if the jobs run fine with a small number of instances and fail above a certain level.
Posted: Tue Jul 19, 2005 2:06 am
by ArndW
Nothing has changed in the configuration, and this occurs on two different machines as well. Nobody is touching these programs. I've written some test jobs to try to reproduce the issue consistently, to no avail, although I did find something strange: there is an internal COMMON block which holds the job handle information, and one of its variables, which counts the file units in use, keeps getting incremented even after I call DSDetachJob().
Posted: Tue Jul 19, 2005 5:39 am
by ray.wurlod
Could be a timing issue. Try a short SLEEP or NAP after DSDetachJob().
The return value from DSDetachJob() is its status. Check to see whether its value is DSJE.NOERROR or something else. If something else, screech to a halt and figure out what's happening.
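A minimal sketch of Ray's suggestion, assuming JOBCONTROL.H is in scope and JobHandle came from an earlier DSAttachJob(). The 500 ms pause is an arbitrary value, and 'DetachCheck' is a made-up caller name used only in the log entry.

```
ErrCode = DSDetachJob(JobHandle)
IF ErrCode <> DSJE.NOERROR THEN
   * Stop at once so the first failing detach is the one that gets logged
   CALL DSLogFatal('DSDetachJob returned ' : ErrCode : ' for handle ' : JobHandle, 'DetachCheck')
END
* Brief pause in case the problem is timing-related (NAP takes milliseconds)
NAP 500
```

If the handle was attached with DSJ.ERRFATAL, a bad detach may abort the job before the IF is ever reached, as the log in the first post suggests; attaching with DSJ.ERRWARN instead should give the status check a chance to run.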
Posted: Tue Jul 19, 2005 5:46 am
by ArndW
Ray, I actually created a common DSDetachJob() interlude (wrapper) so that I could add debugging information, and even when I commented out the DSDetachJob() statement, the error just showed up elsewhere! Somehow the internal COMMON variables that hold the open and used job handles are getting badly mixed up (calling the wrong jobs); I wish I knew where this is happening! The problem seems to go away when I remove instance names of multi-instance jobs, but comes back when a lot of instances exist. Part of the code retrieves all instances of a given job and loops through them (trying to find a job's "real" parent/calling process), with DSAttachJob() and DSDetachJob() inside the loop.
I have noticed that the internal variable that counts the number of active job handles per session never goes down: if you attach 10 jobs it goes up to 11, and if you detach them the number stays the same and the detail array for them remains filled with information, as if DSDetachJob() actually does nothing. I wish I could look at the source code for DSDetachJob() or that I still had my decompiler!
Anyway, no matter what I am doing wrong, DSDetachJob() should still not cause a job abort.
I've escalated this issue to support but haven't heard back from them yet.
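A debugging wrapper of the kind described here might look like the sketch below. The subroutine name MyDetachJob and its argument list are hypothetical, and how it is cataloged and called depends on the project setup. It logs the job name and handle before detaching so the log shows which handle went bad; note that DSGetJobInfo() on an already-invalid handle may fail in its own right.

```
SUBROUTINE MyDetachJob(JobHandle, ErrCode)
$INCLUDE DSINCLUDE JOBCONTROL.H
* Hypothetical wrapper: route every detach through here while debugging
JobName = DSGetJobInfo(JobHandle, DSJ.JOBNAME)
CALL DSLogInfo('Detaching ' : JobName : ' (handle ' : JobHandle : ')', 'MyDetachJob')
ErrCode = DSDetachJob(JobHandle)
IF ErrCode <> DSJE.NOERROR THEN
   CALL DSLogWarn('DSDetachJob returned ' : ErrCode : ' for ' : JobName, 'MyDetachJob')
END
RETURN
END
```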
Posted: Tue Jul 19, 2005 5:59 am
by ray.wurlod
Is your code based on the way that job sequences do it? They seem to work OK - could be worth inspecting their code!
Posted: Tue Jul 19, 2005 6:05 am
by ArndW
Hmm... I think I am using the same method as they are to find out who the parent process is, but it still involves iterating through the instances until you get a match.
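Iterating "until you get a match" can stop at the first hit instead of attaching every instance. The sketch below assumes JobList and NumberOfInstances are set up as in the first post; the comparison of DSJ.JOBCONTROLLER against a hypothetical TargetName is only a placeholder for whatever matching rule is actually used.

```
* Hypothetical: TargetName stands in for the real matching rule
Found = 0
ParentName = ''
Index = 0
LOOP
   Index += 1
WHILE Index <= NumberOfInstances AND NOT(Found) DO
   JobHandle = DSAttachJob(JobList<Index>, DSJ.ERRFATAL)
   IF DSGetJobInfo(JobHandle, DSJ.JOBCONTROLLER) = TargetName THEN
      Found = 1
      ParentName = JobList<Index>
   END
   * Detach on every pass, including the one that matched
   ErrCode = DSDetachJob(JobHandle)
REPEAT
```

Stopping early also means fewer attach/detach pairs, which may matter if the problem really does scale with the number of instances.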
I'll post the causes and (hopefully) the solution when I get it.
Posted: Tue Jul 19, 2005 9:37 am
by kduke
There were other problems with multiple-instance jobs in version 6. I have not tried them in version 7, so I do not know if they still exist. You could not stop multiple-instance jobs in BASIC. Someone said there is a patch to fix it, but I would bet that the two problems are related.
Posted: Thu Jul 21, 2005 5:02 am
by ArndW
Well, I've narrowed down part of the problem:
If you do this:
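JobHandle1 = DSAttachJob('MyJob', DSJ.ERRFATAL)
JobHandle2 = DSAttachJob('MyJob', DSJ.ERRFATAL)
Dummy = DSDetachJob(JobHandle1)
* JobHandle2 is no longer valid at this point, so the next call fails
* ('DetachTest' is just the caller name shown in the log)
CALL DSLogInfo(DSGetJobInfo(JobHandle2, DSJ.JOBSTATUS), 'DetachTest')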
the last call will fail. In other words, calling DSDetachJob() on one handle also detaches every other handle attached to the same job.
I still have other issues, but this was the most irksome one to find. In my case the main program attached the job, and then a subroutine (itself called by another subroutine) attached the same job again to get its last run status and then detached it, which invalidated the main program's handle as well. That took a long time to track down...
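One possible way around this, sketched under the assumption that the caller already holds a valid handle: let the helper reuse that handle instead of attaching and detaching its own, so it cannot invalidate the caller's handle. GetLastRunStatus is a hypothetical name, and how it is cataloged and called depends on the project setup.

```
SUBROUTINE GetLastRunStatus(JobHandle, LastStatus)
$INCLUDE DSINCLUDE JOBCONTROL.H
* Hypothetical helper: reuse the caller's handle. Attaching and then
* detaching a second handle to the same job here would also detach
* the caller's handle.
LastStatus = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)
RETURN
END
```

The caller then does CALL GetLastRunStatus(JobHandle, LastStatus) and detaches JobHandle itself once it has finished with the job, so each job has exactly one owner of its handle.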