Sporadic Problems with DSDetachJob() call

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Sporadic Problems with DSDetachJob() call

Post by ArndW »

We have a complex mechanism to control job starts and logging here, essentially a wrapper around each call from one sequence to another or to a job. I am getting odd behaviour now, which I think is related to the number of instances (up to 200 for some jobs) that we have.

Part of the mechanism opens up the instances to find out which of them is the caller's parent - this needs to be done with multi-instance jobs due to some internal DS restrictions. I get a string of all 150+ instances and loop through each one of them:

Code:

FOR Index = 1 TO NumberOfInstances
   JobHandle = DSAttachJob(JobList<Index>, DSJ.ERRFATAL)
   * {processing}
   Dummy = DSDetachJob(JobHandle)
NEXT Index


and then I continue processing. Later on I have another DSAttachJob() which I use to fill the parameters and do a DSRunJob(), then issue a DSDetachJob() which is then causing a FATAL error:

JdDSSJOBCheckJobControl..AfterJob (fatal error from DSDetachJob): Job control fatal error (-1)
(DSDetachJob) Invalid job handle 4


This is quite sporadic. I am also being told that occasionally this DSRunJob() is executing a different file handle than it should, so something is definitely getting mixed up with the job handles. I am awaiting information from support, but they are not being forthcoming [I traced the message using VLIST but need them to decode a COMMON variable name for me]. The documentation states that DSDetachJob() should only return an error when trying to close DSJ.ME, but in this case it is aborting the jobs!

Has anyone seen anything like this before or have any suggestions?
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Was anyone else changing (e.g. compiling) the job while you were trying to execute your script?

Did you change the DS configuration for files and locks? That would be worth checking if the jobs run fine with a low number of instances and fail above a certain level.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Nothing has changed in the configuration, and this occurs on 2 different machines as well. Nobody is touching these programs. I've written some test jobs to try to reproduce the issue consistently, to no avail, although I did find something strange: there is an internal common block which contains the job handle information. One of its variables shows the number of file units used, and that keeps on getting incremented, even after I do DSDetachJob() calls.
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Could be a timing issue. Try a short SLEEP or NAP after DSDetachJob().

The return value from DSDetachJob() is its status. Check to see whether its value is DSJE.NOERROR or something else. If something else, screech to a halt and figure out what's happening.
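
A minimal sketch of that check, assuming the code runs inside a server routine. RoutineName is a hypothetical name used only for the log entries, and the 200 ms pause is an arbitrary value. The attach uses DSJ.ERRNONE rather than DSJ.ERRFATAL so that a failing call hands back a status code to inspect instead of aborting the controlling job; whether that also covers the "Invalid job handle" case described above is untested here.

Code:

$INCLUDE DSINCLUDE JOBCONTROL.H

      RoutineName = 'CheckDetachStatus'   ;* hypothetical name for log entries

      * DSJ.ERRNONE: errors are returned to the caller, nothing is logged as fatal.
      JobHandle = DSAttachJob('MyJob', DSJ.ERRNONE)
      If Not(JobHandle) Then
         Call DSLogWarn('Attach failed: ':DSGetLastErrorMsg(), RoutineName)
      End Else
         * {processing}
         ErrCode = DSDetachJob(JobHandle)
         If ErrCode <> DSJE.NOERROR Then
            Call DSLogWarn('DSDetachJob returned ':ErrCode, RoutineName)
         End
         Nap 200   ;* short pause in milliseconds (arbitrary value)
      End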
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Ray - I actually created a common DSDetachJob() interlude so that I could play around with debugging information, and I even commented out the statement; the error is then evinced elsewhere! It seems that somehow the internal common variables that hold the open and used JobHandles are getting very mixed up (calling incorrect jobs); I wish I knew where this is happening! It seems to go away when I remove instance names of multi-instance jobs, but comes back when a lot of instances exist. Part of the code actually retrieves all instances of a given job and loops through all of them (trying to find a job's "real" parent/calling process) with DSAttachJob() and DSDetachJob() inside the loop.

I have noticed that the internal variable that counts the number of active JobHandles per session never goes down, i.e. if you open up 10 jobs it goes up to 11, and if you close them the number remains the same and the detail array for them remains filled with information, as if DSDetachJob() actually does nothing. I wish I could look at the source code for DSDetachJob() or that I still had my decompiler!

Anyway, no matter what I am doing wrong, DSDetachJob() should still not cause a job abort.

I've escalated this issue to support but haven't heard back from them yet.
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Is your code based on the way that job sequences do it? They seem to work OK - could be worth inspecting their code!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Hmm... I think I am using the same method that they are in order to find out who the parent process is; but it still involves iterating through the instances until you get a match.

I'll post the causes and (hopefully) the solution when I get it.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

There are other problems with multiple instance jobs which existed in version 6. I have not tried them in version 7 so I do not know if they still exist. You could not stop multiple instance jobs in BASIC. Someone said there is a patch to fix it but I would bet that these are related.
Mamu Kim
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Well, I've narrowed down part of the problem:

If you do a:

Code:

JobHandle1 = DSAttachJob('MyJob', DSJ.ERRFATAL)
JobHandle2 = DSAttachJob('MyJob', DSJ.ERRFATAL)
Dummy      = DSDetachJob(JobHandle1)
* JobHandle2 was never detached explicitly, yet this call fails
CALL DSLogInfo(DSGetJobInfo(JobHandle2, DSJ.JOBSTATUS), 'Test')


it will fail. In other words, a DSDetachJob() on one handle also detaches any and all other job handles attached to that same job.

Still have issues, but this was the most irksome to find. In my case the main program opened the handle, then a subroutine (called by another subroutine) attached the same job to get its last run status and detached it again. That took a long time to find...
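
One way to sidestep this, sketched under assumptions rather than tested against this release: attach the job once in the outermost program and let the subroutines reuse that handle instead of doing their own DSAttachJob()/DSDetachJob() pair. RoutineName and the GetLastStatus label are hypothetical, and the code is written as the body of a server routine.

Code:

$INCLUDE DSINCLUDE JOBCONTROL.H

      RoutineName = 'SingleAttachSketch'   ;* hypothetical name for log entries

      * Attach once, in the outermost program, and keep the handle.
      JobHandle = DSAttachJob('MyJob', DSJ.ERRNONE)
      If Not(JobHandle) Then
         Call DSLogWarn('Attach failed: ':DSGetLastErrorMsg(), RoutineName)
      End Else
         GoSub GetLastStatus
         * ... set parameters and run the job with the same handle here ...
         ErrCode = DSDetachJob(JobHandle)   ;* the only detach for 'MyJob'
      End
      Return

GetLastStatus:
      * Reuses the caller's handle: no attach/detach pair in here.
      LastStatus = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)
      Call DSLogInfo('Last run status: ':LastStatus, RoutineName)
      Return

With a single attach and a single detach per job, no other piece of code can invalidate a handle that is still in use elsewhere.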