Strange Server errors

Posted: Sun Aug 17, 2008 9:16 pm
by chulett
I have something odd going on, with errors and situations I've never seen before. These are issues with old jobs on a secondary QA server: jobs that run fine in 3 other environments but haven't been run on this particular box for about a month. And no, I'm not aware of anything that may have changed, but I'm gonna try and find out. Keep in mind as you read this that there is no row buffering enabled in any of the jobs or in the project defaults.

Now, as to the problems. The first inkling was when a QA person contacted me with an error. We have an MI job that runs 16 instances simultaneously, and one had failed with an error message I haven't personally seen or recall seeing posted:

JobName.Invocation.Xform  Unable to create new process. Will try again.

This is a Fatal log message and, needless to say, the job aborted. The other 15 were allegedly running. When I checked, they had been running for 12 hours and their monitors looked something like this:

[Image: job monitor showing the first transformer still "starting" and all the rest "running"]

This image is actually from later in the trainwreck, but the situation is the same: the first xform is still "starting" with all the rest "running". I killed all of these jobs and recycled the DataStage Server. The next time the 16 were cranked up, the first 8 actually started, and the second 8 never got all of their transformers running, looking like the image linked above. Eventually, the first 8 completed but didn't seem to realize it:

[Image: job monitor with every transformer finished except the first]

All but the first xform finished. It *is* finished but hasn't set its status yet. I went in to start killing processes and nuked the PIDs related to the first invocation. At that point, instances 9 and 10 aborted and 11 through 16 actually started to process rows. I restarted 9 and 10, so the last 8 invocations are now running.

I have zero confidence that they will all finish normally and fully expect them to get 'stuck' as well. I know this is a lot of rambling, but I'm wondering what people's thoughts are. While typing this up, I decided to check the &PH& directory and found roughly 1000 files there. I'll clear it of all extraneous files and see if that helps.

Thanks.
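A minimal sketch of the &PH& cleanup described above, in Python. It assumes the &PH& directory holds only phantom output files and that anything older than a chosen age (7 days here, an arbitrary assumption) is no longer being written by a running job. The function names are hypothetical, and the purge defaults to a dry run so the list can be reviewed before anything is deleted.

```python
import time
from pathlib import Path


def stale_phantom_files(ph_dir, max_age_days=7, now=None):
    """Return files in a &PH& directory whose mtime is older than max_age_days.

    The age threshold is an assumption meant to skip files that jobs
    running right now may still be writing to.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for entry in Path(ph_dir).iterdir():
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            stale.append(entry)
    return sorted(stale)


def purge(paths, dry_run=True):
    """Delete the given files, or only print them when dry_run is True.

    Returns how many files were (or would have been) removed.
    """
    for p in paths:
        if dry_run:
            print(f"would remove {p}")
        else:
            p.unlink()
    return len(paths)
```

For example, `purge(stale_phantom_files("/path/to/Projects/QAProject/&PH&"))` (a hypothetical project path) only lists what would be removed; pass `dry_run=False` once the list looks right. Filtering on age rather than emptying the whole directory is the design choice that keeps active jobs' files in place.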

Posted: Sun Aug 17, 2008 9:26 pm
by ray.wurlod
Is the HP-UX process table filling/full, preventing allocation of any more process IDs (pids)?

Posted: Sun Aug 17, 2008 9:28 pm
by chulett
I'm not sure the PHantom directory was the culprit here because, in checking and clearing other projects, I'm finding ones with several thousand entries, including the Big Winner with 14040. We'll see.

Posted: Sun Aug 17, 2008 9:31 pm
by chulett
ray.wurlod wrote: Is the HP-UX process table filling/full, preventing allocation of any more process IDs (pids)?

I don't believe so, but I'll have the SAs check the logs tomorrow. The server was pretty quiet other than these jobs, as it is allegedly dedicated to the QA group. From what I recall, it was running about 75% idle. I know that doesn't answer your question, but there really wasn't all that much going on.

I have a funny feeling something changed here... like it may have been an organ donor for other servers as it has far less RAM than I remember it having. More questions for the SAs. :?

Posted: Mon Aug 18, 2008 2:05 am
by ArndW
Could it be that you have reached the MAXUPROC value for that user-id?
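One way to test the MAXUPROC theory is to compare how many processes the job user-id currently owns against the per-user limit. A minimal Python sketch, assuming that `sysconf(_SC_CHILD_MAX)` (POSIX: maximum simultaneous processes per real user ID) tracks the HP-UX `maxuproc` tunable on this box, which the SAs would need to confirm, and that `ps -u <user>` prints one header line followed by one line per process. The function names and the 90% warning threshold are hypothetical.

```python
import os
import subprocess


def per_user_process_limit():
    """Per-user process limit from sysconf(_SC_CHILD_MAX), or None if unknown.

    The value is the limit of the calling process, so run this as the
    user-id the jobs run under.
    """
    try:
        limit = os.sysconf("SC_CHILD_MAX")
    except (ValueError, OSError):
        return None  # name not supported on this platform
    return limit if limit > 0 else None  # -1 means "indeterminate"


def count_user_processes(user):
    """Count processes owned by `user` via `ps -u`.

    Assumes the first non-blank output line is a column header. `ps` may
    exit non-zero when the user owns no processes, so the exit code is ignored.
    """
    result = subprocess.run(["ps", "-u", user],
                            capture_output=True, text=True, check=False)
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def near_limit(current, limit, warn_fraction=0.9):
    """True if `current` processes use at least `warn_fraction` of `limit`."""
    if limit is None:
        return False  # cannot judge without a known limit
    return current >= warn_fraction * limit
```

Usage would be along the lines of `near_limit(count_user_processes(job_user), per_user_process_limit())`, with `job_user` standing in for whatever user-id runs the 16 instances. The system-wide process table size Ray asked about is a separate kernel limit that this sketch does not check.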

Posted: Mon Aug 18, 2008 7:11 am
by chulett
Could be... something else to check on with my elusive SA friends.