Green aborts

chinek · Post by **chinek** » Mon Aug 05, 2002 12:37 am

Hi,
Ever since upgrading to DS 5.1, we have been experiencing green-tick aborts. The trace files show:
$ more DSD.StageRun_65834_12635
COMO DSRTRACE_dstage-15763 established 18:17:14 04 AUG 2002
2002-08-04 18:17:14: Initialised on 18:17:14 04 AUG 2002
2002-08-04 18:17:14: DSR_MESSAGE =DataStage Job 875 Phantom 15763
DataStage Job 875 Phantom 15763
2002-08-04 18:17:14: DSR_MESSAGE =STAGE has unexpected STATUS value.
STAGE has unexpected STATUS value.

Has anyone seen this sort of problem, where the stage returns an "unexpected STATUS"? The previous run was successful, with no problems reported. The problem hits our jobs at random, meaning a different job aborts each time. The job also aborts with green ticks and with no errors reported in the log files.

Can anyone shed some light, please?

Also, please reply to nick.chuah@txu.com.au if you can, as I might not check this forum very often.
Nick

chulett · Post by **chulett** » Mon Aug 05, 2002 8:13 am

You might want to post your system specs - server platform, O/S version, all that kind of fun stuff. You should also consider upgrading to DataStage 5.2, which is supposed to fix a lot of bugs in 5.1, or so I've been told.

-craig

chinek · Post by **chinek** » Mon Aug 05, 2002 4:42 pm

Hi
Server - SUN E10000, running Solaris 8.
DB - Oracle 8.1.7
Upgrading is an option, but we cannot do it right away, since we would need to go through testing and all that jazz first.
Also, if the problem is due to the environment, then upgrading to 5.2 may not fix it, or at best may just mask it.
What I need to know is what the cause of the problem is.

Nick

chulett · Post by **chulett** » Mon Aug 05, 2002 6:03 pm

Your best bet for getting an answer is to open a ticket with Ascential Support, which I assume you've already done. They can supply you with a patch if need be, or let you know if the problem is resolved in 5.2. About the only advice I can give you, which I was given in the past, is to make sure your '&PH&' directory stays pretty clean. Too many entries in there can mess up DataStage's ability to communicate between jobs. We ended up writing a batch that runs each night, checks each project for aborted jobs, and then clears the Phantom directory. Other than that, you are pretty much in Support's hands, I would think.

Just out of curiosity, what version did you upgrade *from*? And which 5.1 version do you have... 5.1, 5.1r1 or 5.1r2? If you are not on 5.1r2, that could be a pretty painless 'upgrade' for you.

-craig

chinek · Post by **chinek** » Tue Aug 06, 2002 4:56 am

We are running 5.1r2. How painless is the upgrade?
Yes, I have logged the problem with Ascential, but have not heard back from them yet...
We upgraded from 3.6 to 5.1.
I do have scripts to clean out the "&PH&" directories. They are reasonably clean, but I suppose we can do more: I currently remove anything older than 7 days, and we can decrease that to 3 days or less.
Thanks for your suggestions.
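
For anyone wanting to try the same age-based cleanup, here is a minimal Python sketch. It assumes the phantom directory holds only plain files that are safe to delete once they pass the cutoff. The function name `purge_old_files`, its parameters, and the path in the usage line are hypothetical; this is not the script described in this thread and not an Ascential-supplied tool.

```python
import time
from pathlib import Path


def purge_old_files(directory, max_age_days, now=None):
    """Delete plain files in `directory` last modified more than
    `max_age_days` days ago. Returns the sorted list of removed paths.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for entry in Path(directory).iterdir():
        # Skip symlinks and subdirectories; only plain files are candidates.
        if entry.is_symlink() or not entry.is_file():
            continue
        if entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry)
    return sorted(removed)
```

A call might look like `purge_old_files("/path/to/project/&PH&", max_age_days=3)`, where the path is a placeholder for the real project directory.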

Nick

Starg · Post by **Starg** » Wed Aug 07, 2002 9:06 pm

Nick,
I am running a very similar setup, Solaris 2.8 and DS5.1r1 on both an E4000 and an E10000, and I experience very similar problems with jobs aborting. I'm not 100% sure it's the same problem, but we find that jobs sometimes (and randomly) start and then abort with no messages in the error log.

We also see a lot of zombie processes being created, and we have tried various configuration changes, all without any luck. The Aus Helpdesk forwarded me the following message from a US Support Technician.

"Your customer is stuck between the proverbial rock and a hard place. Yes, keeping notify on "should" get rid of the zombies (but it isn't in his case ...interesting) ***BUT*** there is a signal handling problem on Solaris and HP where, if notify is turned on, jobs will randomly abort for no apparent reason. Have him try turning notify off to see if the jobs stop aborting. Also, have him go to Unix as root and type"

One thing I did find is that the heavier the load on the system, the more frequently the problem occurs. The helpdesk could not resolve the problem, so we added extra code to all our control jobs: if a job aborts, the control job resets it and starts it again, and if it fails a second time, the control job really stops.
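
The reset-and-retry-once logic can be sketched roughly as below. This is Python for illustration only; a real control job would be written in DataStage BASIC, and the callables `run_job` and `reset_job` are hypothetical placeholders for whatever actually runs and resets a job. The sketch assumes `run_job` returns `True` when the job finishes cleanly and `False` when it aborts.

```python
def run_with_one_retry(run_job, reset_job, job_name):
    """Run a job; if it aborts, reset it and run it once more.

    `run_job` and `reset_job` are hypothetical callables supplied by the
    caller. Returns True if either attempt succeeds, False if both abort.
    """
    if run_job(job_name):
        return True
    # First attempt aborted: reset the job so it can be started again.
    reset_job(job_name)
    # Second and final attempt; a second abort is treated as a real failure.
    return bool(run_job(job_name))
```

Capping it at one retry keeps a genuinely broken job from looping forever, while still riding out the random aborts.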

Hope that helps.
Starg

chinek · Post by **chinek** » Wed Aug 07, 2002 9:47 pm

Hi Starg
The symptoms look very similar: our jobs abort at random and leave no logs. Ascential Support came back with a patch recommendation, but the patch was for Solaris 7 and we are on 8, so we put off installing it until we get further clarification.

What do you mean by "when notify is turned on"?

I find that with 5.1 we get zombie processes too: for jobs that have aborted in DS, one zombie process still remains in the OS. Not sure why it's not cleaning up properly.

For some of the jobs I can do an automatic reset, but for other jobs that have dependencies it's not that simple.

Thanks for your suggestions, though.

At least I am not the only one out there with this problem...

Nick