Job Aborted: Resource temporarily unavailable

Posted: Mon Jan 30, 2006 5:52 am
by apraman
Hi,
I have a parallel job that aborts at run time. Some of the fatal error messages are:

main_program: APT_PMConnectionRecord::rsh: fork failed, Resource temporarily unavailable

main_program: APT_PM_LocalShell: fork failed, errno = 11

main_program: The section leader on gifbdc died


-----------------------------------------------------

node_node1: Fatal Error: Unable to start ORCHESTRATE process on node node1 (gifbdc): APT_PMPlayer::APT_PMPlayer: fork() failed, Resource temporarily unavailable
How can I overcome this?

Posted: Mon Jan 30, 2006 6:07 am
by ArndW
Arun,

The operating system is trying to execute a fork() and cannot. The fork() call is used to spawn a new process from a parent (it actually duplicates the parent process's complete memory space and starts a duplicate process).
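To make the failure concrete, here is a minimal Python sketch of a single fork() call. Python is only used for illustration here, not by DataStage itself. When the kernel refuses to create the process, Python's os.fork() raises BlockingIOError, the OSError subclass used for errno EAGAIN. On Linux, EAGAIN is errno 11 and its message is "Resource temporarily unavailable", which matches the log above. The numeric value can differ on other UNIX flavours.

```python
import errno
import os


def try_fork() -> str:
    """Fork once and report whether the kernel allowed it.

    The child exits immediately and the parent waits for it. If the
    kernel refuses to create the process, os.fork() raises
    BlockingIOError, whose errno is EAGAIN.
    """
    try:
        pid = os.fork()
    except BlockingIOError as exc:
        # Same condition as "fork failed, Resource temporarily unavailable".
        return f"fork failed: errno={exc.errno} ({os.strerror(exc.errno)})"
    if pid == 0:
        # Child: leave at once, skipping the parent's cleanup handlers.
        os._exit(0)
    # Parent: reap the child so it does not linger as a zombie.
    os.waitpid(pid, 0)
    return f"fork ok: child pid {pid}"


if __name__ == "__main__":
    print("EAGAIN =", errno.EAGAIN, "->", os.strerror(errno.EAGAIN))
    print(try_fork())
```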

If this error happens every time you run this job, then you have most likely hit a soft limit on the number of concurrent processes that one user ID may have (look up "maxuprc" for your version of UNIX). You could have your system administrator raise that limit. Also see your man pages for nproc or max_nprocs.
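As a quick check of what the job's user actually gets, the following sketch reads the per-user process limit through Python's resource module. It assumes a Linux or BSD-style host. RLIMIT_NPROC is the setrlimit() resource that bash's `ulimit -u` reports. It is not necessarily the same setting as a kernel tunable such as maxuprc, so treat it only as a first look. Run it under the same user ID and environment as the job. The helper names are illustrative.

```python
import resource


def nproc_limits() -> tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NPROC values for the calling process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def fmt(value: int) -> str:
    """Show RLIM_INFINITY as 'unlimited' instead of a huge integer."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def fits(expected_processes: int) -> bool:
    """True if expected_processes does not exceed the soft limit.

    The limit covers every process the user already owns, and this check
    does not subtract those, so it is an optimistic upper-bound test.
    """
    soft, _ = nproc_limits()
    return soft == resource.RLIM_INFINITY or expected_processes <= soft


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"max user processes: soft={fmt(soft)} hard={fmt(hard)}")
    print("500 processes within soft limit?", fits(500))
```

Only an administrator can raise the hard limit. An unprivileged process may raise its soft limit only up to the hard value.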

It might also be that your system is overloaded with requests, but UNIX is usually very good about fork()ing, so it would have to be very busy for this to be the cause.
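One way to tell these two causes apart is to compare how many processes the job's user owns with the total on the machine. If the user's count sits near its limit, the per-user limit is the likely culprit. If the overall total is very high, the machine itself is busy. The sketch below assumes a Linux host because it scans /proc; on other UNIX flavours, ps output would be needed instead. The function name is illustrative.

```python
import os
from collections import Counter


def count_processes_by_uid(proc_root: str = "/proc") -> Counter:
    """Count running processes per owning UID by scanning /proc (Linux only).

    The owner of each /proc/<pid> directory is normally the process's
    effective UID. Linux applies RLIMIT_NPROC to threads as well as
    processes, so this per-process count can understate what the limit sees.
    """
    counts: Counter = Counter()
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            counts[os.stat(os.path.join(proc_root, name)).st_uid] += 1
        except FileNotFoundError:
            continue  # the process exited while we were scanning
    return counts


if __name__ == "__main__":
    counts = count_processes_by_uid()
    me = os.getuid()
    print(f"all processes: {sum(counts.values())}")
    print(f"processes owned by uid {me}: {counts[me]}")
```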

Posted: Mon Jan 30, 2006 10:52 pm
by kumar_s
Hi,

As Arnd suggested, you need to check the number of concurrent processes and tune the UNIX settings.
Also check how many jobs were running in parallel at the time of the error. Every additional job obviously adds to the number of processes. If this is the case, it is easier to restrict the number of jobs that run at once than to control the number of processes.
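Limiting how many jobs run at once can be sketched with a bounded worker pool. This is a generic Python illustration, not a DataStage feature. The job commands are hypothetical placeholders. In practice you would substitute whatever command your scheduler uses to start a job *and wait for it to finish*. If the command returns before the job completes, a cap like this would not limit the number of running jobs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd: list[str]) -> int:
    """Run one job command to completion and return its exit status."""
    return subprocess.run(cmd, check=False).returncode


def run_batch(commands: list[list[str]], max_concurrent: int) -> list[int]:
    """Run every command, but never more than max_concurrent at a time.

    Each worker thread blocks on one child process, so the pool size caps
    how many jobs (and therefore how many of their processes) run at once.
    Exit statuses come back in the same order as the commands.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # Hypothetical stand-ins for real job launch commands.
    jobs = [[sys.executable, "-c", f"print('job {i}')"] for i in range(6)]
    print(run_batch(jobs, max_concurrent=2))
```

Capping concurrency this way keeps the peak process count roughly proportional to `max_concurrent`. That is usually easier to reason about than tuning per-user limits for the worst case.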

-Kumar