Jobs getting into a wait state for a long time...

VCInDSX · Post by **VCInDSX** » Wed May 28, 2008 7:27 am

Hi Group,
In our Linux GRID environment we often run into a situation where jobs wait a long time to be allocated a node.

The Head Node has 3 computing nodes at its service, and all our jobs use the default GRID configuration parameter values, as follows:
$APT_GRID_ENABLE=YES
$APT_GRID_COMPUTENODES=1
$APT_GRID_PARTITIONS=1
$APT_GRID_SEQFILE_HOST=

In a particular scenario, 3 jobs are executing in the 3 computing nodes for user1 in project1.
If user1 submits another job in the same project, that job enters the "Waiting for release from queue" state until one of the computing nodes becomes free.

Thinking that user1 might be limited to one job per computing node, I tried submitting another job as user2 from project2, and that job also ended up waiting.

Is this typical, and should all remaining users wait until the node(s) become free? Or is it due to the way the servers have been configured?

Should I review the values that I specify for the GRID parameters?

I wonder if that would really help, because increasing the nodes or partitions will only matter if the job scheduler is able to allocate a node in the first place, correct?

I am not conversant with the server's configuration/setup in detail... so pardon any obvious ones here.

Thanks in advance for your time and inputs...

bkumar103 · Post by **bkumar103** » Wed May 28, 2008 8:05 am

Try with $APT_GRID_COMPUTENODES=2
$APT_GRID_PARTITIONS=2

Test whether that helps.

lstsaur · Post by **lstsaur** » Wed May 28, 2008 11:31 am

Talk to the person who configures the grid environment for you. Which Resource Manager software is being used for the grid: SGE, PBS Pro, or LSF?
Your queue is definitely not responding correctly. Increasing compute nodes and partitions will not help.

VCInDSX · Post by **VCInDSX** » Wed May 28, 2008 12:00 pm

Hi Birendra and lstsaur,
Thanks for your inputs. I tend to agree with lstsaur: with the current setup of our system, increasing the nodes would still leave the new jobs waiting in the queue.

The feedback from our "person" who manages the queue is that only one job can run on a node. If all the nodes are busy with one job each, incoming jobs have to wait until a node becomes free. I am not sure if that is how the GRID is supposed to work.
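
To make that explanation concrete, here is a toy Python model of a scheduler that allows a fixed number of jobs per node. It is not SGE and not our actual queue configuration: the node names, the job names, and the `slots_per_node` parameter are all made up for illustration. It only shows why a fourth job would wait regardless of which user or project submits it.

```python
def dispatch(jobs, nodes, slots_per_node):
    """Toy model: place each job on the first node that still has a free slot.

    Jobs that find no free slot are returned in the waiting list.
    This is an illustration only, not how SGE is implemented.
    """
    free = {node: slots_per_node for node in nodes}
    running = {}
    waiting = []
    for job in jobs:
        # Pick the first node with spare capacity, or None if all are full.
        node = next((n for n in nodes if free[n] > 0), None)
        if node is None:
            waiting.append(job)
        else:
            free[node] -= 1
            running[job] = node
    return running, waiting


# Hypothetical names standing in for the 3 compute nodes and the submitted jobs.
nodes = ["node_a", "node_b", "node_c"]
jobs = ["user1_job1", "user1_job2", "user1_job3", "user1_job4", "user2_job1"]

running, waiting = dispatch(jobs, nodes, slots_per_node=1)
print("running:", running)
print("waiting:", waiting)
```

With one slot per node, the fourth and fifth jobs stay in the waiting list; calling `dispatch(jobs, nodes, slots_per_node=4)` in the same model leaves nothing waiting. Whether our queue is really limited this way is something only the queue administrator can confirm.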

One additional question that I have:
If the 3 computing nodes are multi-CPU (actually 4-CPU) boxes, would the single job assigned to a node use all 4 CPUs, or should the scheduler be smart enough to place more than one job on that node and exploit the 4 CPUs that way?

As for the Grid engine.... SGE is what I can see in the logs...
APT_GRID_ENGINE=SGE
/nfsgrid/nfsbin/sge/bin/lx24-x86:

Let me know if any other information from my side would help.

Thanks again for your time

lstsaur · Post by **lstsaur** » Wed May 28, 2008 1:56 pm

Just set $APT_GRID_PARTITIONS=4; the OS will then handle up to 4 partitions on that compute node for you (depending on the complexity of the job flow).
You still didn't tell me which Resource Manager software is used in your grid environment. With that tool you should be able to see every step of your job's resource utilization, if that "person" gives you access permissions.

VCInDSX · Post by **VCInDSX** » Wed May 28, 2008 9:46 pm

Thanks lstsaur. I will definitely try the increased partition value and post the results.

Meantime, SGE is the resource manager tool in this shop.

Let me know if there are additional ways to see the resource allocation details.

Right now, I can see from the job log which computing node was allocated to a particular job.

Thanks again

VCInDSX · Post by **VCInDSX** » Thu May 29, 2008 7:49 am

Hi lstsaur,
I have some good news from my first tests. Increasing the partitions per node to 4 helped a good number of jobs complete faster.
Here is the config file that I pulled from the Director log for this execution.

<Dynamic_gird.sh> SEQFILE Host(s): ctpcqabdsc02p: ctpcqabdsc02p:
{
 node "Conductor"
 {
  fastname "ctpcqabdsh01p"
  pools "conductor"
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node1_1"
 {
  fastname "ctpcqabdsc02p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node1_2"
 {
  fastname "ctpcqabdsc02p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node1_3"
 {
  fastname "ctpcqabdsc02p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node1_4"
 {
  fastname "ctpcqabdsc02p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
}
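
The generated file above follows a regular pattern: one conductor entry, then one logical node per partition, all pointing at the same compute host. A minimal Python sketch of that pattern is below. It is only an illustration of the layout, not the grid toolkit's own script; the helper name `build_grid_config` is hypothetical, and the `node<host>_<partition>` naming for a second compute host is an assumption extrapolated from the `node1_1` to `node1_4` entries.

```python
def build_grid_config(conductor, compute_hosts, partitions,
                      disk="/nfsdata/data1/datasets", scratch="/scratch"):
    """Return config text with one conductor plus `partitions` logical
    nodes per compute host (hypothetical helper, for illustration only)."""

    def node_block(name, fastname, pool):
        # One node entry in the same layout as the file shown above.
        return (
            f' node "{name}"\n'
            " {\n"
            f'  fastname "{fastname}"\n'
            f'  pools "{pool}"\n'
            f'  resource disk "{disk}" {{pools ""}}\n'
            f'  resource scratchdisk "{scratch}" {{pools ""}}\n'
            " }\n"
        )

    blocks = [node_block("Conductor", conductor, "conductor")]
    for host_index, host in enumerate(compute_hosts, start=1):
        for part_index in range(1, partitions + 1):
            # Assumed naming scheme: node<host index>_<partition index>.
            blocks.append(node_block(f"node{host_index}_{part_index}", host, ""))
    return "{\n" + "".join(blocks) + "}\n"


# Same shape as the run above: 1 compute host, 4 partitions.
config_text = build_grid_config("ctpcqabdsh01p", ["ctpcqabdsc02p"], 4)
print(config_text)
```

Passing two host names instead of one would produce `node1_*` and `node2_*` entries. I am only guessing that a run with $APT_GRID_COMPUTENODES=2 would generate something similar.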

Is this configuration good enough, or is there anything else that would help?

I haven't looked at the database side yet. The target is a SQL Server 2005 database that is not partitioned, but each of these jobs loads into a separate table, and I believe that is at least some relief.

A few additional questions that come to my mind....

As the computing nodes are 4-CPU boxes, is there a guideline for the maximum number of partitions that will use the server's capacity without overloading it?

Also, if I were to run this with a different config file that allows multiple nodes, would the job get queued internally at the host?

If I were to set $APT_GRID_COMPUTENODES to a value greater than 1, would it have the same effect as a multi-node config file?

If you could help me understand this, or point me to documentation that covers it, it would be much appreciated.

Thanks again for your invaluable time and help