Documentation/Jobs on the wait queue

NOTE: This page is out of date. Information only applies to Gaia.

Question:

I have some questions about queued jobs on the cluster. I submitted multiple jobs and found that more than half of them are waiting in the queue (qw status). I also used the command "qstat -f" to check the cluster and found several nodes that appear to still have capacity to execute jobs. Why?

Answer:

There are a number of reasons that a job might be left in a wait queue, including (but not limited to):

- The user has exceeded the maximum job limit (presently 25 per user on adcluster, though this may change, and 100 per user on gaia)

- There are not enough resources available on the node (e.g. even though only 2 of 4 slots are in use on the node, the CPU load is 3.9 or the memory is near maximum usage)

- The user has requested unrealistic resources (say, the user requests 20G of memory, but no machines in the cluster actually have that much memory)


You can always see why a job has or hasn't been scheduled by running: $ qstat -j <job_id>
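
To check whether you are close to the per-user job limit mentioned above, you can count your jobs by state. The following is a minimal sketch, not a site-provided tool: the helper name count_jobs_in_state is hypothetical, and it assumes the default "qstat" layout (two header lines, job state in the fifth column, job names without spaces).

```shell
# Hypothetical helper: count jobs in a given state from qstat output on stdin.
# Assumes the default qstat layout: a header line, a separator line,
# then one row per job with the state (r, qw, Eqw, ...) in column 5.
count_jobs_in_state() {
  state="${1:-qw}"   # default to waiting jobs
  # Skip the two header lines and count exact matches on the state column.
  awk -v s="$state" 'NR > 2 && $5 == s { n++ } END { print n + 0 }'
}
```

For example, "qstat -u "$USER" | count_jobs_in_state qw" prints the number of your waiting jobs, and "qstat -u "$USER" | count_jobs_in_state r" prints the number currently running. Only exact state matches are counted, so a job in an error state such as Eqw is not included in the qw count.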