Running a Compute Job
Brain supports two types of compute jobs: batch and interactive. Which type of job to use, and when to use it, will depend on your needs. For example, if you are performing exploratory data analysis, you will want an interactive job. On the other hand, if you have a long-running task that, once started, will run without user input and eventually output a result, you should use a batch job.
Ideally, you should aim to convert as many of your tasks as possible to batch jobs. Batch jobs help maximize cluster resource utilization by allowing the job scheduler to start your task as soon as resources are available. In other words, you don't have to be around to start or stop the job; the scheduler will do this automatically, with the goal of getting every job done as soon as it can.
Queue
A queue represents a resource allocation. Each group that purchased Brain nodes is provided with a queue allowing them access to resources equal to their purchase.
PE (Parallel Environment)
There are two types of parallel environments on Brain: MPI and SMP.
- mpi: Jobs utilizing MPI (typically distributed across nodes)
- smp: Jobs utilizing 1 or more CPU cores on a single node
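For example, a shared-memory job needing 8 cores on a single node and a distributed MPI job needing 32 slots would request their parallel environments like this (the slot counts here are illustrative; use whatever your job actually needs):
#$ -pe smp 8
#$ -pe mpi 32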
Slot
A slot is a fraction of a compute node. From a CPU standpoint, a slot represents 1 virtual CPU core, or hyperthread.
JOBID
The resource scheduler assigns a unique ID to every job. You can use the ID number to check on job status, terminate the job, etc.
IMPORTANT: When scheduling a compute job, you must specify the amount of CPU and memory your job will require. If your job exceeds those limits, the scheduler may terminate it without warning.
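CPU is requested via the slot count on the -pe line. Memory is typically requested as a per-slot resource; as a sketch, assuming the cluster exposes the standard Grid Engine h_vmem resource (check with your cluster admins), a job needing 4GB per slot would add:
#$ -l h_vmem=4G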
Scratch Space
Each cluster node is configured with approximately 180GB of high-speed scratch space. The scratch space is local to the node itself. For maximum performance, your job should copy any required data to the local scratch disk during startup. Then, as the job runs, it should write its output to that local scratch disk. Once the job completes, it should copy the data from scratch to your home folder. After the copy completes, be sure to clean up (remove) any files you have on scratch.
Scratch is conveniently located at /scratch.
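A minimal sketch of that workflow is below. The input file, results directory, and processing command are hypothetical; $JOB_ID is set by Grid Engine for each job, which keeps concurrent runs separated on scratch.
#!/bin/bash
#$ -cwd
#$ -q YOURQUEUE.q
#$ -pe smp 1
#$ -S /bin/bash
# Stage input data to node-local scratch
WORKDIR=/scratch/$USER/$JOB_ID
mkdir -p "$WORKDIR"
cp ~/data/input.dat "$WORKDIR/"      # hypothetical input file
# Process on scratch for maximum I/O performance
cd "$WORKDIR"
./process input.dat > output.dat     # hypothetical processing step
# Copy results home, then clean up scratch
cp output.dat ~/results/
rm -rf "$WORKDIR"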
MPI
MPI, or Message Passing Interface, is used by some parallelized software packages. It is most commonly used when running a job that will span multiple compute nodes. If you are using a non-default MPI package, you may need to make some adjustments to command-line arguments and environment variables. From our testing, the following were needed:
- Disable the loading of default environment modules. Edit your ~/.bashrc file and add:
  export ROCKS_USER_MODULE_DEF=True
- Add '--prefix /path/to/mpi' to your mpirun command. For example:
  mpirun --prefix /share/apps/OpenMPI/openmpi-4.0.1 -hostfile host_file ./mpi-hello-world/code/mpi_hello_world
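Under Grid Engine, the scheduler writes the list of nodes allocated to your job to the file named by $PE_HOSTFILE (hostname in the first column, slot count in the second). As a sketch, your job script could build the OpenMPI hostfile from it like this:
# Convert the Grid Engine allocation into OpenMPI hostfile format
awk '{print $1" slots="$2}' "$PE_HOSTFILE" > host_file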
NOTE: For long-running (several days or weeks) batch jobs, your job should be designed to save checkpoints that it can resume from. That way, in the event the job is terminated due to resource usage, a node crash, etc., you don't lose all of your work.
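How checkpoints are written is application-specific, but the resume logic in the job script can be as simple as this sketch (the program name, flags, and checkpoint file are all hypothetical):
# Resume from the last checkpoint if one exists; otherwise start fresh
if [ -f checkpoint.dat ]; then
    ./simulate --resume checkpoint.dat     # hypothetical resume flag
else
    ./simulate --checkpoint-every 1h       # hypothetical checkpoint flag
fi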
Batch Jobs
A batch job starts with a job script that defines where (what directory) the job should run, the job queue to use, and the amount of resources it requires. A simple example script, which we'll call test.sh, is below:
#!/bin/bash
#
#$ -cwd                # Run the job from the directory it was submitted from
#$ -q YOURQUEUE.q      # Queue to submit to (replace with your group's queue)
#$ -pe PE SLOTS        # Parallel environment (mpi or smp) and slot count
#$ -S /bin/bash        # Shell used to run the job
date
sleep 60
date
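For example, a 4-core single-node job would fill in the queue and PE lines like this (assuming, for illustration, that the mesa group's queue is named mesa.q):
#$ -q mesa.q
#$ -pe smp 4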
Once the job script has been created, you're ready to submit it to the scheduler. The first step is to switch your Unix primary group to the workgroup this job belongs to (you must already be a member of the group). For example, if you are part of mesa, and working on a project for mesa, you would switch to that primary group.
newgrp mesa
Now submit the job to the scheduler:
qsub test.sh
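If the submission is accepted, qsub prints the assigned JOBID, along the lines of the following (the job number shown is illustrative):
Your job 12345 ("test.sh") has been submitted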
Your job will be executed as soon as there is available cluster capacity. For most jobs, this is likely to be immediate. However, if you submit several jobs within a short period of time, your total resource demand may exceed your queue limit, in which case some jobs will be delayed while they wait for resources.
Interactive Jobs
Just like with a batch compute job, you'll first need to switch your Unix primary group to the group your job is for. For example, if you are working for mesa, you would choose the mesa group.
newgrp mesa
Requesting an interactive session on a compute node:
qlogin -q YOURQUEUE.q -pe PE SLOTS
Once your request is submitted, it should be processed in seconds. If there is insufficient available capacity, the resource scheduler will deny your request.
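For example, to request a 4-slot single-node interactive session (again assuming, for illustration, a queue named mesa.q):
qlogin -q mesa.q -pe smp 4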
Running an interactive job via X2Go is almost identical to running one via SSH. The difference is where you run the qlogin command. In an X2Go session, you'll launch xterm (or some other terminal emulator), and then run the qlogin command. At that point, the terminal will contain a session on one of the compute nodes. If you open any other terminal windows, those terminals will not be connected to the compute node.
Managing Jobs
List jobs you have running:
qstat
Terminating a running job:
qdel JOBID
Get detailed information on a job, such as why it may have failed:
qstat -j JOBID