Running a Compute Job

John Yocum edited this page Apr 18, 2019 · 23 revisions

Brain supports two types of compute jobs: batch and interactive. Which type to use depends on your needs. For example, if you are performing exploratory data analysis, you will want an interactive job. On the other hand, if you have a long-running task that, once started, runs without user input and eventually outputs a result, you should use a batch job.

Ideally, you should work towards converting as many of your tasks as possible to batch jobs. Batch jobs help maximize cluster resource utilization by allowing the job scheduler to start your task as soon as resources are available. In other words, you don't have to be around to start or stop the job; the scheduler does this automatically, with the goal of getting every job done as soon as it can.

Scratch Space

Each cluster node is configured with approximately 180GB of high-speed scratch space. The scratch space is local to the node itself. For maximum performance, your job should copy any required data to the local scratch disk during startup. Then, as the job runs, it should write its output to that local scratch disk. Once the job completes, it should copy the results from scratch to your home folder. After the copy completes, be sure to clean up (remove) any files you have on scratch.

Scratch is conveniently located at /scratch.
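
The copy-in, process, copy-out, clean-up sequence described above can be sketched as a small shell function. This is only an illustration, not an official Brain job template: the function name stage_and_run, the SCRATCH_ROOT variable, and the convention of passing the scratch input and output directories as the last two arguments are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch of the scratch workflow; all names here are hypothetical.
# SCRATCH_ROOT defaults to /scratch and can be overridden for testing.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

# stage_and_run INPUT_DIR RESULT_DIR COMMAND [ARGS...]
# Copies INPUT_DIR to a private scratch directory, runs COMMAND with the
# scratch input and output directories appended as its last two arguments,
# copies the output to RESULT_DIR, then removes the scratch directory.
stage_and_run() {
    local input_dir="$1" result_dir="$2"
    shift 2
    local work status=0

    # Private working directory on the node-local scratch disk.
    work="$(mktemp -d "$SCRATCH_ROOT/${USER:-job}.XXXXXX")" || return 1
    mkdir -p "$work/input" "$work/output"

    # Stage in: copy the required data to scratch.
    cp -R "$input_dir/." "$work/input/" || status=1

    # Process: the command reads from and writes to scratch only.
    if [ "$status" -eq 0 ]; then
        "$@" "$work/input" "$work/output" || status=$?
    fi

    # Stage out: copy whatever was produced back to permanent storage.
    mkdir -p "$result_dir" && cp -R "$work/output/." "$result_dir/" || status=1

    # Clean up: always remove our files from scratch.
    rm -rf "$work"
    return "$status"
}
```

A call might look like `stage_and_run "$HOME/myjob/input" "$HOME/myjob/results" ./analyze.sh`, where analyze.sh is a placeholder for your own program. Partial output is copied back even if the command fails, so a failed run can still be inspected.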

Scheduling a Job

NOTE: When scheduling a compute job, you must specify the amount of CPU and memory your job will require. If your job exceeds those limits, the scheduler may terminate it without warning.

Batch

NOTE: Long-running (several days or weeks) batch jobs should be designed to save checkpoints that they can resume from. That way, if the job is terminated due to resource usage, a node crash, etc., you don't lose all of your work.
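
One simple way to make a job resumable is to record the last completed step in a file and skip past it on restart. The sketch below assumes the work can be split into numbered steps. The function name run_with_checkpoint and the state-file layout are hypothetical, and real applications often have their own checkpoint mechanisms that should be preferred.

```shell
#!/bin/sh
# Sketch of step-based checkpointing; all names here are hypothetical.

# run_with_checkpoint STATE_FILE TOTAL_STEPS COMMAND [ARGS...]
# Runs COMMAND once per step, with the step number appended as the last
# argument, and records each completed step in STATE_FILE.
run_with_checkpoint() {
    local state_file="$1" total="$2"
    shift 2
    local last=0 step

    # Resume after the last completed step if a checkpoint exists.
    if [ -f "$state_file" ]; then
        last="$(cat "$state_file")"
    fi

    step=$((last + 1))
    while [ "$step" -le "$total" ]; do
        # Stop at the first failing step; the checkpoint still points
        # at the previous, completed step.
        "$@" "$step" || return 1

        # Write to a temporary file and rename it, so an interruption
        # during the write cannot leave a truncated checkpoint behind.
        echo "$step" > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
        step=$((step + 1))
    done
}
```

For example, `run_with_checkpoint "$HOME/myjob/state" 100 ./process_chunk.sh` would call the placeholder script process_chunk.sh with 1 through 100, and a rerun after a crash would continue from the first unfinished chunk. The state file should live in your home folder rather than on scratch, since a restarted job may land on a different node.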

Interactive

qlogin -q YOURQUEUE.q -pe SLOTS