Resources

John Yocum edited this page Mar 23, 2020 · 16 revisions

Hostgroups

As the name implies, a hostgroup is a collection of cluster nodes in Grid Engine. Hosts can be grouped by any criterion, such as hardware capabilities or system sponsor. In our case, we use hostgroups to divide systems by their sponsor (owner).

List Hostgroups

qconf -shgrpl

Show Hostgroup

qconf -shgrp @mesa
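
To review every hostgroup in one pass, the list and show commands can be combined. The following is a minimal sketch, not a site-standard tool: the function name show_all_hostgroups is hypothetical, and it assumes qconf -shgrpl prints one hostgroup name per line.

```shell
#!/bin/sh
# Print the configuration of every hostgroup known to Grid Engine.
# Assumption: `qconf -shgrpl` prints one hostgroup name per line.
show_all_hostgroups() {
    qconf -shgrpl | while read -r group; do
        # Redirect stdin so qconf cannot consume the list being read.
        qconf -shgrp "$group" </dev/null
        echo    # blank line between hostgroups
    done
}

# Only run when qconf is actually available on this machine.
if command -v qconf >/dev/null 2>&1; then
    show_all_hostgroups
fi
```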

Create Hostgroup

Use the qconf utility to create a new hostgroup named "@mesa". qconf opens the editor named by the EDITOR environment variable (vi if it is unset) and presents a configuration template to fill in.

qconf -ahgrp @mesa

Example of a hostgroup configuration:

group_name @mesa
hostlist compute-0-0.brain.local compute-0-1.brain.local compute-0-2.brain.local
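
A hostgroup in this format can also be loaded without the interactive editor by writing it to a file and passing that file to qconf -Ahgrp. The sketch below is one possible way to do this; the helper name write_hostgroup and the file name mesa.hgrp are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Write a hostgroup definition in the two-line format shown above.
# Usage: write_hostgroup FILE GROUP HOST...
write_hostgroup() {
    file=$1
    group=$2
    shift 2
    {
        printf 'group_name %s\n' "$group"
        printf 'hostlist %s\n' "$*"    # remaining arguments, space-separated
    } > "$file"
}

# mesa.hgrp is a hypothetical file name.
write_hostgroup mesa.hgrp @mesa \
    compute-0-0.brain.local compute-0-1.brain.local compute-0-2.brain.local

# -Ahgrp adds a hostgroup from a file; skip the call when qconf is absent.
if command -v qconf >/dev/null 2>&1; then
    qconf -Ahgrp mesa.hgrp
fi
```

Keeping the definition in a file makes it easy to review or version before it is loaded.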

Modify Hostgroup

qconf -mhgrp @mesa

Delete Hostgroup

qconf -dhgrp @test

Queues

A queue defines the resource limits that a job may request and the nodes on which the job will run. In our case, we create one queue for each cluster sponsor and associate it with the hostgroup containing that sponsor's nodes.

List Queues

qconf -sql

Show Queue Configuration

qconf -sq all.q

Create Queue

  1. Export the all.q queue, to use as a base for the new queue

     qconf -sq all.q > mesa.q
    
  2. Edit mesa.q using an editor of your choice. In most cases, you'll only need to modify qname, hostlist, user_lists, tmpdir (/scratch) and slots.

  3. Load the new configuration

     qconf -Aq mesa.q
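
The three steps above can be partly scripted. The following is a rough sketch rather than a complete tool: the helper name derive_queue and the slot count 8 are hypothetical, and it only rewrites qname, hostlist, and slots, so the remaining attributes listed in step 2 still need to be edited by hand. It also assumes each rewritten attribute occupies a single line in the exported file.

```shell
#!/bin/sh
# Rewrite qname, hostlist, and slots in a queue configuration read from stdin.
# Usage: derive_queue QNAME HOSTLIST SLOTS < base.conf > new.conf
# Assumption: each of these attributes sits on one line; values continued
# with a trailing backslash are not handled.
derive_queue() {
    awk -v qname="$1" -v hostlist="$2" -v slots="$3" '
        $1 == "qname"    { printf "%-21s %s\n", $1, qname;    next }
        $1 == "hostlist" { printf "%-21s %s\n", $1, hostlist; next }
        $1 == "slots"    { printf "%-21s %s\n", $1, slots;    next }
        { print }    # every other attribute passes through unchanged
    '
}

# Skip the Grid Engine calls when qconf is absent.
if command -v qconf >/dev/null 2>&1; then
    # The slot count 8 is a placeholder, not a site recommendation.
    qconf -sq all.q | derive_queue mesa.q @mesa 8 > mesa.q
    qconf -Aq mesa.q
fi
```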
    

Modify Queue

qconf -mq mesa.q

Disable Queue

qmod -d all.q

Enable Queue

qmod -e all.q
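
When a queue only needs to be disabled for the duration of a task, the two commands can be paired so the queue is always re-enabled afterwards. This is a minimal sketch; the function name with_queue_disabled and the script name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Disable a queue, run a command, then re-enable the queue.
# Usage: with_queue_disabled QUEUE COMMAND [ARG...]
with_queue_disabled() {
    queue=$1
    shift
    qmod -d "$queue" || return 1    # give up if the queue cannot be disabled
    "$@"
    status=$?
    qmod -e "$queue"                # re-enable even if the command failed
    return "$status"                # report the command's exit status
}

# Example (the maintenance script name is hypothetical):
# with_queue_disabled mesa.q ./run_maintenance.sh
```

Returning the wrapped command's exit status lets callers detect a failed task even though the queue was re-enabled.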

Monitoring

Check Usage by Queue

qstat -g c

Check Usage by Queue and Host

qstat -f

Show All Jobs

qstat -u '*'
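
The full job list can be condensed into a per-user count. The sketch below assumes the default qstat layout, with two header lines followed by one job per line and the job owner in the fourth whitespace-separated column; if your installation formats the output differently, adjust the column number. The function name jobs_per_user is hypothetical.

```shell
#!/bin/sh
# Count jobs per user from `qstat -u '*'` style output on stdin.
# Assumption: two header lines, then one job per line with the owner
# in the fourth whitespace-separated column.
jobs_per_user() {
    awk 'NR > 2 && NF >= 4 { count[$4]++ }
         END { for (user in count) print count[user], user }' | sort -rn
}

# Skip the call when qstat is absent.
if command -v qstat >/dev/null 2>&1; then
    qstat -u '*' | jobs_per_user
fi
```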

View Historical CPU, RAM, and Network Usage

The cluster runs a Ganglia instance that tracks historical resource utilization for each node. To view it, open http://127.0.0.1/ganglia/ in a browser running on the frontend node.