Rework the user interface.
Use subcommands instead of various scripts, and allow lobster to be
installed locally or system-wide.  Also try to make the language in the
help consistent.
matz-e committed Apr 26, 2014
1 parent 4a5e232 commit 831fb6c
Showing 10 changed files with 293 additions and 184 deletions.
101 changes: 79 additions & 22 deletions README.md
@@ -1,30 +1,38 @@
# Dependencies
# Installation

[PyYaml](http://pyyaml.org/wiki/PyYAML) is a pre-requisite. Install it
locally after executing `cmsenv` in a release which is supposed to be used
with lobster:
## Dependencies

    cd /tmp
    wget -O - http://pyyaml.org/download/pyyaml/PyYAML-3.10.tar.gz|tar xzf -
    cd PyYAML-3.10/
    python setup.py install --user
    cd ..
    rm -rf PyYAML-3.10/
### CClab tools

See [instructions on github](https://github.com/cooperative-computing-lab/cctools)
of the Notre Dame Cooperative Computing Lab to obtain versions of
`parrot` and `work_queue`.

### Setuptools

Install the python `setuptools`, if not already present, with

    wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python - --user

*At ND*

    wget --no-check-certificate https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python - --user

Now `lobster` can be installed, and any further python dependencies will be
installed into your `~/.local` directory.

## Setup

Install the `argparse` module, if not available for your python (normally
2.7 and up):
Install lobster itself with

    cd /tmp
    wget -O - http://argparse.googlecode.com/files/argparse-1.2.1.tar.gz|tar xzf -
    cd argparse-1.2.1/
    git clone git@github.com:matz-e/lobster.git
    cd lobster
    python setup.py install --user
    cd ..
    rm -rf argparse-1.2.1/

# Setting up your environment
and `lobster` will be installed as `~/.local/bin/lobster`. Add it to your
path with

    export PYTHONPATH=$PYTHONPATH:/afs/nd.edu/user37/ccl/software/cctools-lobster/lib/python2.7/site-packages/
    export PATH=/afs/nd.edu/user37/ccl/software/cctools-lobster/bin:$PATH
    export PATH=$PATH:$HOME/.local/bin

# Running lobster

@@ -43,10 +51,18 @@ CMS):

4. Running lobster

./lobster.py test/beanprod.yaml
lobster process test/beanprod.yaml

5. Starting workers --- see below.

6. Creating summary plots

lobster plot --outdir <your output directory> <your config/working directory>

7. Publishing

lobster publish <labels> <your config/working directory>

# Submitting workers

To start 10 workers, 4 cores each, connecting to a lobster instance with id
@@ -60,10 +76,51 @@ If the workers get evicted by condor, the memory and disk settings might need
adjustment. Check in `lobster.py` for minimum settings (currently 1100 MB for
memory, 4 GB for disk).

# Monitoring at ND
# Running at ND

## Setting up your environment

Use `work_queue` etc from the CC lab:

    export PYTHONPATH=$PYTHONPATH:/afs/nd.edu/user37/ccl/software/cctools-lobster/lib/python2.7/site-packages/
    export PATH=/afs/nd.edu/user37/ccl/software/cctools-lobster/bin:$PATH

or, for `tcsh` users,

    setenv PYTHONPATH ${PYTHONPATH}:/afs/nd.edu/user37/ccl/software/cctools-lobster/lib/python2.7/site-packages/
    setenv PATH /afs/nd.edu/user37/ccl/software/cctools-lobster/bin:${PATH}

## Running opportunistically

The CRC login nodes `opteron`, `newcell`, and `crcfe01` are connected to
the ND opportunistic computing pool. On these, multicore jobs are
preferred and can be run with

    cores=4
    condor_submit_workers -N lobster_<your_id> --cores $cores \
        --memory $(($cores * 1100)) --disk $(($cores * 4500)) 10

or, for `tcsh` users,

    set cores=4
    condor_submit_workers -N lobster_<your_id> --cores $cores \
        --memory `dc -e "$cores 1100 *p"` --disk `dc -e "$cores 4500 *p"` 10
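
The shell arithmetic above scales memory and disk linearly with the number of cores. As a rough illustration only, the same command line could be assembled in Python as follows. The helper name `worker_command` and the two constants are hypothetical and not part of lobster; the per-core values (1100 MB memory, 4500 MB disk) are simply copied from the commands above.

```python
import shlex

# Per-core requirements copied from the shell examples above (in MB).
MEMORY_PER_CORE_MB = 1100
DISK_PER_CORE_MB = 4500


def worker_command(queue_id, cores, workers):
    """Build the condor_submit_workers argument list for multicore workers.

    Hypothetical helper for illustration; it only assembles the arguments
    and does not submit anything.
    """
    return [
        "condor_submit_workers",
        "-N", "lobster_{0}".format(queue_id),
        "--cores", str(cores),
        "--memory", str(cores * MEMORY_PER_CORE_MB),
        "--disk", str(cores * DISK_PER_CORE_MB),
        str(workers),
    ]


if __name__ == "__main__":
    # Print a shell-safe version of the command for 10 workers with 4 cores each.
    print(" ".join(shlex.quote(arg) for arg in worker_command("myid", 4, 10)))
```

The resulting list could be handed to `subprocess.call` on a node where `condor_submit_workers` is available.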

## Running locally

To submit 10 workers (= 10 cores) to the T3 at ND, run

    condor_submit_workers -N lobster_<your_id> --cores 1 \
        --memory 1000 --disk 4500 10

on `earth`.

## Monitoring

* [CMS dashboard](http://dashb-cms-job.cern.ch/dashboard/templates/web-job2/)
* [CMS squid statistics](http://wlcg-squid-monitor.cern.ch/snmpstats/indexcms.html)
* [Condor usage](http://condor.cse.nd.edu/condor_matrix.cgi)
* [NDCMS trends](http://mon.crc.nd.edu/xymon-cgi/svcstatus.sh?HOST=ndcms.crc.nd.edu&SERVICE=trends&backdays=0&backhours=6&backmins=0&backsecs=0&Go=Update&FROMTIME=&TOTIME=)
to monitor squid bandwidth
* [External bandwidth](http://prtg1.nm.nd.edu/sensor.htm?listid=491&timeout=60&id=505&position=0)
* `work_queue_status` on the command line
114 changes: 2 additions & 112 deletions lobster.py
@@ -1,115 +1,5 @@
#!/usr/bin/env python
import os
import shutil
import time
import yaml
from lobster import cmssw
from lobster import job
from argparse import ArgumentParser

import work_queue as wq
from lobster.ui import boil

parser = ArgumentParser(description='A job submission tool for CMS')
parser.add_argument('config_file_name', nargs='?', default='test/lobster.yaml', help='Configuration file to process.')
parser.add_argument('--bijective', '-i', action='store_true', default=False, help='Use a 1-1 mapping for input and output files (process one input file per output file).')
args = parser.parse_args()

with open(args.config_file_name) as config_file:
    config = yaml.load(config_file)

config['filepath'] = args.config_file_name
if not 'cmssw' in repr(config):
    job_src = job.SimpleJobProvider(config)
else:
    job_src = cmssw.JobProvider(config)
    from ProdCommon.Credential.CredentialAPI import CredentialAPI
    cred = CredentialAPI({'credential': 'Proxy'})
    if not cred.checkCredential(Time=60):
        cred.ManualRenewCredential()

wq.cctools_debug_flags_set("all")
wq.cctools_debug_config_file(os.path.join(config["workdir"], "debug.log"))
wq.cctools_debug_config_file_size(1 << 29)

queue = wq.WorkQueue(-1)
queue.specify_log(os.path.join(config["workdir"], "work_queue.log"))
queue.specify_name("lobster_" + config["id"])
queue.specify_keepalive_timeout(300)
# queue.tune("short-timeout", 600)

print "Starting queue as", queue.name
print "Submit workers with: condor_submit_workers -N", queue.name, "<num>"

payload = 400

while not job_src.done():
    stats = queue.stats

    print "Status: Slaves {0}/{1} - Jobs {3}/{4}/{5} - Work {2} [{6}]".format(
            stats.workers_busy,
            stats.workers_busy + stats.workers_ready,
            job_src.work_left(),
            stats.tasks_waiting,
            stats.tasks_running,
            stats.tasks_complete,
            time.strftime("%d %b %Y %H:%M:%S", time.localtime()))

    hunger = max(payload - stats.tasks_waiting, 0)

    while hunger > 0:
        t = time.time()
        jobs = job_src.obtain(50, bijective=args.bijective)
        with open(os.path.join(config["workdir"], 'debug_lobster_times'), 'a') as f:
            delta = time.time() - t
            if jobs == None:
                size = 0
            else:
                size = len(jobs)
            ratio = delta / float(size) if size != 0 else 0
            f.write("CREA {0} {1} {2}\n".format(size, delta, ratio))

        if jobs == None or len(jobs) == 0:
            break

        hunger -= len(jobs)

        for id, cmd, inputs, outputs in jobs:
            task = wq.Task(cmd)
            task.specify_tag(id)
            task.specify_cores(1)
            # temporary work-around?
            task.specify_memory(1100)
            task.specify_disk(4000)

            for (local, remote) in inputs:
                if os.path.isfile(local):
                    task.specify_input_file(str(local), str(remote), wq.WORK_QUEUE_CACHE)
                elif os.path.isdir(local):
                    task.specify_directory(local, remote, wq.WORK_QUEUE_INPUT,
                            wq.WORK_QUEUE_CACHE, recursive=True)
                else:
                    raise NotImplementedError

            for (local, remote) in outputs:
                task.specify_output_file(str(local), str(remote))

            queue.submit(task)

    print "Waiting for jobs to return..."
    task = queue.wait(3)
    tasks = []
    while task:
        tasks.append(task)
        if queue.stats.tasks_complete > 0:
            task = queue.wait(1)
        else:
            task = None
    print "Done waiting..."
    if len(tasks) > 0:
        t = time.time()
        job_src.release(tasks)
        with open(os.path.join(config["workdir"], 'debug_lobster_times'), 'a') as f:
            delta = time.time() - t
            size = len(tasks)
            ratio = delta / float(size) if size != 0 else 0
            f.write("RECV {0} {1} {2}\n".format(size, delta, ratio))
boil()
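
The body of `lobster.ui.boil` is not part of the diff shown here. As a rough sketch of how a single entry point can dispatch the `process`, `plot`, and `publish` subcommands named in the README, `argparse` subparsers could be used as below. The handler bodies are placeholders, and the option spellings `--clean` and `--block-size` are assumptions inferred from the attribute names `args.clean` and `args.block_size` in `publish.py`; the actual implementation may differ.

```python
from argparse import ArgumentParser


def process(args):
    # Placeholder: the real handler would run the work queue main loop.
    return "process {0}".format(args.configfile)


def plot(args):
    # Placeholder: the real handler would read the logs and write plots.
    return "plot {0}".format(args.configfile)


def publish(args):
    # Placeholder: the real handler would publish the selected labels.
    return "publish {0}".format(args.configfile)


def build_parser():
    parser = ArgumentParser(prog="lobster", description="A job submission tool for CMS")
    subparsers = parser.add_subparsers(dest="command", title="subcommands")
    subparsers.required = True  # exit with an error if no subcommand is given

    p = subparsers.add_parser("process", help="process a configuration")
    p.add_argument("--bijective", "-i", action="store_true",
                   help="use a 1-1 mapping for input and output files")
    p.add_argument("configfile", help="configuration file or working directory")
    p.set_defaults(func=process)

    p = subparsers.add_parser("plot", help="create summary plots")
    p.add_argument("--outdir", help="directory to write plots to")
    p.add_argument("configfile", help="configuration file or working directory")
    p.set_defaults(func=plot)

    p = subparsers.add_parser("publish", help="publish results")
    p.add_argument("--clean", action="store_true")  # assumed flag name
    p.add_argument("--block-size", dest="block_size", type=int)  # assumed flag name
    p.add_argument("labels", nargs="*", help="labels to publish (default: all)")
    p.add_argument("configfile", help="configuration file or working directory")
    p.set_defaults(func=publish)

    return parser


def boil(argv=None):
    """Parse the command line and call the handler of the chosen subcommand."""
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Attaching each handler with `set_defaults(func=...)` keeps the dispatch in `boil` to a single call, so adding a subcommand only requires a new parser block.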
2 changes: 1 addition & 1 deletion lobster/cmssw/data/wrapper.sh
@@ -47,7 +47,7 @@ else
sed -i -e "s@//pscratch/osg/app/cmssoft/cms/@/cvmfs/cms.cern.ch/@" $sname
echo "$sconf$sname $sname" > mtab
echo ">>> starting parrot to access CMSSW..."
exec $PARROT_EXEC $PARROT_DEBUG_FLAGS -m mtab -t "$MYCACHE/ex_parrot_$(whoami)" $0 "$*"
exec $PARROT_EXEC $PARROT_DEBUG_FLAGS -m mtab -t "$MYCACHE/ex_parrot_$(whoami)" bash $0 "$*"
fi

source $VO_CMS_SW_DIR/cmsset_default.sh
2 changes: 1 addition & 1 deletion lobster/cmssw/job.py
@@ -164,7 +164,7 @@ def obtain(self, num=1, bijective=False):
outputs = [(os.path.join(sdir, f.replace('.root', '_%s.root' % id)), f) for f in self.__outputs[label]]
outputs.extend([(os.path.join(jdir, f), f) for f in ['report.xml.gz', 'cmssw.log.gz', 'report.pkl']])

cmd = './wrapper.sh python job.py {0} parameters.pkl'.format(config)
cmd = 'sh wrapper.sh python job.py {0} parameters.pkl'.format(config)

tasks.append((id, cmd, inputs, outputs))

26 changes: 11 additions & 15 deletions scripts/plot_stats.py → lobster/cmssw/plotting.py
100755 → 100644
@@ -1,7 +1,5 @@
#!/usr/bin/env python
# vim: fileencoding=utf-8

from argparse import ArgumentParser
from os.path import expanduser
from collections import defaultdict
from datetime import datetime
@@ -11,6 +9,7 @@
import pytz
import sqlite3
import gzip
import yaml

import matplotlib
import matplotlib.pyplot as plt
@@ -255,19 +254,16 @@ def save_and_close(dir, name):

return html_tag("img", src='{0}.png'.format(name))

if __name__ == '__main__':
parser = ArgumentParser(description='make histos')
parser.add_argument('directory', help="Specify input directory")
parser.add_argument('outdir', nargs='?', help="Specify output directory")
parser.add_argument("--xmin", type=int, help="Specify custom x-axis minimum", default=0, metavar="MIN")
parser.add_argument("--xmax", type=int, help="Specify custom x-axis maximum", default=None, metavar="MAX")
parser.add_argument('--samplelogs', action='store_true', help='Make a table with links to sample error logs', default=False)
args = parser.parse_args()
def plot(args):
with open(args.configfile) as configfile:
config = yaml.load(configfile)
workdir = os.path.expandvars(os.path.expanduser(config["workdir"]))

if args.outdir:
top_dir = args.outdir
else:
top_dir = os.path.join(os.environ['HOME'], 'www', os.path.basename(os.path.normpath(args.directory)))
top_dir = config.get("plotdir", config['id'])
top_dir = os.path.expandvars(os.path.expanduser(top_dir))

print 'Saving plots to: ' + top_dir
if not os.path.isdir(top_dir):
@@ -278,9 +274,9 @@ def save_and_close(dir, name):
wtags = SmartList()

print "Reading WQ log"
with open(os.path.join(args.directory, 'work_queue.log')) as f:
with open(os.path.join(workdir, 'work_queue.log')) as f:
headers = dict(map(lambda (a, b): (b, a), enumerate(f.readline()[1:].split())))
wq_stats_raw_all = np.loadtxt(os.path.join(args.directory, 'work_queue.log'))
wq_stats_raw_all = np.loadtxt(os.path.join(workdir, 'work_queue.log'))
start_time = wq_stats_raw_all[0,0] / 1e6
end_time = wq_stats_raw_all[-1,0]

@@ -343,7 +339,7 @@ def save_and_close(dir, name):
weights=[delta_sendtime / delta, delta_receivetime / delta],
label=['send time', 'receive time'])

db = sqlite3.connect(os.path.join(args.directory, 'lobster.db'))
db = sqlite3.connect(os.path.join(workdir, 'lobster.db'))
stats = {}

failed_jobs = np.array(db.execute("""select
@@ -619,7 +615,7 @@ def save_and_close(dir, name):
rows[row].append('')
else:
id, ds, e = j
from_path = os.path.join(args.directory, id2label[ds], 'failed', str(id))
from_path = os.path.join(workdir, id2label[ds], 'failed', str(id))
to_path = os.path.join(os.path.join(top_dir, 'errors'), str(id))
if os.path.exists(to_path):
shutil.rmtree(to_path)
17 changes: 17 additions & 0 deletions lobster/cmssw/publish.py
@@ -338,3 +338,20 @@ def __getitem__(self, item):

    def __setitem__(self, key, value):
        self.data[key] = value

def publish(args):
    with open(args.configfile) as f:
        config = yaml.load(f)

    dir = config['workdir']

    if len(args.labels) == 0:
        args.labels = [task['label'] for task in config.get('tasks', [])]

    for label in args.labels:
        publisher = Publisher(config, dir, label)

        if args.clean:
            publisher.clean()
        else:
            publisher.publish(args.block_size)
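
When no labels are given on the command line, `publish` falls back to every task label found in the configuration. A minimal sketch of that selection step in isolation follows. The helper name `select_labels` and the sample labels are hypothetical, and the configuration is assumed to be the dictionary loaded from the YAML file.

```python
def select_labels(config, requested):
    """Return the requested labels, or all task labels if none were requested.

    `config` is assumed to be the dictionary loaded from the YAML
    configuration, with an optional list of tasks under the 'tasks' key.
    """
    if requested:
        return list(requested)
    return [task["label"] for task in config.get("tasks", [])]


if __name__ == "__main__":
    # Sample configuration with made-up labels, for illustration only.
    config = {"workdir": "work", "tasks": [{"label": "ttbar"}, {"label": "wjets"}]}
    print(select_labels(config, []))         # no labels requested: use all tasks
    print(select_labels(config, ["ttbar"]))  # explicit labels are kept as given
```

Returning a new list, rather than overwriting `args.labels` in place, keeps the parsed arguments unchanged for any later use.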