Merge pull request #26 from cms-svj/Keane_temp
Made code work for Run2V17
Keane-Tan authored Feb 28, 2022
2 parents 6346752 + 5c7c8f8 commit aebd7eb
Showing 44 changed files with 3,357 additions and 2,950 deletions.
29 changes: 26 additions & 3 deletions README.md
@@ -86,11 +86,27 @@ It's a good idea to get/renew a voms ticket if you're going to be working with X
voms-proxy-init -voms cms --valid 192:00
```
### Running the Analysis
Currently the default setup does not work for the neural network, so instead of running `source init.sh` from `t-channel_Analysis`,
follow the steps in https://github.com/cms-svj/t-channel_Analysis/issues/22 to set up the environment.
You will also need to add `python -m pip install --ignore-installed magiconfig` to those setup steps.
After that, create soft links to the neural network file and to the npz file that contains the normalization information.
For now you can do the following:
```
ln -s /uscms/home/keanet/nobackup/SVJ/Tagger/SVJTaggerNN/logs/test_tch_normMeanStd/net.pth net.pth
ln -s /uscms/home/keanet/nobackup/SVJ/Tagger/SVJTaggerNN/logs/test_tch_normMeanStd/normMeanStd.npz normMeanStd.npz
```
To make histograms locally using the signal and background ntuples, make sure you are in `t-channel_Analysis` and run
```bash
python analyze.py -d <sample label> -N <number of files>
```
This is usually done for debugging and testing purposes. The list of sample labels can be found in `input/sampleLabels.txt`.
To make neural network training files locally using the signal and background ntuples, make sure you are in `t-channel_Analysis` and run
```bash
python analyze_root_varModule.py -d <sample label> -N <number of files>
```
In both cases, the output files are called `test.root`.
`<sample label>` can be anything in `input/sampleLabels.txt`, but be careful with the t-channel signals: the JSON files contain both the pair-production and full t-channel files, so a label such as `2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1` alone will probably make the code run over the pair-production sample. We need to update `utils/samples.py` to make it smarter, but for now use more specific labels. For example, `2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1_13TeV-madgraphMLM` will grab the full t-channel samples, while `2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1_13TeV-pythia8` will grab the pair-production samples.

Running things locally is usually done for debugging and testing purposes; an example command is shown after the option list below.
* `-d`: sample label for the list of input files to run over; the labels can be found in `input/sampleLabels.txt`.
* `-N`: number of files from the sample to run over. Default is -1.
* `-M`: index of the first file to run over.
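For example, to run locally over the first five files of the full t-channel signal sample mentioned above (the file count and start index are only illustrative):
```bash
python analyze.py -d 2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1_13TeV-madgraphMLM -N 5 -M 0
```
The output will be written to `test.root`, as noted above.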
@@ -106,10 +122,17 @@ Dask can be used to run `analyze.py` in parallel, either locally or on Condor. T
To view the status dashboard, specify `--port 8NNN` (using the forwarded port from the earlier ssh command)
and navigate to `localhost:8NNN` in a web browser.
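As a rough sketch of a local Dask run with the dashboard enabled: only `--port` is documented above, so treat the `--dask` spelling below as an assumption based on the `options.dask` switch in `analyze.py`, and adjust it to match your checkout.
```bash
# hypothetical local Dask run; the --dask flag name is assumed, --port uses the forwarded port
python analyze.py -d 2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1_13TeV-madgraphMLM -N 20 --dask --port 8787
```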

To make histograms on condor, cd into the `condor` directory and run
To make histograms on condor using the singularity container, cd into the `condor` directory (make sure you are in the default `coffeaenv` environment) and run
```bash
python singularitySubmit.py -d 2018_QCD,2018_TTJets,2018_WJets,2018_ZJets,2018_mMed -n 10 -w 1 --output [output directory for histogram files]
```
To make neural network training input files on condor, cd into the `condor` directory (make sure you are in the default `coffeaenv` environment) and run
```bash
python condorSubmit.py -d 2018_QCD,2018_TTJets,2018_WJets,2018_ZJets,2018_mMed -n 10 -w 1 --output testDir
python condorSubmit.py -d 2018_QCD,2018_mMed,2018_TTJets,2018_WJets,2018_ZJets -n 5 -w 1 --output [output directory] -p --pout [eos output directory for storing the training files]
```
* Note that the argument after `--output` does not do anything in this case.
* Be careful when using `2018_mMed` with the `-d` flag: the JSON files contain both the pair-production and full t-channel signals, whose file names are very similar, so `utils/samples.py` may not grab the desired inputs. `samples.py` still needs to be made smarter.

This will run over all the backgrounds (QCD, TTJets, WJets, ZJets) and the t-channel signals (the s-channel signals are labeled as 2016_mZprime and 2017_mZprime). `-n 10` means each job will use 10 root files as input, while `-w 1` means we are using 1 CPU per job. A higher number of CPUs will use too much memory, causing the job to be held, while a higher number of input files can make the job run longer and may also cause memory issues. After the jobs have finished running, the output histogram root files should be in `condor/testDir` (set by the `--output` flag).
* `-d`: sample labels for list of input files to run over. Can use the labels found in `input/sampleLabels.txt` or more general labels such as 2018_QCD.
* `-n`: number of files from the sample to run over. Default is -1.
21 changes: 20 additions & 1 deletion analyze.py
@@ -8,6 +8,10 @@
import time
from optparse import OptionParser
from glob import glob
import numpy as np
from magiconfig import MagiConfig
from utils.models import DNN, DNN_GRF
import torch

def use_dask(condor,njobs,port):
from dask.distributed import Client
@@ -93,11 +97,26 @@ def main():

    sf = s.sfGetter(sample)
    print("scaleFactor = {}".format(sf))
    # open saved neural network
    device = torch.device('cpu')
    modelLocation = "."
    varSet = ['njets', 'njetsAK8', 'nb', 'dPhij1rdPhij2AK8', 'dPhiMinjMETAK8', 'dEtaj12AK8', 'dRJ12AK8', 'jGirthAK8', 'jTau1AK8', 'jTau2AK8', 'jTau3AK8', 'jTau21AK8', 'jTau32AK8', 'jSoftDropMassAK8', 'jAxisminorAK8', 'jAxismajorAK8', 'jPtDAK8', 'jecfN2b1AK8', 'jecfN3b1AK8', 'jEleEFractAK8', 'jMuEFractAK8', 'jNeuHadEFractAK8', 'jPhoEFractAK8', 'jPhoMultAK8', 'jNeuMultAK8', 'jNeuHadMultAK8', 'jMuMultAK8', 'jEleMultAK8', 'jChHadMultAK8', 'jChMultAK8', 'jNeuEmEFractAK8', 'jHfHadEFractAK8', 'jHfEMEFractAK8', 'jChEMEFractAK8', 'jMultAK8', 'jecfN3b2AK8', 'jecfN2b2AK8', 'jPhiAK8', 'jEtaAK8']
    hyper = MagiConfig(batchSize=2000, dropout=0.3, epochs=10, lambdaDC=0.0, lambdaGR=1.0, lambdaReg=0.0, lambdaTag=1.0, learning_rate=0.001, n_pTBins=35, num_of_layers_features=2, num_of_layers_pT=5, num_of_layers_tag=2, num_of_nodes=40, pTBins=[50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2500, 3000, 3500, 4000, 4500], rseed=30)
    model = DNN_GRF(n_var=len(varSet), n_layers_features=hyper.num_of_layers_features, n_layers_tag=hyper.num_of_layers_tag, n_layers_pT=hyper.num_of_layers_pT, n_nodes=hyper.num_of_nodes, n_outputs=2, n_pTBins=hyper.n_pTBins, drop_out_p=hyper.dropout).to(device=device)
    print("Loading model from file {}/net.pth".format(modelLocation))
    model.load_state_dict(torch.load("{}/net.pth".format(modelLocation),map_location=device))
    model.eval()
    model.to('cpu')
    normMeanStd = np.load("{}/normMeanStd.npz".format(modelLocation))
    normMean = normMeanStd["normMean"]
    normStd = normMeanStd["normStd"]

    # run processor
    output = processor.run_uproot_job(
        fileset,
        treename='TreeMaker2/PreSelection',
        processor_instance=MainProcessor(sf),
        # processor_instance=MainProcessor(sf,model,varSet,normMean,normStd),
        processor_instance=MainProcessor(sample,sf,model,varSet,normMean,normStd),
        executor=processor.dask_executor if options.dask else processor.futures_executor,
        executor_args=exe_args,
        chunksize=options.chunksize,
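For readers orienting themselves in the change above, here is a minimal, hypothetical sketch (not part of this commit) of how the loaded `DNN_GRF` model and the `normMean`/`normStd` arrays are typically applied to jet features inside the processor; the feature array layout and the exact return value of the model call are assumptions.
```python
import numpy as np
import torch

# Hypothetical illustration (not from this commit): score a batch of jets with the
# network and normalization constants loaded in analyze.py above.
def score_jets(model, jet_features, normMean, normStd):
    # jet_features: (n_jets, n_vars) array whose columns follow the order of varSet
    normed = (jet_features - normMean) / normStd            # standardize with training-time stats
    inputs = torch.from_numpy(normed.astype(np.float32))
    with torch.no_grad():
        outputs = model(inputs)                             # DNN_GRF forward pass; exact outputs depend on the model definition
    # assuming the first output holds the two tagger logits, convert to an SVJ-like probability
    tag_logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs
    return torch.softmax(tag_logits, dim=1)[:, 1].numpy()
```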
8 changes: 5 additions & 3 deletions analyze_root_varModule.py
@@ -93,11 +93,12 @@ def main():

    sf = s.sfGetter(sample)
    print("scaleFactor = {}".format(sf))

    # run processor
    output = processor.run_uproot_job(
        fileset,
        treename='TreeMaker2/PreSelection',
        processor_instance=MainProcessor(sf),
        processor_instance=MainProcessor(sample,sf),
        executor=processor.dask_executor if options.dask else processor.futures_executor,
        executor_args=exe_args,
        chunksize=options.chunksize,
@@ -114,9 +115,10 @@
            values_dict[v] = output[v].value
    tree = uproot.newtree(branchdict)
    if values_dict != {}:
        print("saving root files...")
        with uproot.recreate("{}.root".format(outfile)) as f:
            f["t"] = tree
            f["t"].extend(values_dict)
            f["tree"] = tree
            f["tree"].extend(values_dict)
    # print run time in seconds
    dt = time.time() - tstart
    print("run time: %.2f [sec]" % (dt))
2 changes: 1 addition & 1 deletion condor/run_Analyzer_condor.sh
@@ -44,7 +44,7 @@ if [[ ${analyzeFile} == analyze.py ]]
then
mv MyAnalysis*.root ${base_dir}
else
xrdcp -f MyAnalysis*.root ${NNTrainingOut}.
xrdcp -f MyAnalysis*.root ${NNTrainingOut}/.
rm MyAnalysis*.root
fi

66 changes: 66 additions & 0 deletions condor/run_Singularity_condor.sh
@@ -0,0 +1,66 @@
#!/usr/bin/env bash

dataset_longname=$1
nfiles=$2
startfile=$3
workers=$4
chunksize=$5
analyzeFile=$6
NNTrainingOut=$7
base_dir=`pwd`

echo "ls output"
ls -l

echo "unpacking tar file"
mkdir tchannel
mv tchannel.tar.gz tchannel/.
cd tchannel
tar -xzf tchannel.tar.gz
ls -l

# Setup the activation script for the virtual environment
$ECHO "\nSetting up the activation script for the virtual environment ... "
sed -i '40s/.*/VIRTUAL_ENV="$(cd "$(dirname "$(dirname "${BASH_SOURCE[0]}" )")" \&\& pwd)"/' myenv/bin/activate
find myenv/bin/ -type f -print0 | xargs -0 -P 4 sed -i '1s/#!.*python$/#!\/usr\/bin\/env python/'
echo "Activating our virtual environment"
source myenv/bin/activate
storage_dir=$(readlink -f $PWD)
export TCHANNEL_BASE=${storage_dir}

echo "ls output"
ls -l

echo "output of uname -s : "
uname -s

echo "unpacking exestuff"
cp ${base_dir}/exestuff.tar.gz .
tar xzf exestuff.tar.gz
mv exestuff/* .
ls -l

echo "\n\n Attempting to run MyAnalysis executable.\n\n"
echo ${dataset_longname}
echo ${nfiles}
echo ${startfile}
echo ${workers}
# python analyze.py --condor -d ${dataset_longname} -N ${nfiles} -M ${startfile} -w ${workers} -s ${chunksize}
python ${analyzeFile} --condor -d ${dataset_longname} -N ${nfiles} -M ${startfile} -w ${workers} -s ${chunksize}

echo "\n\n ls output\n"
ls -l

if [[ ${analyzeFile} == analyze.py ]]
then
mv MyAnalysis*.root ${base_dir}
else
xrdcp -f MyAnalysis*.root ${NNTrainingOut}.
rm MyAnalysis*.root
fi

cd ${base_dir}
rm docker_stderror

echo "\n\n ls output\n"
ls -l
129 changes: 129 additions & 0 deletions condor/singularitySubmit.py
@@ -0,0 +1,129 @@
import sys, os
from os import system, environ
sys.path = [environ["TCHANNEL_BASE"],] + sys.path
from utils import samples as s
import optparse

def removeCopies(x):
    return list(dict.fromkeys(x))

def makeExeAndFriendsTarball(filestoTransfer, fname, path):
    system("mkdir -p %s" % fname)
    for fn in removeCopies(filestoTransfer):
        print(fn)
        system("cd %s; ln -s %s" % (fname, fn))

    tarallinputs = "tar czf %s/%s.tar.gz %s --dereference"% (path, fname, fname)
    system(tarallinputs)
    system("rm -r %s" % fname)

def getDatasets(datasets):
    if datasets:
        return datasets.split(',')
    else:
        print("No dataset specified")
        exit(0)

def main():
    # parse command line arguments
    parser = optparse.OptionParser("usage: %prog [options]\n")
    parser.add_option ('-n', dest='numfile', type='int', default = 10, help="number of files per job")
    parser.add_option ('-d', dest='datasets', type='string', default = '', help="List of datasets, comma separated")
    parser.add_option ('-c', dest='noSubmit', action='store_true', default = False, help="Do not submit jobs. Only create condor_submit.txt.")
    parser.add_option ('-p', dest='makeROOT', action='store_true', default = False, help="Make root tree instead of histograms.")
    parser.add_option ('--pout', dest='NNTrainOut', default = "", help="Directory to store the NN training root files.")
    parser.add_option ('-w','--workers', dest='workers', type='int', default = 2, help='Number of workers to use for multi-worker executors (e.g. futures or condor)')
    parser.add_option ('--output', dest='outPath', type='string', default = '.', help="Name of directory where output of each condor job goes")
    parser.add_option('-s', '--chunksize',dest='chunksize',type='int', default=10000, help='Chunk size',)
    options, args = parser.parse_args()

    analyzeFile = "analyze.py"
    if options.makeROOT:
        if options.NNTrainOut == "":
            raise Exception("Please specify the output directory for the NN training files using --pout.")
        analyzeFile = "analyze_root_varModule.py"
    # prepare the list of hardcoded files to transfer
    filestoTransfer = [environ["TCHANNEL_BASE"] + "/net.pth",
                       environ["TCHANNEL_BASE"] + "/normMeanStd.npz"]

    # add top of jdl file
    fileParts = []
    fileParts.append("Universe = vanilla\n")
    fileParts.append("Executable = run_Singularity_condor.sh\n")
    fileParts.append("+SingularityImage = \"/cvmfs/unpacked.cern.ch/registry.hub.docker.com/fnallpc/fnallpc-docker:pytorch-1.9.0-cuda11.1-cudnn8-runtime-singularity\"\n")
    fileParts.append("Transfer_Input_Files = %s/%s.tar.gz, %s/exestuff.tar.gz\n" % (options.outPath,"tchannel",options.outPath))
    fileParts.append("Should_Transfer_Files = YES\n")
    fileParts.append("WhenToTransferOutput = ON_EXIT\n")
    fileParts.append("request_disk = 1000000\n")
    fileParts.append("request_memory = 2000\n")
    fileParts.append("request_cpus = 4\n")
    fileParts.append("x509userproxy = $ENV(X509_USER_PROXY)\n\n")

    # loop over all sample collections in the dataset
    datasets = getDatasets(options.datasets)
    nFilesPerJob = options.numfile
    numberOfJobs = 0
    print("-"*50)
    for sc in datasets:
        print(sc)

        # create the directory
        if not os.path.isdir("%s/output-files/%s" % (options.outPath, sc)):
            os.makedirs("%s/output-files/%s" % (options.outPath, sc))

        # loop over all samples in the sample collection
        samples = s.getFileset(sc, False)
        for n, rFiles in samples.items():
            count = len(rFiles)
            print(" %-40s %d" % (n, count))

            # loop over the root files that will be in each job
            for startFileNum in range(0, count, nFilesPerJob):
                numberOfJobs+=1
                outputDir = "%s/output-files/%s" % (options.outPath, sc)

                # list the output files that will be transfered to output directory
                outfile = "MyAnalysis_%s_%s.root" % (n, startFileNum)
                outputFiles = [
                    outfile,
                ]
                transfer = "transfer_output_remaps = \""
                for f_ in outputFiles:
                    transfer += "%s = %s/%s" % (f_, outputDir, f_)
                    if f_ != outputFiles[-1]:
                        transfer += "; "
                transfer += "\"\n"

                # add each job to the jdl file
                fileParts.append(transfer)
                fileParts.append("Arguments = %s %i %i %i %i %s %s\n"%(n, nFilesPerJob, startFileNum, options.workers, options.chunksize,analyzeFile,options.NNTrainOut))
                fileParts.append("Output = %s/log-files/MyAnalysis_%s_%i.stdout\n"%(options.outPath, n, startFileNum))
                fileParts.append("Error = %s/log-files/MyAnalysis_%s_%i.stderr\n"%(options.outPath, n, startFileNum))
                fileParts.append("Log = %s/log-files/MyAnalysis_%s_%i.log\n"%(options.outPath, n, startFileNum))
                fileParts.append("Queue\n\n")
    print("-"*50)

    # write out the jdl file
    fout = open("condor_submit.jdl", "w")
    fout.write(''.join(fileParts))
    fout.close()

    # print number jobs to run
    print("Number of Jobs:", numberOfJobs)

    # only runs when you submit
    if not options.noSubmit:
        # tar up working area to send with each job
        print("-"*50)
        print("Making the tar ball")
        makeExeAndFriendsTarball(filestoTransfer, "exestuff", options.outPath)
        #system("tar --exclude-caches-all --exclude-vcs -zcf %s/tchannel.tar.gz -C ${TCHANNEL_BASE}/.. tchannel --exclude=src --exclude=tmp" % options.outPath)
        system("tar czf %s/tchannel.tar.gz -C ${TCHANNEL_BASE} . --exclude=coffeaenv --exclude=EventLoopFramework --exclude=test --exclude=output --exclude=condor --exclude=notebooks --exclude=root --exclude=.git --exclude=coffeaenv.tar.gz" % options.outPath)

        # submit the jobs to condor
        system('mkdir -p %s/log-files' % options.outPath)
        system("echo 'condor_submit condor_submit.jdl'")
        system('condor_submit condor_submit.jdl')

if __name__ == "__main__":
    main()
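For orientation, the positional `Arguments` that `singularitySubmit.py` writes for each job map one-to-one onto the variables read at the top of `run_Singularity_condor.sh`:
```
Arguments = <dataset_longname> <nfiles> <startfile> <workers> <chunksize> <analyzeFile> <NNTrainingOut>
```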