Merge pull request #26 from cms-svj/Keane_temp
Made code work for Run2V17
Keane-Tan authored Feb 28, 2022
2 parents 6346752 + 5c7c8f8 commit aebd7eb
Showing 44 changed files with 3,357 additions and 2,950 deletions.
29 changes: 26 additions & 3 deletions README.md
@@ -86,11 +86,27 @@ It's a good idea to get/renew a voms ticket if you're going to be working with X
voms-proxy-init -voms cms --valid 192:00
```
### Running the Analysis
Currently the default setup does not work for the neural network, so instead of running `source init.sh` from `t-channel_Analysis`,
follow the steps in https://github.com/cms-svj/t-channel_Analysis/issues/22 to set up the environment.
You will also need to add `python -m pip install --ignore-installed magiconfig` to those setup steps.
After that, create soft links to the neural network file and to the npz file that contains the normalization information.
For now you can do the following:
```
ln -s /uscms/home/keanet/nobackup/SVJ/Tagger/SVJTaggerNN/logs/test_tch_normMeanStd/net.pth net.pth
ln -s /uscms/home/keanet/nobackup/SVJ/Tagger/SVJTaggerNN/logs/test_tch_normMeanStd/normMeanStd.npz normMeanStd.npz
```
To make histograms locally using the signal and background ntuples, make sure you are in `t-channel_Analysis` and run
```bash
python analyze.py -d <sample label> -N <number of files>
```
This is usually done for debugging and testing purposes. The list of sample labels can be found in `input/sampleLabels.txt`.
To make neural network training files locally using the signal and background ntuples, make sure you are in `t-channel_Analysis` and run
```bash
python analyze_root_varModule.py -d <sample label> -N <number of files>
```
In both cases, the output files are called `test.root`.
`<sample label>` can be anything in `input/sampleLabels.txt`, but be careful with the t-channel signals: the JSON files contain both the pair-production and full t-channel files, so a label such as `2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1` alone will probably make the code run over the pair-production sample. We need to update `utils/samples.py` to make it smarter, but for now use more specific labels. For example, `2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1_13TeV-madgraphMLM` will grab the full t-channel samples, while `2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1_13TeV-pythia8` will grab the pair-production samples.

Running things locally is usually done for debugging and testing purposes; an example command is shown after the option list below.
* `-d`: sample label for the list of input files to run over; the labels can be found in `input/sampleLabels.txt`.
* `-N`: number of files from the sample to run over. Default is -1.
* `-M`: index of the first file to run over.
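For example, to run locally over the first five files of the full t-channel signal sample mentioned above (the file count and start index are only illustrative):
```bash
python analyze.py -d 2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1_13TeV-madgraphMLM -N 5 -M 0
```
The output will be written to `test.root`, as noted above.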
@@ -106,10 +122,17 @@ Dask can be used to run `analyze.py` in parallel, either locally or on Condor. T
To view the status dashboard, specify `--port 8NNN` (using the forwarded port from the earlier ssh command)
and navigate to `localhost:8NNN` in a web browser.
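As a rough sketch of a local Dask run with the dashboard enabled: only `--port` is documented above, so treat the `--dask` spelling below as an assumption based on the `options.dask` switch in `analyze.py`, and adjust it to match your checkout.
```bash
# hypothetical local Dask run; the --dask flag name is assumed, --port uses the forwarded port
python analyze.py -d 2018_mMed-3000_mDark-20_rinv-0p3_alpha-peak_yukawa-1_13TeV-madgraphMLM -N 20 --dask --port 8787
```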

To make histograms on condor, cd into the `condor` directory and run
To make histograms on condor using the singularity container, cd into the `condor` directory (make sure you are in the default `coffeaenv` environment) and run
```bash
python singularitySubmit.py -d 2018_QCD,2018_TTJets,2018_WJets,2018_ZJets,2018_mMed -n 10 -w 1 --output [output directory for histogram files]
```
To make neural network training input files on condor, cd into the `condor` directory (make sure you are in the default `coffeaenv` environment) and run
```bash
python condorSubmit.py -d 2018_QCD,2018_TTJets,2018_WJets,2018_ZJets,2018_mMed -n 10 -w 1 --output testDir
python condorSubmit.py -d 2018_QCD,2018_mMed,2018_TTJets,2018_WJets,2018_ZJets -n 5 -w 1 --output [output directory] -p --pout [eos output directory for storing the training files]
```
* Note that the argument after `--output` does not do anything in this case.
* Be careful when using `2018_mMed` with the `-d` flag: the JSON files contain both the pair-production and full t-channel signals, whose file names are very similar, so `utils/samples.py` may not grab the desired inputs. `samples.py` still needs to be made smarter.

This will run over all the backgrounds (QCD, TTJets, WJets, ZJets) and the t-channel signals (the s-channel signals are labeled as 2016_mZprime and 2017_mZprime). `-n 10` means each job will use 10 root files as input, while `-w 1` means we are using 1 CPU per job. A higher number of CPUs will use too much memory, causing the job to be held, while a higher number of input files can make the job run longer and may also cause memory issues. After the jobs have finished running, the output histogram root files should be in `condor/testDir` (set by the `--output` flag).
* `-d`: sample labels for list of input files to run over. Can use the labels found in `input/sampleLabels.txt` or more general labels such as 2018_QCD.
* `-n`: number of files from the sample to run over. Default is -1.
21 changes: 20 additions & 1 deletion analyze.py
@@ -8,6 +8,10 @@
import time
from optparse import OptionParser
from glob import glob
import numpy as np
from magiconfig import MagiConfig
from utils.models import DNN, DNN_GRF
import torch

def use_dask(condor,njobs,port):
from dask.distributed import Client
@@ -93,11 +97,26 @@ def main():

    sf = s.sfGetter(sample)
    print("scaleFactor = {}".format(sf))
    # open saved neural network
    device = torch.device('cpu')
    modelLocation = "."
    varSet = ['njets', 'njetsAK8', 'nb', 'dPhij1rdPhij2AK8', 'dPhiMinjMETAK8', 'dEtaj12AK8', 'dRJ12AK8', 'jGirthAK8', 'jTau1AK8', 'jTau2AK8', 'jTau3AK8', 'jTau21AK8', 'jTau32AK8', 'jSoftDropMassAK8', 'jAxisminorAK8', 'jAxismajorAK8', 'jPtDAK8', 'jecfN2b1AK8', 'jecfN3b1AK8', 'jEleEFractAK8', 'jMuEFractAK8', 'jNeuHadEFractAK8', 'jPhoEFractAK8', 'jPhoMultAK8', 'jNeuMultAK8', 'jNeuHadMultAK8', 'jMuMultAK8', 'jEleMultAK8', 'jChHadMultAK8', 'jChMultAK8', 'jNeuEmEFractAK8', 'jHfHadEFractAK8', 'jHfEMEFractAK8', 'jChEMEFractAK8', 'jMultAK8', 'jecfN3b2AK8', 'jecfN2b2AK8', 'jPhiAK8', 'jEtaAK8']
    hyper = MagiConfig(batchSize=2000, dropout=0.3, epochs=10, lambdaDC=0.0, lambdaGR=1.0, lambdaReg=0.0, lambdaTag=1.0, learning_rate=0.001, n_pTBins=35, num_of_layers_features=2, num_of_layers_pT=5, num_of_layers_tag=2, num_of_nodes=40, pTBins=[50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2500, 3000, 3500, 4000, 4500], rseed=30)
    model = DNN_GRF(n_var=len(varSet), n_layers_features=hyper.num_of_layers_features, n_layers_tag=hyper.num_of_layers_tag, n_layers_pT=hyper.num_of_layers_pT, n_nodes=hyper.num_of_nodes, n_outputs=2, n_pTBins=hyper.n_pTBins, drop_out_p=hyper.dropout).to(device=device)
    print("Loading model from file {}/net.pth".format(modelLocation))
    model.load_state_dict(torch.load("{}/net.pth".format(modelLocation),map_location=device))
    model.eval()
    model.to('cpu')
    normMeanStd = np.load("{}/normMeanStd.npz".format(modelLocation))
    normMean = normMeanStd["normMean"]
    normStd = normMeanStd["normStd"]

    # run processor
    output = processor.run_uproot_job(
        fileset,
        treename='TreeMaker2/PreSelection',
        processor_instance=MainProcessor(sf),
        # processor_instance=MainProcessor(sf,model,varSet,normMean,normStd),
        processor_instance=MainProcessor(sample,sf,model,varSet,normMean,normStd),
        executor=processor.dask_executor if options.dask else processor.futures_executor,
        executor_args=exe_args,
        chunksize=options.chunksize,
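For readers orienting themselves in the change above, here is a minimal, hypothetical sketch (not part of this commit) of how the loaded `DNN_GRF` model and the `normMean`/`normStd` arrays are typically applied to jet features inside the processor; the feature array layout and the exact return value of the model call are assumptions.
```python
import numpy as np
import torch

# Hypothetical illustration (not from this commit): score a batch of jets with the
# network and normalization constants loaded in analyze.py above.
def score_jets(model, jet_features, normMean, normStd):
    # jet_features: (n_jets, n_vars) array whose columns follow the order of varSet
    normed = (jet_features - normMean) / normStd            # standardize with training-time stats
    inputs = torch.from_numpy(normed.astype(np.float32))
    with torch.no_grad():
        outputs = model(inputs)                             # DNN_GRF forward pass; exact outputs depend on the model definition
    # assuming the first output holds the two tagger logits, convert to an SVJ-like probability
    tag_logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs
    return torch.softmax(tag_logits, dim=1)[:, 1].numpy()
```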
8 changes: 5 additions & 3 deletions analyze_root_varModule.py
@@ -93,11 +93,12 @@ def main():

    sf = s.sfGetter(sample)
    print("scaleFactor = {}".format(sf))

    # run processor
    output = processor.run_uproot_job(
        fileset,
        treename='TreeMaker2/PreSelection',
        processor_instance=MainProcessor(sf),
        processor_instance=MainProcessor(sample,sf),
        executor=processor.dask_executor if options.dask else processor.futures_executor,
        executor_args=exe_args,
        chunksize=options.chunksize,
@@ -114,9 +115,10 @@
            values_dict[v] = output[v].value
    tree = uproot.newtree(branchdict)
    if values_dict != {}:
        print("saving root files...")
        with uproot.recreate("{}.root".format(outfile)) as f:
            f["t"] = tree
            f["t"].extend(values_dict)
            f["tree"] = tree
            f["tree"].extend(values_dict)
    # print run time in seconds
    dt = time.time() - tstart
    print("run time: %.2f [sec]" % (dt))
2 changes: 1 addition & 1 deletion condor/run_Analyzer_condor.sh
@@ -44,7 +44,7 @@ if [[ ${analyzeFile} == analyze.py ]]
then
mv MyAnalysis*.root ${base_dir}
else
xrdcp -f MyAnalysis*.root ${NNTrainingOut}.
xrdcp -f MyAnalysis*.root ${NNTrainingOut}/.
rm MyAnalysis*.root
fi

66 changes: 66 additions & 0 deletions condor/run_Singularity_condor.sh
@@ -0,0 +1,66 @@
#!/usr/bin/env bash

dataset_longname=$1
nfiles=$2
startfile=$3
workers=$4
chunksize=$5
analyzeFile=$6
NNTrainingOut=$7
base_dir=`pwd`

echo "ls output"
ls -l

echo "unpacking tar file"
mkdir tchannel
mv tchannel.tar.gz tchannel/.
cd tchannel
tar -xzf tchannel.tar.gz
ls -l

# Setup the activation script for the virtual environment
$ECHO "\nSetting up the activation script for the virtual environment ... "
sed -i '40s/.*/VIRTUAL_ENV="$(cd "$(dirname "$(dirname "${BASH_SOURCE[0]}" )")" \&\& pwd)"/' myenv/bin/activate
find myenv/bin/ -type f -print0 | xargs -0 -P 4 sed -i '1s/#!.*python$/#!\/usr\/bin\/env python/'
echo "Activating our virtual environment"
source myenv/bin/activate
storage_dir=$(readlink -f $PWD)
export TCHANNEL_BASE=${storage_dir}

echo "ls output"
ls -l

echo "output of uname -s : "
uname -s

echo "unpacking exestuff"
cp ${base_dir}/exestuff.tar.gz .
tar xzf exestuff.tar.gz
mv exestuff/* .
ls -l

echo "\n\n Attempting to run MyAnalysis executable.\n\n"
echo ${dataset_longname}
echo ${nfiles}
echo ${startfile}
echo ${workers}
# python analyze.py --condor -d ${dataset_longname} -N ${nfiles} -M ${startfile} -w ${workers} -s ${chunksize}
python ${analyzeFile} --condor -d ${dataset_longname} -N ${nfiles} -M ${startfile} -w ${workers} -s ${chunksize}

echo "\n\n ls output\n"
ls -l

if [[ ${analyzeFile} == analyze.py ]]
then
mv MyAnalysis*.root ${base_dir}
else
xrdcp -f MyAnalysis*.root ${NNTrainingOut}.
rm MyAnalysis*.root
fi

cd ${base_dir}
rm docker_stderror

echo "\n\n ls output\n"
ls -l
129 changes: 129 additions & 0 deletions condor/singularitySubmit.py
@@ -0,0 +1,129 @@
import sys, os
from os import system, environ
sys.path = [environ["TCHANNEL_BASE"],] + sys.path
from utils import samples as s
import optparse

def removeCopies(x):
    return list(dict.fromkeys(x))

def makeExeAndFriendsTarball(filestoTransfer, fname, path):
    system("mkdir -p %s" % fname)
    for fn in removeCopies(filestoTransfer):
        print(fn)
        system("cd %s; ln -s %s" % (fname, fn))

    tarallinputs = "tar czf %s/%s.tar.gz %s --dereference"% (path, fname, fname)
    system(tarallinputs)
    system("rm -r %s" % fname)

def getDatasets(datasets):
    if datasets:
        return datasets.split(',')
    else:
        print("No dataset specified")
        exit(0)

def main():
    # parse command line arguments
    parser = optparse.OptionParser("usage: %prog [options]\n")
    parser.add_option ('-n', dest='numfile', type='int', default = 10, help="number of files per job")
    parser.add_option ('-d', dest='datasets', type='string', default = '', help="List of datasets, comma separated")
    parser.add_option ('-c', dest='noSubmit', action='store_true', default = False, help="Do not submit jobs. Only create condor_submit.txt.")
    parser.add_option ('-p', dest='makeROOT', action='store_true', default = False, help="Make root tree instead of histograms.")
    parser.add_option ('--pout', dest='NNTrainOut', default = "", help="Directory to store the NN training root files.")
    parser.add_option ('-w','--workers', dest='workers', type='int', default = 2, help='Number of workers to use for multi-worker executors (e.g. futures or condor)')
    parser.add_option ('--output', dest='outPath', type='string', default = '.', help="Name of directory where output of each condor job goes")
    parser.add_option('-s', '--chunksize',dest='chunksize',type='int', default=10000, help='Chunk size',)
    options, args = parser.parse_args()

    analyzeFile = "analyze.py"
    if options.makeROOT:
        if options.NNTrainOut == "":
            raise Exception("Please specify the output directory for the NN training files using --pout.")
        analyzeFile = "analyze_root_varModule.py"
    # prepare the list of hardcoded files to transfer
    filestoTransfer = [environ["TCHANNEL_BASE"] + "/net.pth",
                       environ["TCHANNEL_BASE"] + "/normMeanStd.npz"]

    # add top of jdl file
    fileParts = []
    fileParts.append("Universe = vanilla\n")
    fileParts.append("Executable = run_Singularity_condor.sh\n")
    fileParts.append("+SingularityImage = \"/cvmfs/unpacked.cern.ch/registry.hub.docker.com/fnallpc/fnallpc-docker:pytorch-1.9.0-cuda11.1-cudnn8-runtime-singularity\"\n")
    fileParts.append("Transfer_Input_Files = %s/%s.tar.gz, %s/exestuff.tar.gz\n" % (options.outPath,"tchannel",options.outPath))
    fileParts.append("Should_Transfer_Files = YES\n")
    fileParts.append("WhenToTransferOutput = ON_EXIT\n")
    fileParts.append("request_disk = 1000000\n")
    fileParts.append("request_memory = 2000\n")
    fileParts.append("request_cpus = 4\n")
    fileParts.append("x509userproxy = $ENV(X509_USER_PROXY)\n\n")

    # loop over all sample collections in the dataset
    datasets = getDatasets(options.datasets)
    nFilesPerJob = options.numfile
    numberOfJobs = 0
    print("-"*50)
    for sc in datasets:
        print(sc)

        # create the directory
        if not os.path.isdir("%s/output-files/%s" % (options.outPath, sc)):
            os.makedirs("%s/output-files/%s" % (options.outPath, sc))

        # loop over all samples in the sample collection
        samples = s.getFileset(sc, False)
        for n, rFiles in samples.items():
            count = len(rFiles)
            print(" %-40s %d" % (n, count))

            # loop over the root files that will be in each job
            for startFileNum in range(0, count, nFilesPerJob):
                numberOfJobs+=1
                outputDir = "%s/output-files/%s" % (options.outPath, sc)

                # list the output files that will be transfered to output directory
                outfile = "MyAnalysis_%s_%s.root" % (n, startFileNum)
                outputFiles = [
                    outfile,
                ]
                transfer = "transfer_output_remaps = \""
                for f_ in outputFiles:
                    transfer += "%s = %s/%s" % (f_, outputDir, f_)
                    if f_ != outputFiles[-1]:
                        transfer += "; "
                transfer += "\"\n"

                # add each job to the jdl file
                fileParts.append(transfer)
                fileParts.append("Arguments = %s %i %i %i %i %s %s\n"%(n, nFilesPerJob, startFileNum, options.workers, options.chunksize,analyzeFile,options.NNTrainOut))
                fileParts.append("Output = %s/log-files/MyAnalysis_%s_%i.stdout\n"%(options.outPath, n, startFileNum))
                fileParts.append("Error = %s/log-files/MyAnalysis_%s_%i.stderr\n"%(options.outPath, n, startFileNum))
                fileParts.append("Log = %s/log-files/MyAnalysis_%s_%i.log\n"%(options.outPath, n, startFileNum))
                fileParts.append("Queue\n\n")
    print("-"*50)

    # write out the jdl file
    fout = open("condor_submit.jdl", "w")
    fout.write(''.join(fileParts))
    fout.close()

    # print number jobs to run
    print("Number of Jobs:", numberOfJobs)

    # only runs when you submit
    if not options.noSubmit:
        # tar up working area to send with each job
        print("-"*50)
        print("Making the tar ball")
        makeExeAndFriendsTarball(filestoTransfer, "exestuff", options.outPath)
        #system("tar --exclude-caches-all --exclude-vcs -zcf %s/tchannel.tar.gz -C ${TCHANNEL_BASE}/.. tchannel --exclude=src --exclude=tmp" % options.outPath)
        system("tar czf %s/tchannel.tar.gz -C ${TCHANNEL_BASE} . --exclude=coffeaenv --exclude=EventLoopFramework --exclude=test --exclude=output --exclude=condor --exclude=notebooks --exclude=root --exclude=.git --exclude=coffeaenv.tar.gz" % options.outPath)

        # submit the jobs to condor
        system('mkdir -p %s/log-files' % options.outPath)
        system("echo 'condor_submit condor_submit.jdl'")
        system('condor_submit condor_submit.jdl')

if __name__ == "__main__":
    main()
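For orientation, the positional `Arguments` that `singularitySubmit.py` writes for each job map one-to-one onto the variables read at the top of `run_Singularity_condor.sh`:
```
Arguments = <dataset_longname> <nfiles> <startfile> <workers> <chunksize> <analyzeFile> <NNTrainingOut>
```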