Md claussnitzerlab method #4

Open: wants to merge 5 commits into base: master
14 changes: 14 additions & 0 deletions claussnitzerlab/.editorconfig
@@ -0,0 +1,14 @@
root = true

[*]
insert_final_newline = true

[*.java]
indent_style = space
indent_size = 4
trim_trailing_whitespace = true

[*.{scala,sbt}]
indent_style = space
indent_size = 2
trim_trailing_whitespace = true
4 changes: 4 additions & 0 deletions claussnitzerlab/.scalafmt.conf
@@ -0,0 +1,4 @@
version = "2.4.2"
align=more
docstrings=ScalaDoc
maxColumn=120
29 changes: 29 additions & 0 deletions claussnitzerlab/LICENSE.txt
@@ -0,0 +1,29 @@
Copyright 2020 <COPYRIGHT HOLDER>

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
14 changes: 14 additions & 0 deletions claussnitzerlab/README.md
@@ -0,0 +1,14 @@
# claussnitzerlab

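This method runs the `fullNasaScript.py` PyTorch model over variant part files.

Inputs: part files read from `s3://dig-analysis-data/out/varianteffect/common/`, plus the model weights (`nasa_ampt2d_cnn_900_best_p041.pth`), labels (`nasa_labels.txt`), and hg19 reference genome (`hg19.2bit`) that the bootstrap script downloads from `s3://dig-analysis-data/bin/regionpytorch/`.

Outputs: one JSON file per part file, written to `s3://dig-analysis-data/out/regionpytorch/claussnitzerlab/`; any warnings files go to its `warnings/` subfolder.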

## Stages

These are the stages of claussnitzerlab.

### ClaussnitzerlabStage

A description of what this stage does.
70 changes: 70 additions & 0 deletions claussnitzerlab/build.sbt
@@ -0,0 +1,70 @@
val Versions = new {
  val Aggregator = "0.3.0-SNAPSHOT"
  val Scala      = "2.13.2"
}

// set the version of scala to compile with
scalaVersion := Versions.Scala

// add scala compile flags
scalacOptions ++= Seq(
  "-feature",
  "-deprecation",
  "-unchecked",
  "-Ywarn-value-discard"
)

// add required libraries
libraryDependencies ++= Seq(
  "org.broadinstitute.dig" %% "dig-aggregator-core" % Versions.Aggregator
)

// set the organization this method belongs to
organization := "org.broadinstitute.dig"

// entry point when running this method
mainClass := Some("org.broadinstitute.dig.aggregator.methods.claussnitzerlab.Claussnitzerlab")

// enable the sbt-git GitVersioning plugin, which derives the project version from git
enablePlugins(GitVersioning)

// declare a custom task that writes git version info to a properties file
val buildInfoTask = taskKey[Seq[File]]("buildInfo")

// define execution code for task
buildInfoTask := {
  val file = (resourceManaged in Compile).value / "version.properties"

  // log where the properties will be written to
  streams.value.log.info(s"Writing version info to $file...")

  // collect git versioning information
  val branch                = git.gitCurrentBranch.value
  val lastCommit            = git.gitHeadCommit.value
  val describedVersion      = git.gitDescribedVersion.value
  val anyUncommittedChanges = git.gitUncommittedChanges.value
  val remoteUrl             = (scmInfo in ThisBuild).value.map(_.browseUrl.toString)
  val buildDate             = java.time.Instant.now

  // map properties
  val properties = Map[String, String](
    "branch"             -> branch,
    "lastCommit"         -> lastCommit.getOrElse(""),
    "describedVersion"   -> describedVersion.getOrElse(""),
    "remoteUrl"          -> remoteUrl.getOrElse(""),
    "uncommittedChanges" -> anyUncommittedChanges.toString,
    "buildDate"          -> buildDate.toString
  )

  // build properties content, skipping keys whose value is empty
  val contents = properties.toList.collect {
    case (key, value) if value.nonEmpty => s"$key=$value"
  }

  // write the version information from git to version.properties
  IO.write(file, contents.mkString("\n"))
  Seq(file)
}

// add the build info task output to resources
(resourceGenerators in Compile) += buildInfoTask.taskValue
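To make the generated file format concrete, here is a small bash sketch using hypothetical helpers `format_properties` and `read_property` (not part of this PR). It emulates how the task writes a `key=value` line only for non-empty values, and shows how a script might read one key back out of the resulting `version.properties`. It is an illustration of the file format, not the Scala task itself.

```bash
#!/bin/bash

# Hypothetical helper: print each key=value argument on its own line,
# skipping pairs whose value (the text after the first '=') is empty.
format_properties() {
  local pair
  for pair in "$@"; do
    if [ -n "${pair#*=}" ]; then
      printf '%s\n' "$pair"
    fi
  done
}

# Hypothetical helper: read the value of a single key from a properties
# file; the first matching line wins, and '=' inside the value is kept.
read_property() {
  local key="$1" file="$2"
  grep -m1 "^${key}=" "$file" | cut -d= -f2-
}
```

For example, `format_properties "branch=master" "remoteUrl=" > version.properties` writes only the `branch` line, and `read_property branch version.properties` then prints `master`.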
1 change: 1 addition & 0 deletions claussnitzerlab/project/build.properties
@@ -0,0 +1 @@
sbt.version=1.3.10
1 change: 1 addition & 0 deletions claussnitzerlab/project/plugins.sbt
@@ -0,0 +1 @@
addSbtPlugin("com.typesafe.sbt" % "sbt-git" % "1.0.0")
41 changes: 41 additions & 0 deletions claussnitzerlab/src/main/resources/claussnitzerlabBootstrap.sh
@@ -0,0 +1,41 @@
#!/bin/bash -xe

# Bootstrap scripts can either be run as a...
#
# bootstrapScript
# bootstrapStep
#
# A bootstrap script is run while the machine is being provisioned by
# AWS, is run as a different user, and must complete within 60 minutes
# or provisioning fails. This can be a good thing, as it prevents
# accidentally creating scripts that never terminate (e.g. waiting for
# user input).
#
# A bootstrap step is a "step" like any other job step. It can take as
# long as needed. It is run as the hadoop user and is run in the step's
# directory (e.g. /mnt/var/lib/hadoop/steps/s-123456789).
#
# Most of the time, it's best to use a bootstrap script rather than a bootstrap step.

#sudo yum groups mark convert
#
## check if GCC, make, etc. are installed already
#DEVTOOLS=$(sudo yum grouplist | grep 'Development Tools')
#
#if [ -z "$DEVTOOLS" ]; then
# sudo yum groupinstall -y 'Development Tools'
#fi

WORK_DIR="/mnt/var/claussnitzerlab"
mkdir -p "${WORK_DIR}"

# install the python libraries
sudo pip3 install torch==1.5.1
sudo pip3 install twobitreader
sudo pip3 install numpy
sudo pip3 install scikit-learn

# copy the labels file, the hg19 reference genome (2bit), and the model weights file
aws s3 cp s3://dig-analysis-data/bin/regionpytorch/nasa_labels.txt "${WORK_DIR}"
aws s3 cp s3://dig-analysis-data/bin/regionpytorch/hg19.2bit "${WORK_DIR}"
aws s3 cp s3://dig-analysis-data/bin/regionpytorch/nasa_ampt2d_cnn_900_best_p041.pth "${WORK_DIR}"
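As an optional extra guard, a hypothetical helper `require_files` (not part of this PR) could run after the downloads and fail the bootstrap if any expected file is missing or empty. `aws s3 cp` already exits non-zero on a failed transfer, which `-e` turns into a failed script, so this mainly catches zero-byte files. A minimal sketch:

```bash
#!/bin/bash

# Hypothetical helper: check that every named file exists in DIR and is
# non-empty; report each problem on stderr and return 1 if any check fails.
require_files() {
  local dir="$1" name missing=0
  shift
  for name in "$@"; do
    if [ ! -s "${dir}/${name}" ]; then
      echo "missing or empty: ${dir}/${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In the bootstrap script it would be called as `require_files "${WORK_DIR}" nasa_labels.txt hg19.2bit nasa_ampt2d_cnn_900_best_p041.pth`; under `bash -e`, a non-zero return aborts provisioning.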
48 changes: 48 additions & 0 deletions claussnitzerlab/src/main/resources/claussnitzerlabScript.sh
@@ -0,0 +1,48 @@
#!/bin/bash -xe

echo "JOB_BUCKET = ${JOB_BUCKET}"
echo "JOB_METHOD = ${JOB_METHOD}"
echo "JOB_STAGE = ${JOB_STAGE}"
echo "JOB_PREFIX = ${JOB_PREFIX}"
echo "JOB_DRYRUN = ${JOB_DRYRUN}"
#
# You can also pass command line arguments to the script from your stage.
#

echo "Argument passed: $*"

# set the S3 root that the input part files are read from and the output is written to
S3DIR="s3://dig-analysis-data/out"

# get the name of the part file from the command line; set the output filename
PART=$(basename -- "$1")
OUTFILE="${PART%.*}claussnitzerlab.json"
WARNINGS="${OUTFILE}_warnings.txt"
WORK_DIR="/mnt/var/claussnitzerlab"

# copy the part file from S3 to local
aws s3 cp "$S3DIR/varianteffect/common/$PART" "${WORK_DIR}"

# copy the basset python scripts (the model weights were downloaded by the bootstrap script)
aws s3 cp s3://dig-analysis-data/bin/regionpytorch/fullNasaScript.py "${WORK_DIR}"
aws s3 cp s3://dig-analysis-data/bin/regionpytorch/dcc_basset_lib.py "${WORK_DIR}"

# cd to work directory
cd "${WORK_DIR}"

# run pytorch script
python3 fullNasaScript.py -i "$PART" -b 100 -o "$OUTFILE"

# copy the model output back to S3
aws s3 cp "$OUTFILE" "$S3DIR/regionpytorch/claussnitzerlab/$OUTFILE"

# delete the input and output files; keep the cluster clean
rm "$PART"
rm "$OUTFILE"

# check for a warnings file; if present, upload it too, then delete it
if [ -e "$WARNINGS" ]; then
  aws s3 cp "$WARNINGS" "$S3DIR/regionpytorch/claussnitzerlab/warnings/$WARNINGS"
  rm "$WARNINGS"
fi
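Because the script runs under `bash -e`, a failing `python3` or `aws s3 cp` call aborts it before the `rm` lines, leaving the part file on the node. A common alternative is an `EXIT` trap. The sketch below is hypothetical (`output_name` and `process_part` are illustrative names, not part of this PR) and also assumes an underscore separator in the output name, since `${PART%.*}claussnitzerlab.json` turns `part-00000.json` into `part-00000claussnitzerlab.json`.

```bash
#!/bin/bash

# Hypothetical helper: derive the output file name from a part file path.
# The "_" separator is an assumption; the original appends the suffix directly.
output_name() {
  local part
  part=$(basename -- "$1")
  echo "${part%.*}_claussnitzerlab.json"
}

# Hypothetical helper: run a command inside a subshell whose EXIT trap deletes
# the local input and output files, whether the command succeeds or fails.
# Returns the command's exit status.
process_part() {
  local part="$1" outfile="$2"
  shift 2
  (
    trap 'rm -f -- "$part" "$outfile"' EXIT
    "$@"
  )
}
```

In the script, the wrapped command would need to cover both the `python3` run and the `aws s3 cp` upload (for example, a small shell function), because the trap removes the output file as soon as the subshell exits.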
