Lea Frey, Philipp Baumann, Helge Aasen, Bruno Studer, Roland Kölliker
This is the code repository that produces the outputs of the manuscript with the above title.
The directory is self-contained and is designed to run reproducibly, either on your host operating system (local or remote) or in a Docker container (local or remote; relying on the kernel of the host). The practical instructions to deploy this Docker image and run all analyses within this project can be found below. Attribution is given to Thomas Knecht aka Mr. Propper, who encouraged me to dockerize and proceed with the orchestration.
📔 Dockerfile: Docker recipe that will pull the operating system, pull system dependencies, install R v3.6.0, and install all required R packages.
📔 R/: Custom R functions required for the analysis.
📔 _convert-images.R: Produce .eps outputs for the manuscript submission.
📔 _drake.R: Load packages, load functions, define the {drake} plan. {drake} make runs this script via _make.R.
📔 _make.R: Invoke {drake} make via callr for sanity using drake::r_make().
📔 code/: R scripts for the analysis. They will be run in sequential order. drake::code_to_plan() in _drake.R will invoke them.
First, download this repository or clone it with git. (Git is a popular free and open-source version control system; download it to try it out.)
git clone https://github.com/philipp-baumann/leaf-starch-spc
Windows users will probably need to download Rtools (the version for R 3.6.3 or older) to build packages from source. macOS users will need Xcode for the compiler toolchain.
To restore all required packages at versions defined in the file
renv.lock
based on the renv R package, execute
the following in the project directory. You might first want to set up
the project directory in RStudio (see
here) unless you work
in a terminal.
install.packages("remotes")
# Replace <version> with the renv version recorded in renv.lock
remotes::install_github("rstudio/renv@<version>")
# Automatically installs packages from CRAN and github
# as specified in `renv.lock`
renv::restore()
You can run the scripts manually in sequential order, but we recommend running the entire workflow in an automated manner with the {drake} R package. This gives you tangible evidence of reproducibility.
# Make drake plan (targets and expressions in scripts: see ./code/)
# Starts a separate R process from R for safe interactivity
source("_make.R")
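The same entry point can also be started from a terminal instead of the R console. The following is a minimal sketch under two assumptions: `Rscript` is on your `PATH`, and you call it from the project directory. `run_workflow` is a hypothetical helper name, not part of this repository. `_make.R` (listed further below) starts with an `Rscript` shebang and writes the summed build time to `_build-time.txt`, which the helper prints after a successful run.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run the drake workflow
# from a terminal and print the recorded build time afterwards.
run_workflow() {
  # Fail early with a clear message if R is not installed or not on PATH.
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH; install R first." >&2
    return 127
  fi
  # _make.R restores packages with renv and then calls drake::r_make().
  Rscript _make.R || return $?
  # _make.R writes the summed build time to this file in the project root.
  cat _build-time.txt
}
```

After pasting the function into a shell session, calling `run_workflow` from the project root should behave like `source("_make.R")` in an R console.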
Docker provides an open-source solution to create an isolated software environment that captures the entire computational environment. This makes the data analysis scalable and reproducible, independent of the host operating system. To get started with Docker, there is an rOpenSci R Docker tutorial that explains the motivation and basics of using Docker for reproducible research. However, you can also just follow the steps outlined below. The only caveat at the moment is that the Docker image is large (make sure you have roughly 150 GB of free disk space). I currently don’t know the reason for this.
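Because of the image size, it can be worth checking free disk space before building. The sketch below is one possible way to do this with POSIX `df`; `free_gb` and `has_free_gb` are hypothetical helper names, and the 150 GB threshold is simply the figure quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in whole GB, of the filesystem
# that holds the given path (defaults to the current directory).
free_gb() {
  # -P forces the portable one-line-per-filesystem format; -k reports 1 KB blocks.
  df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Hypothetical helper: succeed only if at least $2 GB are free at path $1.
has_free_gb() {
  [ "$(free_gb "$1")" -ge "$2" ]
}
```

For example, `has_free_gb /var/lib/docker 150 || echo "less than 150 GB free"` checks the location where Docker typically stores images on Linux. This path is an assumption about your setup; `docker info --format '{{.DockerRootDir}}'` prints the actual location.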
A Dockerfile is a text file that contains a recipe to build an image with a layered approach. A Docker container is a running instance of an image. This Dockerfile is based on the rocker/rstudio:3.6.0 image, which is built on rocker/r-ver with Debian 9 (stretch) and includes version-stable base R (v3.6.0) and the source build tools. The RStudio image provides RStudio Server within a Docker image, which you can access via your browser. Basic instructions are given here, but for getting started you can additionally consider this resource.
The image is version-stable and uses the MRAN snapshot of the last day
that the R version 3.6.0 was the most recent release.
The workflow deployed here is fueled by the {renv} package, which manages the installation of specific package versions and sources, and the {drake} package, which keeps track of the R code and data that produce the results.
The drake manual lists two examples in section 1.5 that combine {drake} workflows with Docker. These give more detail on how everything works under the hood.
The following Docker commands, run in bash, generate the computational environment, run all computations, and let you retrieve the results of the entire analysis done in R.
- Build the Docker image with instructions from the Dockerfile.
# Cache configuration: https://github.com/rstudio/renv/issues/362
# https://github.com/rstudio/renv/issues/400
docker build -t leaf-starch-spc .
- Check whether the image is built.
docker images
- Launch the container from the built image. Share two local paths on the host as volumes with the container. The analysis workflow orchestrated by {drake} will write the output files (figures) described in the accompanying manuscript there.
# https://www.rocker-project.org/use/managing_users/
# https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine
# bash sets $UID but not $GID, so query both IDs with id
docker run --rm -d -p 8787:8787 \
  -e PASSWORD=spcclover \
  -v "$(pwd)/out:/home/rstudio/out" \
  -v "$(pwd)/pub:/home/rstudio/pub" \
  -e USERID="$(id -u)" -e GROUPID="$(id -g)" leaf-starch-spc
- Open RStudio Server and kick off the workflow. There are two suggested deployments: one via Docker running on your computer (4. i.), and the other via Docker on a virtual machine tunnelled via ssh (4. ii.).
cat _make.R
## #!/usr/bin/env Rscript
##
## renv::restore()
##
## library("drake")
## r_make()
##
## cat("build_time_seconds ", round(sum(build_times()$elapsed), 2),
## file = here::here("_build-time.txt"))
drake::r_make() invokes _drake.R, calling drake::make() in a separate process in the operating system to sanitize the make process.
# Run in the R console in RStudio Server
source("_make.R")
4.ii. Local port forwarding via ssh: The RStudio Server service running within the Docker container on the remote VM can be tunnelled into your local browser session using ssh port forwarding. This is extremely convenient because you can do interactive data analysis with a “local feel”.
ssh -f -N -L 8787:localhost:8787 <your_user>@<host_ip_address>
Simply open RStudio Server in your browser at localhost:8787. Then log in with user rstudio and password spcclover.
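If the login page does not appear, two quick checks can narrow down the cause. The sketch below assumes the image name `leaf-starch-spc` and port 8787 from the steps above, and that `curl` is installed; `rstudio_container_id` and `rstudio_responds` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of running containers that were started
# from the image built above (empty output means none is running).
rstudio_container_id() {
  docker ps --quiet --filter "ancestor=leaf-starch-spc"
}

# Hypothetical helper: succeed if something answers HTTP on the given port
# (default 8787, the port published in the docker run step).
rstudio_responds() {
  curl --silent --output /dev/null --max-time 5 "http://localhost:${1:-8787}"
}
```

Run `rstudio_container_id` on the machine where Docker runs and `rstudio_responds` on the machine where your browser runs. If the first prints nothing, the container is not running. If the second fails in the remote setup, the ssh tunnel is probably not established.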
The files in this project are organized as follows (only 2 folder levels are shown):
## .
## ├── Dockerfile
## ├── Dockerfile_legacy
## ├── Makefile
## ├── R
## │ ├── helpers.R
## │ ├── modeling.R
## │ ├── select-spc-xvalues.R
## │ └── vip-wrappers.R
## ├── README.Rmd
## ├── README.md
## ├── _convert-images.R
## ├── _crop-images.R
## ├── _drake.R
## ├── _make.R
## ├── code
## │ ├── 10_read-clean-process-training.R
## │ ├── 20_build-spc-model-training.R
## │ ├── 21_interpret-training-vip.R
## │ ├── 22_remodel-vip-filtering.R
## │ ├── 23_remodel-cor-filtering.R
## │ ├── 24_remodel-starch-bands.R
## │ ├── 30_read-clean-process-test.R
## │ ├── 31_visualize-refdata.R
## │ ├── 40_predict-evaluate-train-test.R
## │ ├── 50_remodel-test.R
## │ ├── 51_interpret-test-vip.R
## │ ├── 52_remodel-test-vip-training.R
## │ └── 60_evaluate-test.R
## ├── code-legacy
## │ └── 25_remodel-mutual-information.R
## ├── cp-images.sh
## ├── cp-renv-lock.sh
## ├── data
## │ ├── test
## │ └── training
## ├── docker-base
## │ ├── Dockerfile_base
## │ ├── disable_auth_rserver.conf
## │ ├── pam-helper.sh
## │ └── userconf.sh
## ├── docker-compose.yml
## ├── leaf-starch-spc.Rproj
## ├── out
## │ ├── data
## │ └── figs
## ├── packages.R
## ├── pub
## │ ├── approval
## │ ├── figs
## │ ├── figs.zip
## │ ├── review-submission-2
## │ ├── submission-01
## │ ├── submission-02
## │ └── writing
## ├── renv
## │ ├── activate.R
## │ ├── library
## │ └── settings.dcf
## ├── renv.lock
## ├── ssd-to-vm.sh
## └── vm-to-ssd.sh