Commit

Switch to using DT instead of rCharts and use rmarkdown 0.9.5 instead of knitrBootstrap
lcolladotor committed Mar 21, 2016
1 parent 33c8638 commit 201b563
Showing 3 changed files with 746 additions and 3,727 deletions.
59 changes: 25 additions & 34 deletions timing/timing.Rmd
@@ -1,14 +1,13 @@
---
title: "Timing information"
output:
knitrBootstrap::bootstrap_document:
theme.chooser: TRUE
highlight.chooser: TRUE
html_document:
toc: true
toc_float: true
code_folding: hide
---

Timing information
==================

```{r citationsSetup, echo=FALSE, message=FALSE, warning=FALSE, bootstrap.show.code=FALSE}
```{r citationsSetup, echo=FALSE, message=FALSE, warning=FALSE}
## Track time spent on making the report
startTime <- Sys.time()
@@ -25,9 +24,8 @@ bibs <- c("knitcitations" = citation("knitcitations"),
"derfinder" = citation("derfinder"),
"GenomicRanges" = citation("GenomicRanges"),
"DESeq" = citation("DESeq"),
"rCharts" = citation("rCharts"),
"DT" = citation("DT"),
"ggplot2" = citation("ggplot2"),
"knitrBootstrap" = citation("knitrBootstrap"),
'rmarkdown' = citation('rmarkdown'),
'knitr' = citation('knitr')[3],
'eff' = RefManageR::BibEntry('manual', key = 'eff', title = 'Efficiency analysis of Sun Grid Engine batch jobs', author = 'Alyssa Frazee', year = 2014, url = 'http://dx.doi.org/10.6084/m9.figshare.878000'),
@@ -51,14 +49,14 @@ system('cp ../../efficiency_analytics/client_secrets .')
system('python ../../efficiency_analytics/analyze_efficiency.py --email [email protected] --folder "Cluster/derSoftware" --outfile timing-derSoftware.txt')
```

```{r loadLibs, bootstrap.show.code=FALSE, warning = FALSE}
```{r loadLibs, warning = FALSE}
## Load libraries
library("ggplot2")
library("knitr")
```


```{r process, bootstrap.show.code=FALSE}
```{r process}
## Setup
## Define number of cores used
@@ -134,7 +132,7 @@ The following plots show the wall time and memory used by each job while taking

Points are colored by the analysis type they belong to. Note that the data loading step is required for the single-level and expressed-regions DER approaches, as well as for exon counting (with derfinder).

```{r edaAnalysis, fig.width=10, bootstrap.show.code=FALSE}
```{r edaAnalysis, fig.width=10, fig.height=7}
## Walltime and memory adjusted by number of cores (it's an approximation)
ggplot(all, aes(x=timeByCore, y=memByCore, colour=analysis, shape=software)) + geom_point(size = 3) + facet_grid(~ experiment) + xlab("Wall time (hrs) multiplied by the number of cores") + ylab("Memory (GB) divided by the number of cores") + scale_colour_brewer(palette="Dark2") + theme_bw(base_size = 18) + theme(legend.position=c(.5, .75), legend.box = 'horizontal')
ggplot(all, aes(x=log2(timeByCore), y=memByCore, colour=analysis, shape=software)) + geom_point(size = 3) + facet_grid(~ experiment) + xlab("Wall time (hrs) multiplied by the number of cores (log2)") + ylab("Memory (GB) divided by the number of cores") + scale_colour_brewer(palette="Dark2") + theme_bw(base_size = 18) + theme(legend.position=c(.5, .75), legend.box = 'horizontal')
@@ -150,7 +148,7 @@ dev.off()

## Resources by step for each analysis

```{r 'analysisSummary', bootstrap.show.code=FALSE}
```{r 'analysisSummary'}
getInfo <- function(df, sumTime = FALSE, peakCores = FALSE) {
memByCore <- max(df$memByCore)
walltime <- ifelse(sumTime, sum(df$walltime), max(df$walltime))
@@ -194,13 +192,13 @@ analysisSummary <- do.call(rbind, analysisSummary)

The table below shows, for each analysis, the maximum memory used by a job and the maximum wall time for that step. This assumes that all jobs for a given step ran simultaneously; for example, that all jobs running `derfinder::analyzeChr()` were running at the same time. Note that some analyses relied on the same steps, like loading the data (_fullCov_). This table can be useful for finding the peak number of cores (the sum of cores for all jobs running simultaneously) for a given analysis step; a small sketch of that calculation follows the table.

```{r 'analysisSumTab', results = 'asis', bootstrap.show.code=FALSE}
kable(analysisSummary, format = 'html', digits = c(2, 4, 2))
```{r 'analysisSumTab', results = 'asis'}
kable(analysisSummary, format = 'markdown', digits = c(2, 4, 2))
```
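
A minimal sketch of the peak-cores calculation mentioned above, assuming `"Hippo"` and `"fullCov"` are valid `experiment` and `step` labels in `all` (they are placeholders here):

```{r 'peakCoresSketch', eval = FALSE}
## Illustrative sketch (not run): peak number of cores for one analysis step,
## under the assumption that all of its jobs ran at the same time.
## 'Hippo' and 'fullCov' are placeholder experiment/step labels.
with(subset(all, experiment == "Hippo" & step == "fullCov"), sum(cores))
```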

## Resources for each analysis

```{r 'peakSummary', bootstrap.show.code=FALSE}
```{r 'peakSummary'}
## Summary the information for each analysis
peaks <- lapply(names(analysisInfo), function(analysis) {
res_analysis <- lapply(exps, function(exp) {
@@ -223,8 +221,8 @@ We can further summarize the resources used by each analysis by identified the m

The table below shows the final summary. Note that in some analyses the peak memory is from the _fullCov_ step. We did not focus on reducing the memory load of this step since we sacrificed memory for speed. Much lower memory limits can be achieved by using 1 core instead of the 10 cores we used.

```{r 'peakSumTab', bootstrap.show.code=FALSE, results = 'asis'}
kable(peaks, format = 'html', digits = c(2, 3, 2))
```{r 'peakSumTab', results = 'asis'}
kable(peaks, format = 'markdown', digits = c(2, 3, 2))
```

The high memory load for the HTML report could be lowered significantly by loading only the coverage data required for the plots instead of the full output from the _fullCov_ step. That is, by using the _which_ argument of `fullCoverage()` to create a much smaller _fullCov_ object, which would also reduce the memory used when plotting.
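
A minimal sketch of that idea (not run), assuming hypothetical `bam_files` and `plot_regions` objects; the chromosome and coordinates used here are placeholders:

```{r 'smallFullCovSketch', eval = FALSE}
## Hypothetical sketch (not run): load coverage only for the regions that will
## actually be plotted, instead of the full-chromosome coverage.
library("derfinder")
library("GenomicRanges")

## Assumed inputs: 'bam_files' is a character vector of BAM file paths and
## the coordinates below are placeholder regions of interest on chr21.
plot_regions <- GRanges("chr21", IRanges(start = c(20e6, 35e6), width = 1e5))

small_fullCov <- fullCoverage(files = bam_files, chrs = "chr21",
    which = plot_regions)
```
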
@@ -267,51 +265,44 @@ These are the following analysis steps:
1. __PNAS__ (Only for _Hippo_) Generate an HTML report comparing the derfinder results vs previously published results (PNAS paper).
1. __summInfo__ Summarize results to then use them in the derfinder software paper.

<link rel="stylesheet" href="http://ajax.aspnetcdn.com/ajax/jquery.dataTables/1.9.4/css/jquery.dataTables.css" />
<script src="http://ajax.aspnetcdn.com/ajax/jquery.dataTables/1.9.4/jquery.dataTables.min.js"></script>

```{r tables, results="asis", bootstrap.show.code=FALSE}
library("rCharts")
library("data.table")
```{r tables, results="asis"}
library("DT")
## Print whole table
d <- data.table(all[, c("experiment", "step", "walltime", "cores", "memG", "timeByCore", "memByCore", "software", "analysis", "jobid")])
t1 <- dTable(d, sPaginationType='full_numbers', iDisplayLength=50, sScrollX='100%')
t1$print("timing", cdn=TRUE)
d <- all[, c("experiment", "step", "walltime", "cores", "memG", "timeByCore", "memByCore", "software", "analysis", "jobid")]
datatable(d, options = list(pagingType='full_numbers', pageLength=50, scrollX='100%')) %>% formatRound(columns = c(3, 5:7), digits = 3)
```
<br/>

Table made using `rCharts` `r citep(bib[["rCharts"]])`.
Table made using `DT` `r citep(bib[["DT"]])`.

# Reproducibility

Date the report was generated.

```{r reproducibility1, echo=FALSE, bootstrap.show.code=FALSE}
```{r reproducibility1, echo=FALSE}
## Date the report was generated
Sys.time()
```

Wallclock time spent generating the report.

```{r "reproducibility2", echo=FALSE, bootstrap.show.code=FALSE}
```{r "reproducibility2", echo=FALSE}
## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits=3)
```

`R` session information.

```{r "reproducibility3", echo=FALSE, bootstrap.show.code=FALSE, bootstrap.show.message=FALSE}
```{r "reproducibility3", echo=FALSE}
## Session info
options(width=120)
devtools::session_info()
```

# Bibliography

This report was generated using `knitrBootstrap` `r citep(bib[['knitrBootstrap']])`
with `knitr` `r citep(bib[['knitr']])` and `rmarkdown` `r citep(bib[['rmarkdown']])` running behind the scenes. Timing information extracted from the SGE reports using `efficiency analytics` `r citep(bib[["eff"]])`. Figures and citations were made using `ggplot2` `r citep(bib[["ggplot2"]])` and `knitcitations` `r citep(bib[['knitcitations']])` respectively.
This report was generated using `rmarkdown` `r citep(bib[['rmarkdown']])` with `knitr` `r citep(bib[['knitr']])` running behind the scenes. Timing information was extracted from the SGE reports using `efficiency analytics` `r citep(bib[["eff"]])`. Figures and citations were made using `ggplot2` `r citep(bib[["ggplot2"]])` and `knitcitations` `r citep(bib[['knitcitations']])`, respectively.

Citation file: [timing.bib](timing.bib)

21 changes: 7 additions & 14 deletions timing/timing.bib
@@ -48,11 +48,12 @@ @Article{anders2010differential
url = {http://genomebiology.com/2010/11/10/R106/},
}

@Manual{vaidyanathan2013rcharts,
title = {rCharts: Interactive Charts using Javascript Visualization Libraries},
author = {Ramnath Vaidyanathan},
year = {2013},
note = {R package version 0.4.5},
@Manual{xie2015wrapper,
title = {DT: A Wrapper of the JavaScript Library 'DataTables'},
author = {Yihui Xie},
year = {2015},
note = {R package version 0.1},
url = {http://CRAN.R-project.org/package=DT},
}

@Book{wickham2009ggplot2,
@@ -64,19 +65,11 @@ @Book{wickham2009ggplot2
url = {http://had.co.nz/ggplot2/book},
}

@Manual{hester2014knitrbootstrap,
title = {knitrBootstrap: Knitr Bootstrap framework.},
author = {Jim Hester},
year = {2014},
note = {R package version 1.0.0},
url = {https://github.com/jimhester/},
}

@Manual{allaire2016rmarkdown,
title = {rmarkdown: Dynamic Documents for R},
author = {JJ Allaire and Joe Cheng and Yihui Xie and Jonathan McPherson and Winston Chang and Jeff Allen and Hadley Wickham and Aron Atkins and Rob Hyndman},
year = {2016},
note = {R package version 0.9.2},
note = {R package version 0.9.5},
url = {http://CRAN.R-project.org/package=rmarkdown},
}
