---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# ConNIS <img src="./man/figures/logo.svg" alt="ConNIS" align="right" width="120"/>
<!-- badges: start -->
<!-- badges: end -->
The `ConNIS` package implements the *Consecutive Non-Insertion Sites* (ConNIS) method (Hanke et al., 2025), which identifies putative essential genes based on the probability of observing sequences of non-insertions within genes by chance. The method allows setting a weight value $w$ to adjust for non-uniformly distributed gene-wise insertion sites across the genome. For a data-driven selection of $w$, a labeling instability approach is provided.
In addition, the following methods are implemented in the package:
* the *Binomial* approach of the `TSAS 2.0` package (Burger et al., 2017) with an additional weight parameter
* the *Tn5Gaps* method of the `TRANSIT` package (DeJesus et al., 2015) with an additional weight parameter
* the determination of essential genes by applying the *Geometric* distribution with an additional weight parameter
Each of these methods can also be adjusted by $w$.
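To see how a weight can influence such a probability, consider a minimal sketch based on the geometric view: if each site carries an insertion independently with probability `w * n / L` (number of insertions `n`, genome length `L`), a run of `k` consecutive non-insertion sites occurs by chance with probability `(1 - w * n / L)^k`. The helper `gap_probability()` and all numbers below are hypothetical and only illustrate the idea; this is not the package's implementation, and the exact ConNIS formula is given in Hanke et al. (2025).

``` r
# Hypothetical helper (not part of ConNIS): probability that `gap_length`
# consecutive sites carry no insertion, assuming independent sites with
# per-site insertion probability weight * n_insertions / genome_length.
gap_probability <- function(gap_length, n_insertions, genome_length, weight = 1) {
  p_insert <- weight * n_insertions / genome_length
  (1 - p_insert)^gap_length
}

# Made-up numbers: 1000 insertions in a genome of 100000 bp, gap of 300 bp
p_full  <- gap_probability(300, 1000, 100000, weight = 1)
p_small <- gap_probability(300, 1000, 100000, weight = 0.2)
c(weight_1 = p_full, weight_0.2 = p_small)
```

In this toy setting a smaller weight makes the same gap more likely under chance alone, so the gap provides weaker evidence of essentiality.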
## Installation
You can install the development version of ConNIS from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("https://github.com/bips-hb/ConNIS")
```
## Example
We use a truncated real-world dataset and a truncated list of insertion sites as a toy example for applying ConNIS.
```{r exampleRunConNIS, message=FALSE, warning=FALSE}
library(ConNIS)
# attach dplyr explicitly for %>% and filter(), in case ConNIS does not attach it
library(dplyr)
# Use the E. coli BW 25113 dataset but only the first 30 genes
truncated_ecoli <- ecoli_bw25113[1:30,]
# load the insertion sites by Goodall et al., 2018, but omit all insertion sites
# that do not lie within the genomic region of truncated_ecoli
truncated_is_pos <- sort(is_pos[is_pos <= max(truncated_ecoli$end) &
is_pos >= min(truncated_ecoli$start)])
# run ConNIS with weight = 1
results_ConNIS <-
ConNIS(ins.positions = truncated_is_pos,
gene.names = truncated_ecoli$gene,
gene.starts = truncated_ecoli$start,
gene.stops = truncated_ecoli$end,
genome.length = max(truncated_ecoli$end),
weight = 1)
results_ConNIS
```
Using the Benjamini-Hochberg correction (`"BH"` in `p.adjust()`) with $\alpha=0.05$ for multiple testing adjustment, ConNIS declares the following 12 genes essential:
```{r exampleGetSignificantGenes}
results_ConNIS %>% filter(p.adjust(p_value, "BH") <= 0.05)
```
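Note that in base R's `p.adjust()` the method `"BH"` denotes the Benjamini-Hochberg procedure, whereas the Holm and Bonferroni corrections are selected with `"holm"` and `"bonferroni"`. The following sketch compares the three on made-up p-values (not taken from the ConNIS results) to show how the choice changes the number of significant tests:

``` r
# Made-up p-values for illustration only
p_values <- c(0.001, 0.008, 0.02, 0.03, 0.30)

adjusted <- data.frame(
  raw        = p_values,
  BH         = p.adjust(p_values, method = "BH"),         # Benjamini-Hochberg (FDR)
  holm       = p.adjust(p_values, method = "holm"),       # Holm (FWER)
  bonferroni = p.adjust(p_values, method = "bonferroni")  # Bonferroni (FWER)
)

# Number of tests with adjusted p-value <= 0.05 per method
n_significant <- colSums(adjusted[, -1] <= 0.05)
n_significant
```

Controlling the false discovery rate with `"BH"` is less conservative than the family-wise corrections, so it typically flags more tests at the same $\alpha$.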
Next, we re-run ConNIS with a smaller weight value and apply the same `"BH"` adjustment again.
```{r rerunConNIS}
results_ConNIS <-
ConNIS(ins.positions = truncated_is_pos,
gene.names = truncated_ecoli$gene,
gene.starts = truncated_ecoli$start,
gene.stops = truncated_ecoli$end,
genome.length = max(truncated_ecoli$end),
weight = 0.2)
results_ConNIS %>% filter(p.adjust(p_value, "BH") <= 0.05)
```
Now only 6 genes are declared essential, since a smaller weight makes it harder to label a gene as 'essential' by chance.
Next, we give an example of selecting `weight` out of five different values by the instability approach. We will use the function's built-in parallelization option (`parallelization.type = "mclapply"`). NOTE: Only 10 subsamples are drawn for demonstration purposes. For real-world applications we suggest $m \approx 500$.
```{r instability}
# candidate weight values: 0.2, 0.4, 0.6, 0.8, 1
weights <- seq(0.2, 1, 0.2)
instabilities_connis <-
instabilities(
method="ConNIS",
sig.level = 0.05,
p.adjust.mehtod = "BH",
ins.positions = truncated_is_pos,
gene.names = truncated_ecoli$gene,
gene.starts = truncated_ecoli$start,
gene.stops = truncated_ecoli$end,
genome.length = max(truncated_ecoli$end),
weights = weights,
m = 10,
d = 0.5,
use.parallelization = TRUE,
parallelization.type = "mclapply",
numCores = parallel::detectCores() - 1,
seed = 1,
set.rng = "L'Ecuyer-CMRG")
instabilities_connis
```
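The general idea of a subsampling-based instability score can be sketched as follows: label the genes on each of the $m$ subsamples, then measure how much the labels of each gene vary across subsamples. The helper `label_instability()` below is hypothetical and uses one common definition (the mean of $2 f (1 - f)$, where $f$ is the fraction of subsamples in which a gene is labeled essential); the measure computed by `instabilities()` may differ.

``` r
# Hypothetical helper (not part of ConNIS).
# labels: logical matrix, rows = genes, columns = subsamples,
#         TRUE if the gene was labeled 'essential' in that subsample.
label_instability <- function(labels) {
  freq <- rowMeans(labels)     # fraction of subsamples labeling each gene essential
  mean(2 * freq * (1 - freq))  # 0 = identical labels in every subsample
}

# Toy label matrices for two genes and four subsamples
stable_labels <- matrix(c(TRUE,  TRUE,  TRUE,  TRUE,
                          FALSE, FALSE, FALSE, FALSE),
                        nrow = 2, byrow = TRUE)
unstable_labels <- matrix(c(TRUE,  FALSE, TRUE,  FALSE,
                            FALSE, TRUE,  FALSE, TRUE),
                          nrow = 2, byrow = TRUE)

label_instability(stable_labels)    # 0
label_instability(unstable_labels)  # 0.5
```

Under this definition, the weight whose labels change least across subsamples would be the one with the smallest score.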
Re-running ConNIS with the weight value that had the minimal instability, i.e. `weight = 0.6`, declares 7 genes essential:
```{r}
results_ConNIS <-
ConNIS(ins.positions = truncated_is_pos,
gene.names = truncated_ecoli$gene,
gene.starts = truncated_ecoli$start,
gene.stops = truncated_ecoli$end,
genome.length = max(truncated_ecoli$end),
weight = 0.6)
results_ConNIS %>% filter(p.adjust(p_value, "BH") <= 0.05)
```
## References
* `Burger, B. T., Imam, S., Scarborough, M. J., Noguera, D. R. & Donohue, T. J. Combining genome-scale experimental and computational methods to identify essential genes in Rhodobacter sphaeroides. mSystems 2 (2017). URL http://dx.doi.org/10.1128/msystems.00015-17`
* `DeJesus, M. A., Ambadipudi, C., Baker, R., Sassetti, C. & Ioerger, T. R. TRANSIT - a software tool for Himar1 TnSeq analysis. PLOS Computational Biology 11, e1004401 (2015). URL http://dx.doi.org/10.1371/journal.pcbi.1004401`
* `Goodall, E. C. A. et al. The essential genome of Escherichia coli K-12. mBio 9 (2018). URL http://dx.doi.org/10.1128/mBio.02096-17`