---
output:
  md_document:
    variant: markdown_github
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
[Build status](https://travis-ci.org/dwinter/pafr)
[Coverage status](https://codecov.io/github/dwinter/pafr?branch=master)
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```
# pafr
Read, manipulate and visualize 'Pairwise mApping Format' data in R
## Install
The package is not yet available on CRAN, but we will keep the master branch of
this repository stable. You can install it using devtools:
```r
# install.packages("devtools")
devtools::install_github("dwinter/pafr")
```
## Read in a .paf file and check it out
Having installed the package, making a whole-genome dotplot is as simple as
reading in an alignment and calling `dotplot`:
```{r, dotplot}
library(pafr, quietly=TRUE)
test_alignment <- system.file("extdata", "fungi.paf", package="pafr")
ali <- read_paf(test_alignment)
dotplot(ali)
```
## A paf file in R
`read_paf` takes alignments in a .paf file and represents them in a table that
behaves very much like a standard R `data.frame`. The table
has a column for each of the 12 standard columns in the .paf format, as well
as columns for any tags represented in the file. Printing the table shows a
summary of its contents and lists the available tags.
```{r, ali}
ali
```
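To make that layout concrete, here is a minimal base-R sketch that turns a single PAF record into a one-row table. It is not pafr's parser: the record is made up, and the helper names (`paf_line`, `paf_cols`, `record_df`) are hypothetical. The column names are the ones pafr uses for the 12 mandatory fields, and each optional tag is assumed to follow the `NAME:TYPE:VALUE` form.

```r
# A single made-up PAF record: 12 tab-separated fields plus two optional tags.
paf_line <- paste(
  "Q_chr1", 1000, 100, 600, "+",
  "T_chr1", 2000, 300, 800, 450, 500, 60,
  "NM:i:50", "tp:A:P",
  sep = "\t"
)

# Column names for the 12 mandatory fields, in file order.
paf_cols <- c("qname", "qlen", "qstart", "qend", "strand",
              "tname", "tlen", "tstart", "tend",
              "nmatch", "alen", "mapq")

fields <- strsplit(paf_line, "\t", fixed = TRUE)[[1]]
record <- as.list(fields[1:12])
names(record) <- paf_cols

# Everything is read as text, so convert the numeric fields.
num_cols <- setdiff(paf_cols, c("qname", "strand", "tname"))
record[num_cols] <- lapply(record[num_cols], as.numeric)

# Each optional tag (NAME:TYPE:VALUE) becomes an extra column.
tags <- strsplit(fields[-(1:12)], ":", fixed = TRUE)
for (tag in tags) {
  value <- paste(tag[-(1:2)], collapse = ":")  # keep any colons in the value
  record[[tag[1]]] <- if (tag[2] %in% c("i", "f")) as.numeric(value) else value
}

record_df <- as.data.frame(record, stringsAsFactors = FALSE)
```

The result has 14 columns: the 12 standard ones plus `NM` and `tp`, one for each tag in the record.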
Because the table behaves as a `data.frame`, it integrates with existing R functions.
For example, we can find the mean length of alignments in this file using the `alen`
column.
```{r, mean_alen}
mean(ali$alen)
```
Likewise, we can produce a ggplot histogram of the distribution of alignment lengths
in the file.
```{r, len_distr}
# Attach ggplot2 explicitly in case it is not already loaded
library(ggplot2)
ggplot(ali, aes(alen)) +
  geom_histogram(colour="black", fill="steelblue", bins=20) +
  theme_bw(base_size=16) +
  ggtitle("Distribution of alignment lengths") +
  scale_x_log10("Alignment length")
```
If we decide we don't like those shorter alignments, we can remove them with
`subset` or `filter` from dplyr.
```{r, subset}
long_ali <- subset(ali, alen > 1e4)
long_ali
```
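The two filtering routes can be compared side by side. The sketch below uses a small made-up data frame (`toy_ali`, a hypothetical stand-in for a table returned by `read_paf`) so that it runs without any alignment file, and it only calls dplyr if that package is installed.

```r
# Toy stand-in for an alignment table; the values are made up.
toy_ali <- data.frame(
  qname = c("Q_chr1", "Q_chr1", "Q_chr2"),
  alen  = c(2500, 15000, 48000)
)

# Base R: keep alignments longer than 10 kb.
long_base <- subset(toy_ali, alen > 1e4)

# dplyr equivalent, run only if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  long_dplyr <- dplyr::filter(toy_ali, alen > 1e4)
  # Both routes should keep the same alignments.
  stopifnot(identical(long_base$alen, long_dplyr$alen))
}
```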
## Plots
In addition to the dotplot demonstrated above, the package implements two
other classes of genomic visualization.
### Synteny plot
The synteny plot displays alignments between one query and one target sequence
in a given paf file. Using the alignment above, we first filter out short alignments,
then plot the regions that align between query chromosome "Q_chr3" and target
"T_chr4":
```{r, synteny}
long_ali <- subset(ali, alen > 1e4)
plot_synteny(long_ali, q_chrom="Q_chr3", t_chrom="T_chr4", centre=TRUE)
```
### Coverage plot
The coverage plot displays all sequences in either the query or target genome,
shading those regions of each sequence that are covered by at least one
alignment. This can be useful for identifying how alternative genome assemblies
differ from each other, or for visualizing differences between related genomes.
In this example we visualize the sequences in the target genome, shading each
aligned region according to the query sequence that aligns to it.
```{r, coverage}
plot_coverage(long_ali, fill='qname') +
  scale_fill_brewer(palette="Set1")
```
```
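To see what "covered by at least one alignment" means numerically, the sketch below merges overlapping alignment intervals on each target sequence and counts the covered bases. It is only an illustration and not how `plot_coverage` is implemented: the data frame `toy` and the helper `covered_bases` are hypothetical, and intervals are assumed to be half-open (`tstart` inclusive, `tend` exclusive).

```r
# Toy alignments on two hypothetical target sequences; values are made up.
toy <- data.frame(
  tname  = c("T_chr1", "T_chr1", "T_chr1", "T_chr2"),
  tstart = c(0, 50, 300, 10),
  tend   = c(100, 150, 400, 60)
)

# Merge overlapping intervals and return the number of covered bases.
covered_bases <- function(start, end) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end) {
      # Overlaps (or touches) the current block: extend it.
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap found: close the current block and start a new one.
      total <- total + (cur_end - cur_start)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start)
}

# Covered bases per target sequence.
coverage <- sapply(split(toy, toy$tname),
                   function(d) covered_bases(d$tstart, d$tend))
```

Bases under two overlapping alignments are counted once, which matches the idea of shading a region that has at least one alignment.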
## Bugs/Issues
Please use the issue tracker on this repo to let us know about any bugs or
issues that arise as you use this package.