-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.Rmd
100 lines (61 loc) · 5.16 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
# hlabud: HLA analysis in R <img width="25%" align="right" src="https://github.com/slowkow/hlabud/assets/209714/b39a3f04-c9a8-4867-a3e0-9434f0f9ef20"></img>
hlabud provides methods to retrieve sequence alignment data from [IMGTHLA] and convert the data into convenient R matrices ready for downstream analysis.
See the [usage examples](https://slowkow.github.io/hlabud/articles/examples.html) to learn how to use the data with logistic regression and dimensionality reduction.
[IMGTHLA]: https://github.com/ANHIG/IMGTHLA
For example, let’s consider a simple question about two HLA genotypes.
What amino acid positions are different between these two genotypes?
```{r example}
library(hlabud)
a <- hla_alignments("DRB1")
dosage(a$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
```
From this output, we can conclude that four positions (26, 28, 47, 86) distinguish these two HLA-DRB1 alleles.
We see that DRB1\*03:01:05 has a Y at position 26 and DRB1\*03:02:03 has a F.
Installation
============
The quickest way to get hlabud is to install from GitHub:
```{r install, eval=FALSE}
# install.packages("devtools")
devtools::install_github("slowkow/hlabud")
```
Examples
========
See the [usage examples](articles/examples.html) to get some ideas for how to use hlabud in your analyses.
- [Get a one-hot encoded matrix for all HLA-DRB1 alleles](articles/examples.html#get-a-one-hot-encoded-matrix-for-all-hla-drb1-alleles)
- [Convert genotypes to a dosage matrix](articles/examples.html#convert-genotypes-to-a-dosage-matrix)
- [Logistic regression association for amino acid positions](articles/examples.html#logistic-regression-association-for-amino-acid-positions)
- [UMAP embedding of 3,671 HLA-DRB1 alleles](articles/examples.html#umap-embedding-of-hla-drb1-alleles)
- [Get HLA allele frequencies from Allele Frequency Net Database (AFND)](articles/examples.html#get-hla-allele-frequencies-from-allele-frequency-net-database-afnd)
- [Compute HLA divergence with the Grantham distance matrix](articles/examples.html#compute-hla-divergence-with-the-grantham-distance-matrix)
- [Download and unpack all data from the latest IMGTHLA release](articles/examples.html#download-and-unpack-all-data-from-the-latest-imgthla-release)
- [Visualize the 3D molecular structure of HLA proteins and highlight specific amino acid residues](articles/visualize-hla-structure.html)
<a href="https://slowkow.github.io/hlabud/articles/examples.html#logistic-regression-association-for-amino-acid-positions">
<img width="49%" src="articles/examples_files/figure-html/glm-volcano-1.png">
</a>
<a href="https://slowkow.github.io/hlabud/articles/examples.html#umap-embedding-of-hla-drb1-alleles">
<img width="49%" src="articles/examples_files/figure-html/umap-2digit-1.png">
</a>
<a href="https://slowkow.github.io/hlabud/articles/examples.html#get-hla-allele-frequencies-from-allele-frequency-net-database-afnd">
<img width="49%" src="articles/examples_files/figure-html/afnd_dqb1_02_01-1.png">
</a>
<a href="https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html">
<img width="49%" src="https://github.com/slowkow/ggrepel/assets/209714/4843a850-a4fd-4832-9600-0b8e9c1bb904">
</a>
Citation
========
`hlabud` provides access to the data in IMGT/HLA database. Therefore, if you use `hlabud` then please cite the IMGT/HLA paper:
- Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. [IPD-IMGT/HLA Database.](https://www.ncbi.nlm.nih.gov/pubmed/31667505) Nucleic Acids Res. 2020;48: D948–D955. doi:10.1093/nar/gkz950
`hlabud` also provides access to the data in Allele Frequency Net Database (AFND). Therefore, if you use `hlabud::hla_frequencies()` then please cite the AFND paper:
- Gonzalez-Galarza FF, McCabe A, Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. [Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools.](https://pubmed.ncbi.nlm.nih.gov/31722398) Nucleic Acids Res. 2020;48: D783–D788. doi:10.1093/nar/gkz1029
Additionally, you can also cite the `hlabud` package like this:
- Slowikowski K. hlabud: methods for access and analysis of the human leukocyte antigen (HLA) gene sequence alignments from IMGT/HLA. R package version 1.0.0.
Related work
============
I recommend this article for anyone new to HLA, because the beautiful figures help to build intuition:
- La Gruta NL, Gras S, Daley SR, Thomas PG, Rossjohn J. [Understanding the drivers of MHC restriction of T cell receptors.](https://www.ncbi.nlm.nih.gov/pubmed/29636542) Nat Rev Immunol. 2018;18: 467–478.
Learn about the conventions for HLA nomenclature:
- Marsh SGE, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, et al. [Nomenclature for factors of the HLA system, 2010.](https://www.ncbi.nlm.nih.gov/pubmed/20356336) Tissue Antigens. 2010;75: 291–455.
For case-control analysis of HLA genotype data, consider the
[BIGDAWG](https://CRAN.R-project.org/package=BIGDAWG) R package
available on CRAN. Here is the related article:
- Pappas DJ, Marin W, Hollenbach JA, Mack SJ. [Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline.](https://pubmed.ncbi.nlm.nih.gov/26708359) Hum Immunol. 2016;77: 283–287.