Probabilistic HLA Typing

Paper: Prohlatype: A Probabilistic Framework for HLA Typing ¹

This project provides a set of tools to calculate the full posterior distribution of HLA types given read data.

Instead of:

| | A1 | A2 | B1 | B2 | C1 | C2 | Reads | Objective |
|---|---|---|---|---|---|---|---|---|
| 0 | A*31:01 | A*02:01 | B*45:01 | B*15:03 | C*16:01 | C*02:10 | 538.0 | 513.79 |

one can calculate:

| Allele 1 | Allele 2 | Log P | P |
|---|---|---|---|
| A*02:05:01:01 | A*30:114 | -23046.81 | 0.5000 |
| A*02:05:01:01 | A*30:01:01 | -23046.81 | 0.5000 |
| A*02:05:01:01 | A*30:106 | -23103.15 | 0.0000 |
| A*02:05:01:02 | A*30:114 | -23146.35 | 0.0000 |
| ... | | | |
| B*07:36 | B*57:03:01:02 | -13717.33 | 0.5000 |
| B*07:36 | B*57:03:01:01 | -13717.33 | 0.5000 |
| B*07:36 | B*57:03:03 | -13804.74 | 0.0000 |
| B*27:157 | B*57:03:01:02 | -13816.17 | 0.0000 |
| ... | | | |
| C*06:103 | C*18:10 | -11936.35 | 0.3338 |
| C*06:103 | C*18:02 | -11936.36 | 0.3331 |
| C*06:103 | C*18:01 | -11936.36 | 0.3331 |
| C*15:102 | C*18:02 | -11951.72 | 0.0000 |
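
The P column can be read as the Log P column exponentiated and normalized over the candidate allele pairs. The following is a minimal sketch of that normalization, not the tool's own code. It assumes, purely for illustration, that the four A-locus rows shown above are the whole candidate set, whereas the real tool considers many more pairs. `normalize_logp` is a hypothetical helper and is not part of prohlatype.

```shell
# normalize_logp: read one log-likelihood per line on stdin and print
# "logP<TAB>P", where P = exp(logP) / sum(exp(logP)).
# Hypothetical helper for illustration; not part of prohlatype.
normalize_logp() {
  awk '
    { lp[NR] = $1; if (NR == 1 || $1 + 0 > max) max = $1 + 0 }
    END {
      # Subtract the maximum before exponentiating (log-sum-exp trick);
      # exp(-23046.81) on its own would underflow to zero.
      for (i = 1; i <= NR; i++) sum += exp(lp[i] - max)
      for (i = 1; i <= NR; i++) printf "%s\t%.4f\n", lp[i], exp(lp[i] - max) / sum
    }'
}

# Assumption: only these four A-locus candidates exist.
printf '%s\n' -23046.81 -23046.81 -23103.15 -23146.35 | normalize_logp
```

Under that assumption the two tied pairs each receive 0.5000 and the remaining two round to 0.0000, which agrees with the A-locus rows above. The printed Log P values are rounded to two decimals, so loci with near-ties (such as C) will not reproduce the P column exactly from the rounded numbers.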

How:

There are three options to obtain the software:

1. If you are running on Linux, standalone binaries are available with each release.
2. Use the linked Docker image.
3. Build the software from source:

a. Install opam.

b. Make sure that the opam packages are up to date:
```
 $ opam update
```
c. Make sure that you're on the relevant compiler:
```
 $ opam switch 4.06.0
 $ eval `opam config env`
```
d. Get source:
```
 $ git clone https://github.com/hammerlab/prohlatype.git prohlatype
 $ cd prohlatype
```
e. Install the dependent packages:
```
 $ make setup
```
f. Build the programs (afterwards they'll be in _build/default/src/apps):
```
 $ make
```

Make sure that you have IMGT/HLA available:

$ git clone https://github.com/ANHIG/IMGTHLA.git imgthla

"Prohla"-typing:

Create an imputed HLA reference sequence via align2fasta. This step makes sure that all alleles have sequence information that spans the entire locus. This way, reads that originate from a region for which we normally do not have sequence information will still align (in the next filtering step), albeit poorly:
```
 $ align2fasta path-to-imgthla/alignments -o imputed_hla_class_I
```
This step needs to be performed only once per IMGT version. Run $ align2fasta --help for further information.
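
The filtering step below relies on bwa mem, which expects the reference FASTA to have been indexed with bwa index beforehand. The following is a minimal sketch of that preparation. It assumes the imputed reference was written to imputed_hla_class_I.fasta, the name used in the next command. `has_bwa_index` is a hypothetical helper and is not part of bwa or prohlatype.

```shell
# has_bwa_index: succeed only if all five files written by `bwa index`
# exist next to the given FASTA. Hypothetical helper for illustration.
has_bwa_index() {
  for ext in amb ann bwt pac sa; do
    [ -f "$1.$ext" ] || return 1
  done
}

# Assumed output name of the align2fasta step.
ref=imputed_hla_class_I.fasta

# Build the index once per reference; skip it if the reference is
# missing or the index files are already present.
if [ -f "$ref" ] && ! has_bwa_index "$ref"; then
  bwa index "$ref"
fi
```

Like the imputed reference itself, the index only needs to be rebuilt when the reference FASTA changes.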
Filter your data against the reference by first aligning it. For example:
```
 $ bwa mem imputed_hla_class_I.fasta ${SAMPLE}.fastq | \
     samtools view -F 4 -bT imputed_hla_class_I.fasta -o ${SAMPLE}.bam
```
Although the algorithms here are fundamentally alignment based, they are too slow to run on all sequences, and sequences that do not originate from the HLA region would just act as background noise.

Then convert the aligned reads back to FASTQ:

 $ samtools fastq ${SAMPLE}.bam > ${SAMPLE}_filtered.fastq
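
As a quick sanity check on the filter, it can help to compare read counts before and after. The following is a minimal sketch that assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines). `count_fastq_reads` is a hypothetical helper and is not part of samtools or prohlatype.

```shell
# count_fastq_reads: print the number of records in an uncompressed FASTQ
# file, assuming exactly four lines per record.
# Hypothetical helper for illustration.
count_fastq_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}

# Usage, with SAMPLE set as in the commands above:
#   count_fastq_reads "${SAMPLE}.fastq"
#   count_fastq_reads "${SAMPLE}_filtered.fastq"
```

A filtered count of zero usually points to a problem upstream, such as a missing index or the wrong reference, rather than to a sample without HLA reads.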

Infer types (see $ multi_par --help for further details):

 $ multi_par path-to-imgthla/alignments ${SAMPLE}_filtered.fastq -o ${SAMPLE}_output.tsv

Note: The script src/scripts/run-example-docker.sh provides an end-to-end example of the above. It depends only on docker, wget, and git; it fetches the data and runs everything in a docker container (see sh src/scripts/run-example-docker.sh help).
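
When running the steps by hand rather than through the Docker script, a small pre-flight check can catch a missing tool early. The following is a sketch that assumes the four programs used above are on PATH under exactly these names; binaries built from source live in _build/default/src/apps and may need to be added to PATH or invoked by path instead. `require_tools` is a hypothetical helper and is not part of prohlatype.

```shell
# require_tools: report every named command that is not on PATH and
# return non-zero if any is missing. Hypothetical helper for illustration.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (tool names are assumptions about how the programs are installed):
#   require_tools bwa samtools align2fasta multi_par || exit 1
```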

1: All versions of this software after 0.8.0 incorporate an important coverage likelihood that is not described in the previous paper. A short addendum describing the approach is currently in limbo; please contact me by email for a reference.
