Epitopedia

Getting started

The quickest way to start using Epitopedia is to clone the repository and pull the Docker image, which contains all the dependencies preinstalled:

git clone https://github.com/cbalbin-bio/Epitopedia.git

docker pull cbalbin/epitopedia

Epitopedia requires the PDB in mmCIF format, EpitopediaDB and EPI-SEQ DB. EpitopediaDB and EPI-SEQ DB can be downloaded here.

To download the entire PDB in mmCIF format:

rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/structures/divided/mmCIF/ ./mmCIF

OR

To download only the PDB files present in EpitopediaDB (EPI-PDB), supply pdb_id_list.txt to rsync:

rsync -rlpt -v -z --delete --port=33444 --include-from=/path/to/pdb_id_list.txt \
rsync.rcsb.org::ftp_data/structures/divided/mmCIF/ ./mmCIF

To run Epitopedia, provide the paths to the directories described below.

The data directory should contain EpitopediaDB (epitopedia.sqlite3) and EPI-SEQ (EPI-SEQ.fasta*), which can be downloaded here.

The mmcif directory should point to the sharded PDB directory in mmCIF format as downloaded above.

NOTE: you may need to decompress the gzipped files in the mmCIF directory:

gunzip -r mmCIF
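The sharded ("divided") layout stores each entry in a subdirectory named after the middle two characters of its lowercase four-character PDB ID, so 6VXX lives at vx/6vxx.cif.gz, or vx/6vxx.cif after gunzip. A minimal sketch for locating an entry, assuming the ./mmCIF directory from the rsync commands above; the helper name `mmcif_path` is hypothetical and not part of Epitopedia:

```python
from pathlib import Path


def mmcif_path(mmcif_dir: str, pdb_id: str, gzipped: bool = False) -> Path:
    """Return the expected location of an entry in the divided mmCIF tree.

    Hypothetical helper: the divided layout shards files by the middle two
    characters of the lowercase four-character PDB ID (6VXX -> vx/6vxx.cif).
    """
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4:
        raise ValueError(f"expected a four-character PDB ID, got {pdb_id!r}")
    suffix = ".cif.gz" if gzipped else ".cif"
    return Path(mmcif_dir) / pdb_id[1:3] / f"{pdb_id}{suffix}"


if __name__ == "__main__":
    # as_posix() gives forward slashes regardless of operating system.
    print(mmcif_path("mmCIF", "6VXX").as_posix())                # mmCIF/vx/6vxx.cif
    print(mmcif_path("mmCIF", "6VXX", gzipped=True).as_posix())  # mmCIF/vx/6vxx.cif.gz
```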

The output directory is where the output files will be written.
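Because a run can take a while, it may help to confirm the inputs are in place before starting. The sketch below checks only what this README states: the data directory holds epitopedia.sqlite3 plus EPI-SEQ.fasta* files, and the mmCIF directory exists (and, optionally, has been decompressed). The function name `check_inputs` is hypothetical and is not part of Epitopedia.

```python
from __future__ import annotations

from pathlib import Path


def check_inputs(data_dir: str, mmcif_dir: str) -> list[str]:
    """Hypothetical pre-flight check (not part of Epitopedia).

    Returns a list of problems; an empty list means the files named in
    this README were found.
    """
    problems = []
    data = Path(data_dir)
    if not data.is_dir():
        problems.append(f"data directory not found: {data}")
    else:
        if not (data / "epitopedia.sqlite3").is_file():
            problems.append(f"missing {data / 'epitopedia.sqlite3'}")
        if not any(data.glob("EPI-SEQ.fasta*")):
            problems.append(f"no EPI-SEQ.fasta* files in {data}")
    mmcif = Path(mmcif_dir)
    if not mmcif.is_dir():
        problems.append(f"mmCIF directory not found: {mmcif}")
    elif any(mmcif.glob("*/*.cif.gz")):
        # Compressed entries remain; they may need to be gunzipped first.
        problems.append(f"{mmcif} still contains .cif.gz files")
    return problems


if __name__ == "__main__":
    for problem in check_inputs("/Path/to/Data/Dir/", "/Path/to/PDB/Dir/"):
        print("WARNING:", problem)
```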

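Replace the placeholder paths below with the actual absolute paths on your local system. Where a path is given as a Docker volume mount (host:container), change only the host path on the left side of the colon; the path on the right side is internal to the container and should not be altered.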

python3 Epitopedia/docker/run_epitopedia.py \
/Path/to/Output/Dir/ \
/Path/to/PDB/Dir/ \
/Path/to/Data/Dir/ \
--afdb-dir /Path/to/AlphaFold/Dir/ \
--taxid-filter 11118 --PDB-IDS 6VXX_A

NOTE: on some systems you may need to run docker with sudo.

It is recommended to use the taxid_filter flag to prevent the input protein from finding itself or other versions of itself. For example, to find mimics of the SARS-CoV-2 spike protein (6VXX), we could use a taxid_filter of 11118 to exclude hits from all Coronaviridae. The NCBI Taxonomy Browser is helpful for determining which taxid to specify.
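Conceptually, a taxid filter excludes any hit whose taxon is the filter taxid or one of its descendants, which is why 11118 removes every member of Coronaviridae. The sketch below illustrates that idea with a toy child-to-parent table. Apart from 11118, every ID in it is made up, and a real lineage would come from NCBI Taxonomy. This is an illustration of the concept, not Epitopedia's implementation.

```python
from __future__ import annotations

# Toy child -> parent table. Only 11118 (Coronaviridae) is a real taxid here;
# the other IDs are made up purely for illustration.
TOY_PARENT = {
    900001: 900000,  # hypothetical virus species -> hypothetical genus
    900000: 11118,   # hypothetical genus -> Coronaviridae
    11118: 1,        # Coronaviridae -> root (simplified)
    800001: 1,       # hypothetical unrelated taxon -> root
}


def is_filtered(taxid: int, filter_taxid: int, parent: dict[int, int]) -> bool:
    """Return True if taxid equals filter_taxid or descends from it."""
    seen = set()
    while taxid not in seen:
        if taxid == filter_taxid:
            return True
        seen.add(taxid)
        if taxid not in parent:
            return False  # reached the top of the toy table
        taxid = parent[taxid]
    return False  # cycle guard; a well-formed taxonomy has none


hits = [("hit_a", 900001), ("hit_b", 800001)]
kept = [name for name, taxid in hits if not is_filtered(taxid, 11118, TOY_PARENT)]
print(kept)  # ['hit_b']
```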

Epitopedia can run on multiple input structures to represent a conformational ensemble. To do so, simply provide a list of structures in the format PDBID_CHAINID as shown below.

run_epitopedia.py --PDB-IDS 6VXX_A 6VXX_B 6XR8_A 6XR8_B
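Each entry passed to --PDB-IDS pairs a four-character PDB ID with a chain ID, separated by an underscore. A hypothetical helper (not part of Epitopedia) that splits and sanity-checks such entries before a long run could look like this. It assumes chain IDs are alphanumeric and at most four characters long, and it leaves their case untouched because chain IDs can be case-sensitive.

```python
from __future__ import annotations

import re

# Four-character PDB ID (a digit followed by three alphanumerics), an
# underscore, then a chain ID (assumed alphanumeric, up to 4 characters).
ENTRY = re.compile(r"^([0-9][A-Za-z0-9]{3})_([A-Za-z0-9]{1,4})$")


def parse_entries(entries: list[str]) -> list[tuple[str, str]]:
    """Split PDBID_CHAINID strings into (pdb_id, chain_id) tuples."""
    parsed = []
    for entry in entries:
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"expected PDBID_CHAINID, got {entry!r}")
        parsed.append((match.group(1).upper(), match.group(2)))
    return parsed


print(parse_entries(["6VXX_A", "6VXX_B", "6XR8_A", "6XR8_B"]))
# [('6VXX', 'A'), ('6VXX', 'B'), ('6XR8', 'A'), ('6XR8', 'B')]
```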

Epitopedia defaults to a minimum span length of 5, a relative surface accessibility cutoff of 20%, a minimum accessible span length of 3, and no taxa filter. These and other parameters can be set using the following flags:

Flag	Description
--span	Minimum span length for a hit to progress
--rasa	Cutoff for relative accessible surface area
--rasa_span	Minimum number of consecutive accessible residues required to consider a hit a SeqBMM
--taxid_filter	Taxa filter; for example, --taxid_filter 11118 filters out all Coronaviridae
--rmsd	Maximum RMSD for a hit to still be considered a structural mimic
--view	View results from a previous run
--port	Port used by the web server
--use-afdb	Include the AlphaFold database (AFDB) in the search
--pplddt	Minimum protein-level pLDDT score an AlphaFold-predicted structure must have to be considered
--mplddt	Minimum average local pLDDT score an AlphaFold-predicted region must have to be considered
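One way to read the --span, --rasa and --rasa_span descriptions is sketched below: a hit survives if its alignment contains a run of at least `span` consecutive identical residues, and within that run at least `rasa_span` consecutive residues have a relative accessible surface area at or above the `rasa` cutoff. This is an interpretation for illustration only, not Epitopedia's code. The function names, the gapped-alignment input, the inclusive cutoff, and the assumption that RASA values are listed per alignment column are all hypothetical; the defaults mirror those stated above (5, 20%, 3).

```python
from __future__ import annotations


def identical_spans(query: str, target: str, min_span: int = 5) -> list[tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) where the aligned
    sequences carry the same residue, keeping runs of length >= min_span.
    Gap characters ('-') never count as matches."""
    spans, start = [], None
    for i, (q, t) in enumerate(zip(query, target)):
        if q == t and q != "-":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_span:
                spans.append((start, i))
            start = None
    end = min(len(query), len(target))
    if start is not None and end - start >= min_span:
        spans.append((start, end))
    return spans


def has_accessible_run(rasa: list[float], start: int, end: int,
                       cutoff: float = 0.20, min_run: int = 3) -> bool:
    """True if rasa[start:end] has >= min_run consecutive values >= cutoff."""
    run = 0
    for value in rasa[start:end]:
        run = run + 1 if value >= cutoff else 0
        if run >= min_run:
            return True
    return False


# Made-up alignment and RASA values, one value per alignment column.
query = "MKTAYIAKQR"
target = "MKTAYLAKQR"
rasa = [0.10, 0.30, 0.25, 0.40, 0.05, 0.50, 0.10, 0.10, 0.10, 0.10]
seqbmm = [s for s in identical_spans(query, target) if has_accessible_run(rasa, *s)]
print(seqbmm)  # [(0, 5)]
```

The second identical run (columns 6 to 9) is only four residues long, so it is dropped by the span filter before accessibility is considered.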

Output

Example output files for 6VXX_A as input with a taxid filter of 11118 can be found here.
Definitions for the output file headers can be found here.

Intermediate Output

Epitopedia will output the following files at various stages of execution:

File Name	Description
EPI_SEQ_hits_{pdb_id(s)}.tsv	Contains the raw results from the BLAST search of the input structure against EPI-SEQ
EPI_SEQ_span_filt_hits_{pdb_id(s)}.tsv	Contains hits with consecutive spans that meet the set minimum span length
EPI_SEQ_span_filt_acc_hits_{pdb_id(s)}.tsv	Contains the above hits whose spans also include the minimum run of consecutive accessible residues
EPI_PDB_hits_{pdb_id(s)}.tsv	Contains epitope source sequences against EPI_PDB hits
EPI_PDB_fragment_pairs_{pdb_id(s)}.tsv	Contains structurally aligned fragment pairs consisting of spans of the input structure aligned against the structural representatives
EPI_PDB_fragment_pairs_{pdb_id(s)}_ranked.tsv	Contains the above but ranked from best to worst RMSD
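The ranked file orders fragment pairs by RMSD. For two fragments that have already been superimposed and have one-to-one matched atoms, RMSD is the square root of the mean squared distance between corresponding atoms. A minimal sketch with made-up coordinates and hypothetical fragment names; it does not cover the superposition step or which atoms Epitopedia compares, and the 1.0 threshold stands in for --rmsd as an example value, not Epitopedia's default.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equal-length, already-superimposed coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


# Made-up input fragment and candidate fragments (already superimposed).
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
candidates: Dict[str, Sequence[Coord]] = {
    "frag_z": [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "frag_x": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "frag_y": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)],
}

scores = {name: rmsd(query, coords) for name, coords in candidates.items()}
ranked = sorted(scores, key=scores.get)  # best (lowest RMSD) first
print(ranked)  # ['frag_x', 'frag_y', 'frag_z']

max_rmsd = 1.0  # example threshold only
within = [name for name in ranked if scores[name] <= max_rmsd]
print(within)  # ['frag_x', 'frag_y']
```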

Final Output

At the final stage of execution, Epitopedia reports the best hit per epitope motif when there are redundant source sequences. These results can be viewed in a TSV file (Example) or a more legible HTML file (Example).
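Keeping only the best hit per epitope motif amounts to grouping rows by motif and retaining one row per group. The sketch below assumes "best" means lowest RMSD, consistent with the ranked intermediate file, and uses made-up rows with hypothetical column names (`motif`, `source`, `rmsd`); the real column names are given in the header definitions linked above.

```python
from __future__ import annotations

import csv
import io
from typing import Dict, Iterable

# Made-up rows with hypothetical column names.
EXAMPLE_TSV = (
    "motif\tsource\trmsd\n"
    "motif_1\tsrc1\t1.20\n"
    "motif_1\tsrc2\t0.85\n"
    "motif_2\tsrc3\t0.40\n"
)


def best_per_motif(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Keep the row with the lowest RMSD for each motif (assumed criterion)."""
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        motif = row["motif"]
        if motif not in best or float(row["rmsd"]) < float(best[motif]["rmsd"]):
            best[motif] = row
    return best


# For a real file, pass open(path, newline="") instead of io.StringIO.
reader = csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t")
best = best_per_motif(reader)
for motif, row in best.items():
    print(motif, row["source"], row["rmsd"])
# motif_1 src2 0.85
# motif_2 src3 0.40
```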

Epitopedia database generation

Epitopedia uses IEDB and PDB to generate EpitopediaDB, which is used in the molecular mimicry search.

Generating the database takes roughly 12 hours, so a prebuilt EpitopediaDB is provided above.

To create the EpitopediaDB, download IEDB and a mmCIF version of PDB.

Point the container to the appropriate paths for IEDB, the PDB (mmCIF format), and a data directory where the databases will be written.

docker run --rm -it \
-v /Path/To/iedb_public.sql:/app/iedb \
-v /Path/to/mmCIF/Dir/:/app/mmcif \
-v /Path/to/Data/Dir/:/app/data \
cbalbin/epitopedia generate_database.py

License

This software is released under the MIT License.

Software and databases used in Epitopedia may be released under various licenses:

Software:

Databases:

IEDB
PDB

Reference

If you use Epitopedia in your work, please cite:

Epitopedia: identifying molecular mimicry of known immune epitopes
Christian Andrew Balbin, Janelle Nunez-Castilla, Jessica Siltberg-Liberles
bioRxiv 2021.08.26.457577; doi: https://doi.org/10.1101/2021.08.26.457577
