MaximilianStammnitz/Indelwald

Indelwald – from indel calls to indel spectra

Using the functions in Indelwald, you can generate spectra of short insertions and deletions, also termed indels (ID), in line with the complex PCAWG classification scheme first specified by Alexandrov et al., 2020 (see COSMIC ID signature catalogues).

Note that, in order to produce your own indel spectra, you will need to specify:

  • the reference genome fasta file based on which your alignments' indel calls were generated
  • a VCF or dataframe object containing one indel per line, featuring the minimal set of columns "CHROM", "POS", "REF", "ALT"
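As a rough illustration of the second requirement, the sketch below reads VCF-style text and keeps only the four required columns for lines that are actually indels. Indelwald itself is written in R; this Python snippet is not part of the package, and the function name `read_indel_table` is hypothetical. It assumes tab-separated input with a `#CHROM` header line and exactly one ALT allele per line.

```python
REQUIRED_COLUMNS = ("CHROM", "POS", "REF", "ALT")


def read_indel_table(vcf_text):
    """Parse VCF-style text into a list of dicts with CHROM, POS, REF, ALT.

    Hypothetical helper for illustration only; assumes one ALT allele per line.
    """
    # Drop '##' meta lines and empty lines; the '#CHROM' line names the columns.
    lines = [ln for ln in vcf_text.splitlines() if ln and not ln.startswith("##")]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("no '#CHROM' header line found")
    header = lines[0].lstrip("#").split("\t")

    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))

    records = []
    for ln in lines[1:]:
        fields = dict(zip(header, ln.split("\t")))
        ref, alt = fields["REF"], fields["ALT"]
        # Equal REF and ALT lengths indicate a substitution, not an indel.
        if len(ref) == len(alt):
            continue
        records.append(
            {"CHROM": fields["CHROM"], "POS": int(fields["POS"]), "REF": ref, "ALT": alt}
        )
    return records
```

Any extra VCF columns (ID, QUAL, FILTER, ...) are simply ignored, which mirrors the "minimal set of columns" idea above.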

My code then groups all of your short (< 80 bp) insertion and deletion variants into the (currently) 83 different indel types agreed upon by the PCAWG signature consortium. These types reflect a consensus rule set regarding variant length, sequence type and immediate sequence context. Plotted spectra look like this example:

[Figure: example indel spectrum]

In the above case, we see enrichments of single-T deletions or extensions at poly-T homopolymers (lengths ≥5 bp). These spikes are indicative of DNA polymerase slippage, which is particularly prominent in tissues with DNA mismatch repair (MMR) deficiency – commonly classified as COSMIC signatures ID1 and ID2.
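To make the homopolymer context concrete, here is a simplified Python sketch (not Indelwald's R implementation) that handles only single-base indels. It reports whether the variant is a deletion or an insertion, the affected base collapsed to its pyrimidine (C or T), and how many copies of that base directly follow the anchor position in the reference. The function name `classify_single_base_indel` is hypothetical, and the sketch assumes left-aligned VCF-style calls in which REF and ALT share their first (anchor) base.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_single_base_indel(ref_seq, pos, ref, alt):
    """Return (kind, base, run_length) for a 1-bp indel, or None for anything else.

    ref_seq: reference sequence of the chromosome, as a string
    pos:     1-based position of the anchor base (VCF convention)
    Hypothetical helper for illustration; assumes left-aligned calls.
    """
    # REF and ALT must share the anchor base.
    if not ref or not alt or ref[0] != alt[0]:
        return None
    if len(ref) == 2 and len(alt) == 1:
        kind, base = "DEL", ref[1]
    elif len(ref) == 1 and len(alt) == 2:
        kind, base = "INS", alt[1]
    else:
        return None  # longer indels need repeat and microhomology rules

    base = base.upper()
    # Count consecutive copies of the base directly to the right of the anchor.
    # For a deletion this run includes the deleted base itself.
    run_length = 0
    i = pos  # 0-based index of the first base after the anchor
    while i < len(ref_seq) and ref_seq[i].upper() == base:
        run_length += 1
        i += 1

    # Report purines on the opposite strand, so only C and T remain.
    if base in ("A", "G"):
        base = COMPLEMENT[base]
    return kind, base, run_length
```

For example, deleting one T from a run of six Ts yields `("DEL", "T", 6)`, which is the kind of event that contributes to the poly-T spikes described above. Indels of two or more bases require additional rules and are deliberately left out of this sketch.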

We have extensively benchmarked this script with indel calls from the Wellcome Sanger Institute's cross-species mutation rate project (Cagan, Baez-Ortega et al. 2022, Nature), featuring the following species: Human, Black-and-white colobus, Cat, Cow, Dog, Ferret, Giraffe, Harbour porpoise, Horse, Lion, Mouse, Naked mole-rat, Rabbit, Rat, Ring-tailed lemur and Tiger. In theory, this code should run smoothly for ANY species with a reference genome.

Do you find it helpful, or do you simply enjoy making Indelwald spectra against your brand-new, awesome reference genome? Then why not surf the wave with your substitution calls and have a look at its sister library, SubstitutionSafari! If you face a challenge in using the code or wish to provide general feedback, please get in touch directly via [email protected]


Citation

If you can make good use of these functions in your work, I would be grateful for your citation of our associated paper: The evolution of two transmissible cancers in Tasmanian devils (Stammnitz et al. 2023, Science 380:6642)

About

R processing of indels.
