Pannagram

Pannagram is a package for constructing pan-genome alignments, analyzing structural variants, and translating annotations between genomes. Additionally, Pannagram contains useful functions for visualization. The manual is available at the pannagram-page.

Setting Up the Working Environment

Follow these instructions to set up your Pannagram environment.

Prerequisites

Make sure you have one of the following package managers installed:

conda
mamba
micromamba

In the commands below, replace <manager> with the package manager you use: conda, mamba, or micromamba.
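
If you are not sure which of these managers is already installed, a small shell check along the following lines can help. This is only a sketch: the function name find_manager is made up for this example and is not part of Pannagram.

```shell
# Print the first supported package manager found on PATH.
# Hypothetical helper for illustration; not shipped with Pannagram.
find_manager() {
    for m in micromamba mamba conda; do
        if command -v "$m" >/dev/null 2>&1; then
            echo "$m"
            return 0
        fi
    done
    echo "none of micromamba, mamba or conda was found on PATH" >&2
    return 1
}

# Usage:
#   manager=$(find_manager) && "$manager" env create -f pannagram.yml
```

The loop order (micromamba, mamba, conda) is arbitrary; reorder it if you prefer a different manager when several are installed.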

Linux and macOS (Intel)

<manager> env create -f pannagram.yml
<manager> activate pannagram

macOS (M-series chips)

<manager> env create --platform osx-64 -f pannagram_m4.yml
<manager> activate pannagram

Alternative: Setting Up the Environment Without Explicit Versions

Use this option if you prefer an environment in which package versions are not pinned and the latest compatible versions are installed:

Linux and macOS (Intel)

<manager> env create -f pannagram_min.yml
<manager> activate pannagram

macOS (M-series chips)

<manager> env create --platform osx-64 -f pannagram_min.yml
<manager> activate pannagram

Running RStudio with the Environment

Make sure that RStudio Desktop is installed. Then run the following on the command line (the open -a command is specific to macOS):

<manager> activate pannagram
open -a RStudio

One may also create an alias:

alias panR="micromamba activate pannagram && open -a RStudio"

Included Dependencies

The environment provides the following dependencies, each accessible directly via the command line:

R interpreter (required version)
BLAST
MAFFT
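
After activating the environment, one way to confirm that these tools are reachable is to look them up on PATH. The sketch below assumes the executables are named Rscript, blastn and mafft (the usual command names for R, BLAST+ and MAFFT); the helper missing_tools is illustrative and not part of Pannagram.

```shell
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names; adjust them if your installation differs.
missing=$(missing_tools Rscript blastn mafft)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```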

Windows users

Windows users can try running the code from this repository under WSL, since Bash and the / path separator are used extensively in the code. However, it has never been tested in that environment.

1. Pangenome linear alignment

1.1 Building the alignment

Pangenome alignment can be built in two modes:

reference-free:

./pannagram.sh -path_in '<genome files directory path>' \
    -path_out '<output files path>' \
    -cores 8

reference-based:

./pannagram.sh -ref '<reference genome name>' \
    -path_in '<genome files directory path>' \
    -path_out '<output files path>' \
    -cores 8

quick look: if no information about the genomes and their corresponding chromosomes is available, one can run the preparation steps:

./pannagram.sh -ref '<reference genome name>' \
    -path_in '<genome files directory path>' \
    -path_out '<output files path>' \
    -cores 8 -pre

An extended description of the parameters for all three scripts is available by executing the scripts with the flag -help.
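
Before starting a long alignment run, it can be worth checking that the input directory actually contains genome files. The sketch below assumes the genomes are stored as files with a .fasta extension, which is an assumption of this example rather than something stated above; count_genomes is a hypothetical helper, not a Pannagram command.

```shell
# Count files with a given extension directly inside a directory.
# The default extension "fasta" is an assumption; pass another one if needed.
count_genomes() {
    dir=$1
    ext=${2:-fasta}
    find "$dir" -maxdepth 1 -type f -name "*.${ext}" | wc -l | tr -d ' '
}

# Usage (hypothetical path):
#   n=$(count_genomes /path/to/genomes)
#   [ "$n" -ge 2 ] || echo "Expected at least two genome files" >&2
```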

1.2 Extract information from the pangenome alignment

Synteny blocks, SNPs, and sequence consensus (for the IGV browser) can be extracted from the alignment:

# -blocks: find synteny block information for visualisation
# -seq:    create the consensus sequence of the pangenome
# -snp:    call SNPs
./analys.sh -path_msa '<output path with consensus>' \
      -path_chr '<path with chromosomes>' \
      -blocks \
      -seq \
      -snp

1.3 Calling structural variants

Once the pangenome linear alignment has been built, SVs can be called using the following script:

# -sv_call:  create output .gff and .fasta files with SVs
# -sv_sim:   compare SVs with a set of sequences (e.g., TEs)
# -sv_graph: construct the graph of SVs
./analys.sh -path_msa '<output path with consensus>' \
      -sv_call \
      -sv_sim te.fasta \
      -sv_graph

2. Visualisation

Pannagram contains a number of useful methods for visualization in R.

2.1 Visualisation of the pangenome alignment

All genomes together:

A dotplot for a pair of genomes:

2.2 Graph of Nestedness on Structural variants

Every node is an SV:

Every node is a unique sequence; the node size reflects how often this sequence occurs in SVs:

2.3 Nucleotide plot for a fragment of the alignment

In the ACTG-mode:

# --- Quick start code ---
source('utils/utils.R')             # Functions to work with sequences
source('visualisation/msaplot.R')   # Visualisation
aln.seq = readFastaMy('aln.fasta')  # Vector of strings
aln.mx = aln2mx(aln.seq)            # Transform into a matrix
msaplot(aln.mx)                     # ggplot object

In the Polymorphism mode:

# --- Quick start code ---
msadiff(aln.mx)                     # ggplot object

2.4 Dotplots of Sequences

Simultaneously on forward (dark color) and reverse complement (pink color) strands:

# --- Quick start code ---
source('utils/utils.R')             # Functions to work with sequences
source('visualisation/dotplot.R')   # Visualisation
s = sample(c("A","C","G","T"), 100, replace = TRUE)
dotplot(s, s, 15, 9)                # ggplot object

2.5 ORF-finder and visualisation

# --- Quick start code ---
source('utils/utils.R')             # Functions to work with sequences
source('visualisation/orfplot.R')   # Visualisation
s = sample(c("A","C","G","T"), 100, replace = TRUE)  # Random nucleotides, as in 2.4
str = nt2seq(s)
orfs = orfFinder(str)
orfplot(orfs$pos)                   # ggplot object

3. Additional useful tools

3.1 Search for similar sequences

...on the genome

The first approach searches against entire genomes or individual chromosomes. A quick-start toy example:

./simsearch.sh -in_seq genes.fasta -on_genome genome.fasta -out out.txt

The result is a GFF file with hits matching the similarity threshold.
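
Because GFF is a tab-separated format whose first column names the sequence a feature lies on, the hits can be summarised with standard command-line tools. The following sketch counts hits per sequence; it assumes that the output file follows the standard GFF column layout, and hits_per_seq is a made-up helper name, not part of Pannagram.

```shell
# Count GFF records per sequence ID (column 1), skipping comment and empty lines.
# Output: one "<sequence ID><TAB><count>" line per sequence, sorted by ID.
hits_per_seq() {
    awk -F'\t' '!/^#/ && NF > 0 { n[$1]++ } END { for (s in n) print s "\t" n[s] }' "$1" | sort
}

# Usage:
#   hits_per_seq out.txt
```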

...on another set

The second approach, in contrast, searches for similarities against another set of sequences. A quick-start toy example:

./simsearch.sh -in_seq genes.fasta -on_seq genome.fasta -out out.txt

The result is a table saved in RDS format (R's serialized object format, readable with readRDS). This table shows the coverage of one sequence by another and includes a flag column that indicates whether the sequences meet the similarity threshold. Additionally, the second approach takes the strand of the coverage into account, determining not just whether a sequence is covered, but also whether it is covered in a specific orientation.

Acknowledgements

Development:

Anna Igolkina - Lead Developer and Project Initiator
Alexander Bezlepsky - Assistant

Testing:

Anna Igolkina: Lead Tester
Anna Glushkevich: Testing the alignment on A. lyrata genomes
Elizaveta Grigoreva: Testing the alignment on A. thaliana and A. lyrata genomes
Jilong Ma: Testing the SV-graph on spider genomes
Alexander Bezlepsky: Testing the Pannagram's functionality on Rhizobial genomes
Gregoire Bohl-Viallefond: Testing the annotation converter on A. thaliana alignment

Resources:

Logo was generated with the help of DALL-E
Parallel Processing Tool: O. Tange (2018): GNU Parallel 2018, ISBN 9781387509881, DOI https://doi.org/10.5281/zenodo.1146014.
