forked from G100DKFZ/gene-is

GENE-IS is a pipeline for the extraction of integration sites from next-generation sequencing data of clinical and preclinical gene therapy studies.

kelsi-kw/gene-is

 
 

(GENE-IS_1.1-UGX)

If you use GENE-IS, please cite the research article: Saira Afzal et al., 2016. GENE-IS: time-efficient and accurate analysis of viral integration events in large-scale gene therapy data. Molecular Therapy - Nucleic Acids 2016, vol. 6:133-139. DOI: https://doi.org/10.1016/j.omtn.2016.12.001


GENE-IS is a pipeline for the extraction of integration sites from next-generation sequencing data of clinical and preclinical gene therapy studies. It is designed to accept sequencing reads generated by different protocols, such as LAM (linear amplification mediated) PCR and Targeted Sequencing (SureSelect/Agilent).

How do I get set up?

Installation

The easiest way to obtain and run gene-is is to clone the repository:

mkdir path_to_location
cd path_to_location
git clone https://github.com/G100DKFZ/gene-is.git
cd gene-is

Testing

To test that the installation was successful, we provide testGenis.sh, a simple script that runs a fast analysis on a set of reduced datasets.

  • Run the following commands in a terminal to change to the scripts directory, export the location of gene-is, and start the test suite:
cd /path_to_location/gene-is/scripts
# export the location of gene-is
export GENIS=/path_to_location/gene-is
# run the test suite
./testGenis.sh

The following options will appear on the terminal:

1) Targeted Sequencing Pair BWA  4) All
2) Targeted Sequencing Single    5) Clear
3) LAM-PCR                       6) Quit
  • To run the tests for targeted sequencing in paired-end mode, type 1 and press Enter. If the installation was successful, the following message will appear on the terminal: "Targeted Sequencing Pair worked as expected!"

  • To run the tests for targeted sequencing in single-end mode, type 2 and press Enter. If the installation was successful, the following message will appear on the terminal: "Targeted Sequencing Single-end worked as expected!"

  • To run the tests for LAM-PCR in paired-end mode, type 3 and press Enter. If the installation was successful, the following message will appear on the terminal: "LAM-PCR Pair worked as expected!"
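
If the test script fails to start, the cause is often a missing or incorrect GENIS variable. The sketch below is a hypothetical helper, not part of GENE-IS; the function name check_genis_env and its return codes are assumptions made for illustration. It only checks what the steps above rely on: that GENIS is set, that it points to a directory, and that scripts/testGenis.sh is executable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with GENE-IS): sanity-check the
# environment before launching the test suite.
check_genis_env() {
    # GENIS must be set and non-empty
    if [ -z "${GENIS:-}" ]; then
        echo "GENIS is not set; export GENIS=/path_to_location/gene-is" >&2
        return 1
    fi
    # GENIS must point to an existing directory
    if [ ! -d "$GENIS" ]; then
        echo "GENIS does not point to a directory: $GENIS" >&2
        return 2
    fi
    # the test script must exist and be executable
    if [ ! -x "$GENIS/scripts/testGenis.sh" ]; then
        echo "testGenis.sh is missing or not executable in $GENIS/scripts" >&2
        return 3
    fi
    echo "GENIS environment looks fine: $GENIS"
}
```

After sourcing the function, calling check_genis_env before ./testGenis.sh prints a short diagnostic and returns a non-zero status if one of the three checks fails.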

Dependencies

Third-party tools

GENE-IS depends on several third-party tools, all of which are open source and freely available. These tools are provided within the GENE-IS package in the $GENIS/tools/bin directory, which the configuration files reference as the default location for third-party tools. Using versions other than the bundled ones may cause the tests to fail. In that case, manually compare the generated and expected files; if they are broadly similar, the installation is probably fine.

########################################################################
#######                   Third-party tools                      #######
########################################################################
# Path to BWA
aligner=$GENIS/tools/bin/bwa
# Path to secondary aligner. (BLAT or pblat)
blatAligner=$GENIS/tools/bin/blat
# Path to skewer
skewer=$GENIS/tools/bin/skewer
# Path to samtools
samtools=$GENIS/tools/bin/samtools
# Path to bedtools
bedTools=$GENIS/tools/bin/bedtools
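
Before running an analysis, it can help to confirm that the binaries named in the configuration block above are actually present. The following is a minimal sketch under stated assumptions: check_tools is a hypothetical function, not part of GENE-IS, and it only tests that each file exists and is executable; it does not check tool versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with GENE-IS): report which of the
# expected tools are present and executable in a given directory.
# Usage: check_tools <bin_dir> <tool>...
check_tools() {
    local bin_dir=$1
    shift
    local missing=0
    local tool
    for tool in "$@"; do
        if [ -x "$bin_dir/$tool" ]; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=$((missing + 1))
        fi
    done
    # succeed only if every requested tool was found
    [ "$missing" -eq 0 ]
}
```

For the default layout described above, a call would look like: check_tools "$GENIS/tools/bin" bwa blat skewer samtools bedtools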

For reference, the names, versions, and download links of the tools are listed here:

Tool      Version  URL
BWA       0.7.17   https://sourceforge.net/projects/bio-bwa/
bedtools  2.30     https://github.com/arq5x/bedtools2
samtools  1.13     https://github.com/samtools/samtools
pblat     2.5      https://github.com/icebert/pblat
skewer    0.2.2    https://sourceforge.net/projects/skewer/

Perl modules

The required Perl libraries are pre-packaged with GENE-IS in the "lib" directory.

UCSC annotation tables

The annotation tables can be downloaded from the UCSC Genome Browser. Select the Genes and Gene Predictions group, the NCBI RefSeq track, and the UCSC RefSeq (refGene) table. Other tables may be compatible but are untested.

UCSC Genome Browser annotation track

Configuration File

GENE-IS has a separate configuration file for each mode of analysis: LAM-PCR, TES paired-end, and TES single-end. Modify only the configuration file relevant to your analysis. To test the GENE-IS installation, no parameter in the configuration files needs to be changed. The templates are located in the gene-is directory, e.g.:

$GENIS/configFile_targetedSequencing_pairedEnd.txt
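
One way to keep the shipped templates intact is to edit a copy instead of the original. The sketch below assumes that the pipeline accepts a configuration file at a path of your choice, which this README does not confirm; copy_config_template and the destination file name in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with GENE-IS): copy a configuration
# template so that the original stays untouched.
# Usage: copy_config_template <template> <destination>
copy_config_template() {
    local template=$1
    local dest=$2
    # the template must exist
    if [ ! -f "$template" ]; then
        echo "template not found: $template" >&2
        return 1
    fi
    # never overwrite an existing, possibly edited, configuration
    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing file: $dest" >&2
        return 2
    fi
    cp "$template" "$dest"
}
```

For example: copy_config_template "$GENIS/configFile_targetedSequencing_pairedEnd.txt" ./my_analysis_config.txt (the destination name is only an illustration).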

Contacts

Original developers:

UGX modifications:
