PDX Mouse Subtraction Pipeline

Authors:	Oliver Hampton Chase Miller Liu Xi Maria Cardenas and Komal S. Rathi
Contact:	Komal S. Rathi ([email protected])
Organization:	DBHi, CHOP
Status:	Completed
Date:	2025-02-04

Introduction

The goal of this repo is to make the Mouse subtraction pipeline from BCM (Wheeler Lab) reproducible.

Installation

Create python3 environment:

conda create --name pdx-subtract-env
conda activate pdx-subtract-env
conda install -c bioconda samtools
conda install -c bioconda htslib
conda install -c bioconda sambamba
conda install -c bioconda picard
conda install -c bioconda cufflinks
conda install -c anaconda java-1.7.0-openjdk-cos6-x86_64 # required by rna-seqc
conda install -c bioconda rna-seqc
conda install -c bioconda htseq
conda install -c bioconda star=2.5.3a
conda install -c bioconda trinity=2.5.1 # required by star-fusion
conda install -c bioconda star-fusion=1.1.0
conda install -c bioconda bwa
conda install -c bioconda alignstats
conda config --add channels https://conda.anaconda.org/dranew
conda install defuse
conda install -c bioconda bamutil

Create python2 environment (STAR-Fusion v1.0.1 is python 2.7 compatible):

conda create --name star-fusion-env python=2.7
source activate star-fusion-env
conda install -c bioconda star-fusion
conda install -c bioconda trinity
conda install -c conda-forge -c bioconda samtools bzip2
conda install -c conda-forge configparser

# install some non-standard perl modules:
perl -MCPAN -e shell
install DB_File
install URI::Escape
install Set::IntervalTree
install Carp::Assert
install JSON::XS

SOAPfuse

# SOAPfuse has to be installed separately as it is not available on conda
wget https://sourceforge.net/projects/soapfuse/files/SOAPfuse_Package/SOAPfuse-v1.26.tar.gz
tar -xzf SOAPfuse-v1.26.tar.gz
cd SOAPfuse-v1.26

# get SOAPfuse database
cd /mnt/isilon/cbmi/variome/reference/soapfuse_db
wget http://public.genomics.org.cn/BGI/soap/SOAPfuse/hg19-GRCh37.59.for.SOAPfuse.tar.gz

# update SOAPfuse config file according to http://soap.genomics.org.cn/soapfuse.html
# add cytoBand file from ucsc and update SOAPfuse config
wget http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz hg19-GRCh37.59/
gunzip cytoBand.txt.gz

# change PA_all_fq_postfix in config file to .fq

deFUSE

# for deFUSE, python 2 is required so use the python2 environment created for STAR-Fusion

# Install via source:
wget https://bitbucket.org/dranew/defuse/get/0f198c242b82.zip
unzip 0f198c242b82.zip

# in the tools directory, download boost
cd tools && wget https://dl.bintray.com/boostorg/release/1.68.0/source/boost_1_68_0.tar.gz
tar -zxvf boost_1_68_0.tar.gz
export CPLUS_INCLUDE_PATH=/mnt/isilon/maris_lab/target_nbl_ngs/PPTC-PDX-genomics/mouse_subtraction_pipeline/scripts/dranew-defuse-0f198c242b82/tools/boost_1_68_0
cd tools && make

# download deFUSE reference database
# change perl in defuse_create_ref.pl to /usr/bin/env perl
defuse_create_ref.pl -d /mnt/isilon/cbmi/variome/reference/defuse_db/hg19/

GATK

# get reference files and prepare corresponding index files
wget ftp://[email protected]/bundle/b37/1000G_phase1.indels.b37.vcf.gz
gunzip 1000G_phase1.indels.b37.vcf.gz
bgzip 1000G_phase1.indels.b37.vcf
tabix -p vcf 1000G_phase1.indels.b37.vcf.gz

wget ftp://[email protected]/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
gunzip Mills_and_1000G_gold_standard.indels.b37.vcf.gz
bgzip Mills_and_1000G_gold_standard.indels.b37.vcf
tabix -p vcf 1000G_phase1.indels.b37.vcf.gz

wget ftp://[email protected]/bundle/b37/dbsnp_138.b37.vcf.gz
gunzip dbsnp_138.b37.vcf.gz
bgzip dbsnp_138.b37.vcf
tabix -p vcf dbsnp_138.b37.vcf.gz

Download reference files:

# run this code to create output directories and download reference data
bash mkdirs.sh

Prepare reference fasta and gtf:

~~# Code to prepare reference fasta and gtf (this might be inaccurate because I got the reference files from BCM):~~ ~~bash scripts/generate_ref.sh~~

# make sure all reference fasta files are indexed:
samtools faidx <file.fasta|file.fa>

# make sure the fasta reference used by bwa is indexed:
bwa index protein_coding_canonical.T_chr.fa

BCM-specific scripts and software:

1. pindel_0.2.5b5_tdonly
2. ERCCPlot.jar
3. RnaSeqLimsData.pl

Steps to run the RNA-pipeline:

The RNA pipeline is divided into four steps:

Snakefile_Phase1: Align PDX RNA-seq data to hybrid genome, split into human and mouse bams and create human specific fastq files.
Snakefile_Phase2: Realign to human reference, do QC, run htseq and pindel.
Snakefile_fusions_py2: Run python2 dependent fusion callers like STAR-Fusion and deFUSE
Snakefile_soapfuse: Run python3 dependent fusion caller like SOAPfuse

Each snakefile has a corresponding bash script to run the pipeline:

# Run phase 1
cd rna-hybrid && bash run_phase1.sh

# Run phase 2
cd rna-hybrid && bash run_phase2.sh

# Run python2 based fusion callers
cd rna-hybrid && bash run_fusions_py2.sh

# Run python3 based fusion callers
cd rna-hybrid && bash run_soapfuse.sh

Steps to run the DNA-pipeline:

cd dna-pipeline && bash run_dna.sh

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
dna-pipeline		dna-pipeline
rna-hybrid		rna-hybrid
.gitignore		.gitignore
README.rst		README.rst
getdata.sh		getdata.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDX Mouse Subtraction Pipeline

Introduction

Installation

SOAPfuse

deFUSE

GATK

Download reference files:

Prepare reference fasta and gtf:

BCM-specific scripts and software:

Steps to run the RNA-pipeline:

Steps to run the DNA-pipeline:

About

Releases

Packages

Languages

marislab/pdx-mouse-subtraction

Folders and files

Latest commit

History

Repository files navigation

PDX Mouse Subtraction Pipeline

Introduction

Installation

SOAPfuse

deFUSE

GATK

Download reference files:

Prepare reference fasta and gtf:

BCM-specific scripts and software:

Steps to run the RNA-pipeline:

Steps to run the DNA-pipeline:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages