Pipeline for detecting neoantigen from snvs and indels
conda install -c bioconda optitype
Change the path for OptiPathPipeline.py in script neoscan.pl to where you install optitype
perl neoscan.pl --rdir --log --bamfq --bed --rna --refdir --step <step_number>
<rdir> = full path of the folder holding files for this sequence run
<log> = full path of the folder saving log files
<bamfq> = 1, input is bam; 0, input is fastq: default 1
<rna> =1, input data is rna, otherwise is dna. For HLA genotype
<bed> = bed file for annotation: ensembl: /gscmnt/gc2518/dinglab/scao/db/ensembl38.85/proteome-first.bed
refseq: /gscmnt/gc2518/dinglab/scao/db/refseq_hg38_june29/proteome.bed
<refdir> = ref directory: /gscmnt/gc2518/dinglab/scao/db/refseq_hg38_june29
<step_number> run this pipeline step by step. (running the whole pipeline if step number is 0)
-
vcf file format for snvs with columns: chromosome, start position, ref allele, alt allele, gene hugo symbol, HGSV short, is it somatic or germline mutation. Filename: .snp.vcf
1 113854971 C G PTPN22 p.E207Q Somatic 1 113900168 C A AP4B1 p.A284S Somatic 1 117623561 C G FAM46C p.I231M Somatic
-
vcf file for indels with the same columns. Filename .indel.vcf
3 161086280 T - B3GALNT1 p.T159Pfs*8 Somatic
-
RNA-Seq or exome bam or fastq file for HLA type
All three input files should be in one folder. One set of files per sample
-
Copy the tool to a folder. This command will create neoscan/ folder in your current folder:
-
git clone https://github.com/ding-lab/neoscan.git
This is required to be able to do git checkout, Needed just once:
LSF_DOCKER_PRESERVE_ENVIRONMENT=true bsub -Is -R "select[mem>15000] rusage[mem=15000]" -M 32000000 -q docker-interactive -a "docker(scao/dailybox)" /bin/bash
- install optitype
conda install -c bioconda optitype
Change the path for OptiPathPipeline.py in script neoscan.pl to where you install optitype
To prepare vcf and bam input files, you can follow the example at /gscmnt/gc2524/dinglab/akarpova/cptac3/CCRCC_neoscan_test.
perl /gscmnt/gc2524/dinglab/akarpova/software/neoscan/neoscan.pl --rdir /gscmnt/gc2524/dinglab/akarpova/cptac3/CCRCC_neoscan_test --log /gscmnt/gc2524/dinglab/akarpova/cptac3 --bamfq 1 --bed /gscmnt/gc2518/dinglab/scao/db/refseq_hg38_june29/proteome.bed --rna 1 --refdir /gscmnt/gc2518/dinglab/scao/db/refseq_hg38_june29 --step 1
Then change --step 2/3/4/5
After finishing running step 5, you can get the final result in the followint two files:
SAMPLE.neo.snv.summary
SAMPLE.neo.indel.summary
Then you may want to filter out peptides found in human cells in general. Just grep every single peptide in this database /gscmnt/gc2518/dinglab/scao/db/ensembl38.85/Homo_sapiens.GRCh38.pep.all.fa.cleaned.fa
Song Cao, [email protected]