
# Strand-seq pipeline

## Ongoing work

Preliminary SV calling from Strand-seq data, summarized in a Snakemake pipeline.

Update: We recently switched to a re-implementation of the SV classification; it is still in the testing phase.

## Bioconda environment

To install the correct environment, you can use Bioconda.

  1. Install Miniconda: if you do not have Conda yet, the easiest way is to install Miniconda.

  2. Create environment:

```
conda env create -n strandseqnation -f conda-environment.yml
source activate strandseqnation
```

That's it, you are ready to go.

## How to use it

  1. Install required software:
* Install [mosaicatcher](https://github.com/friendsofstrandseq/mosaicatcher) (*currently you will need the `develop` branch*)
* Get the R-scripts from [strandsequtils](https://github.com/friendsofstrandseq/strandsequtils)
* Install BSgenome.Hsapiens.UCSC.hg38 (can be skipped if you use the Bioconda environment, see above):
  ```
  source("https://bioconductor.org/biocLite.R")
  biocLite('BSgenome.Hsapiens.UCSC.hg38')
  ```
* [Strand-Phaser](https://github.com/daewoooo/StrandPhaseR) is installed automatically
  2. Set up the configuration of the Snakemake pipeline
* Open `Snake.config.json` and specify the paths to the executables
  (such as Mosaicatcher) and to the R scripts.
* Create a subdirectory `bam/` and another subdirectory per sample (e.g.
  `bam/NA12878/`). **Multiple samples cannot currently be run together.**
  Then copy (or soft-link) the Strand-seq single-cell libraries (one BAM
  file per cell) into it. Note that BAM files need to be sorted and indexed,
  contain a read group, and should have duplicates marked.
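  The expected layout can be sketched with placeholder paths (the sample
  name and BAM file names below are examples only, not part of the pipeline):

  ```
  # One subdirectory per sample under bam/ (names are examples).
  mkdir -p bam/NA12878

  # Copy or soft-link one sorted, duplicate-marked BAM per cell and
  # keep the .bai index next to each file (paths are placeholders):
  ln -sf /data/strandseq/NA12878_cell01.bam     bam/NA12878/cell01.bam
  ln -sf /data/strandseq/NA12878_cell01.bam.bai bam/NA12878/cell01.bam.bai
  ```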
  3. Run Snakemake
* Run `snakemake` to compute all tasks locally.
* Alternatively, you can ask Snakemake to submit your jobs to an HPC cluster. To this end, edit the `Snake.cluster.json` file according to your available HPC environment and call

  ```
  snakemake -j 100 \
    --cluster-config Snake.cluster.json \
    --cluster "???"
  ```
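
  As an illustration only, a minimal `Snake.cluster.json` could look like
  this; the `__default__` key follows Snakemake's cluster-config convention,
  and the resource names and values below are made-up placeholders that your
  `--cluster` command string would reference:

  ```
  {
      "__default__" : {
          "time"   : "02:00:00",
          "memory" : "8G",
          "nCPUs"  : "1"
      }
  }
  ```

  Adapt the keys and values to whatever your scheduler's submission command expects.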

## SNV calls

The pipeline runs simple SNV calling using samtools and bcftools. If you already have SNV calls, you can skip this step by feeding your VCF files into the pipeline. To do so, make sure the files are tabix-indexed and specify them inside the `Snake.config.json` file:

"snv_calls"     : {
      "NA12878" : "path/to/snp/calls.vcf.gz"
  },
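
On disk, the entry above corresponds to a bgzip-compressed VCF with its tabix index sitting next to it. The sketch below only creates empty placeholder files to illustrate the expected naming; produce the real ones with `bgzip` and `tabix`, which ship with the Bioconda environment:

```
# Placeholder files only -- this just illustrates the naming the
# pipeline expects. Create the real files with:
#   bgzip calls.vcf && tabix -p vcf calls.vcf.gz
mkdir -p path/to/snp
touch path/to/snp/calls.vcf.gz path/to/snp/calls.vcf.gz.tbi
```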