Snakemake pipeline for processing paired-end (PE) ATAC-seq data.
- Clone this repository in the working directory
git clone https://github.com/luostrowski/atacseq_snakemake.git
- Navigate into the data/fastq directory
cd data/fastq
- Symlink the FASTQ files into this directory
ln -s /absolute/path/to/file_R1.fq.gz .
ln -s /absolute/path/to/file_R2.fq.gz .
- Rename the symlinks to the expected format
mv file_R1.fq.gz sample_R1.fastq.gz
mv file_R2.fq.gz sample_R2.fastq.gz
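The symlink and rename steps above can also be done in one step, because `ln -s SOURCE LINK_NAME` creates the link directly under the new name. The following is a minimal sketch, not part of the repository: `link_sample` is a hypothetical helper, and the paths in the example call are the same placeholders used above.

```shell
# Hypothetical helper: create correctly named symlinks for one sample.
# Usage: link_sample R1_SOURCE R2_SOURCE SAMPLE_NAME [DEST_DIR]
link_sample() {
    local src_r1="$1"
    local src_r2="$2"
    local sample="$3"
    local dest="${4:-data/fastq}"
    mkdir -p "$dest"
    # Create each link under its final name, so no separate mv is needed.
    ln -s "$src_r1" "$dest/${sample}_R1.fastq.gz"
    ln -s "$src_r2" "$dest/${sample}_R2.fastq.gz"
}

# Example call (placeholder paths):
# link_sample /absolute/path/to/file_R1.fq.gz /absolute/path/to/file_R2.fq.gz sample
```

Using absolute source paths keeps the links valid regardless of the directory they are created from.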
The proj_config.yaml file contains the wildcards that the script will use.
- Change the path to the reference genome (it must be indexed).
- Change the path to the annotation GTF file.
- Adjust treat samples if necessary.
- Change the Trimmomatic adapters if necessary. The default is Nextera PE, which usually works with ATAC-seq.
- Change the mitochondrial genome name (mt_chr) if necessary. Options are "chrM" or "MT".
- Adjust effective_genome_size for deepTools bamCoverage:
  - GRCh37: 2864785220
  - GRCh38: 2913022398
  - GRCm37: 2620345972
  - GRCm38: 2652783500
- Adjust other parameters if necessary:
  - macs2_genome: two-letter code for MACS2 to recognize the genome
  - contrasts: desired contrasts for the differential analysis
  - analysis_method: for DiffBind, either DBA_DESEQ2 or DBA_EDGER
  - min_overlap: for DiffBind merging, the minimum number of samples in which a peak must be found to be retained
  - fdr_cutoff: adjusted p-value (padj) cutoff for defining significant differential peaks
  - data_source: for ChIPseeker, the genome used
  - organism: for ChIPseeker, the organism's Latin name with an underscore (no spaces)
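Before running anything, it can help to confirm that the parameters named above are still present in proj_config.yaml after editing. The sketch below is an assumption-laden check, not part of the pipeline: `check_config_keys` is a hypothetical helper, and it assumes each parameter appears on its own line as `key: value`. It only checks the keys this README names explicitly. The keys for the reference genome, annotation file, treat samples, and adapters are not named here, so they are not checked.

```shell
# Hypothetical helper: report which of the documented keys are missing.
# Usage: check_config_keys [CONFIG_FILE]
check_config_keys() {
    local config="${1:-proj_config.yaml}"
    local missing=0
    local key
    for key in mt_chr effective_genome_size macs2_genome contrasts \
               analysis_method min_overlap fdr_cutoff data_source organism; do
        # Accept the key at any indentation, followed by a colon.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*:" "$config"; then
            echo "missing key: $key" >&2
            missing=1
        fi
    done
    # Exit status 0 if every key was found, 1 otherwise.
    return "$missing"
}
```

This only confirms that the keys exist. It does not validate their values.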
- First try a dry run to make sure the structure works.
snakemake --dry-run
- Adjust the cluster.json file to run on a cluster that submits Slurm jobs.
  - The --latency-wait 60 parameter is required when using Slurm: it makes Snakemake wait up to 60 seconds for output files to appear before treating them as missing.
- Submit the jobs to the cluster:
snakemake --use-conda --jobs 100 --latency-wait 60 --cluster-config cluster.json --cluster "sbatch --qos {cluster.qos} -p {cluster.partition} -N {cluster.nodes} -n {cluster.cores} --mem {cluster.mem} -t {cluster.time} -o {cluster.stdout} -e {cluster.stderr}"
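Each `{cluster.*}` placeholder in the command above is filled from a field of the same name in cluster.json. The sketch below writes an example file to show the expected shape. It rests on assumptions: `write_cluster_example` is a hypothetical helper, every value is a placeholder to replace with settings valid on your cluster, and `__default__` is the Snakemake cluster-config entry that applies to all rules without their own entry.

```shell
# Hypothetical helper: write an example cluster config for inspection.
# Usage: write_cluster_example [OUTPUT_FILE]
write_cluster_example() {
    local out="${1:-cluster.example.json}"
    # Field names mirror the {cluster.*} placeholders; values are placeholders.
    cat > "$out" <<'EOF'
{
    "__default__": {
        "qos": "normal",
        "partition": "general",
        "nodes": 1,
        "cores": 4,
        "mem": "16G",
        "time": "04:00:00",
        "stdout": "logs/slurm-%j.out",
        "stderr": "logs/slurm-%j.err"
    }
}
EOF
}
```

Slurm does not create missing log directories, so with paths like these the logs directory must exist (`mkdir -p logs`) before jobs are submitted.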
- If the pipeline needs to be re-run, add --rerun-incomplete so that incomplete output files are regenerated:
snakemake --rerun-incomplete --use-conda --jobs 100 --latency-wait 60 --cluster-config cluster.json --cluster "sbatch --qos {cluster.qos} -p {cluster.partition} -N {cluster.nodes} -n {cluster.cores} --mem {cluster.mem} -t {cluster.time} -o {cluster.stdout} -e {cluster.stderr}"
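The two submission commands above differ only in `--rerun-incomplete`, so the shared options can be kept in one place. This is a minimal sketch rather than part of the repository: `run_pipeline`, `SBATCH_CMD`, and the `SNAKEMAKE` override variable are hypothetical names, and the options are copied from the commands above.

```shell
# sbatch template copied from the commands above; Snakemake fills the braces.
SBATCH_CMD='sbatch --qos {cluster.qos} -p {cluster.partition} -N {cluster.nodes} -n {cluster.cores} --mem {cluster.mem} -t {cluster.time} -o {cluster.stdout} -e {cluster.stderr}'

# Hypothetical wrapper: extra arguments are passed through to snakemake.
# Set SNAKEMAKE=echo to print the command instead of running it.
run_pipeline() {
    "${SNAKEMAKE:-snakemake}" "$@" --use-conda --jobs 100 --latency-wait 60 \
        --cluster-config cluster.json --cluster "$SBATCH_CMD"
}

# Example calls:
# run_pipeline                      # first submission
# run_pipeline --rerun-incomplete   # regenerate incomplete outputs
```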