Reference-guided assembly with reference selection

Victoria Cepeda edited this page Sep 1, 2018 · 1 revision

Reference-guided assembly with reference selection.

Download and extract the metagenomic sample:

ftp://public-ftp.hmpdacc.org/Illumina/posterior_fornix/SRS044742.tar.bz2

SRS044742/
    SRS044742.denovo_duplicates_marked.trimmed.1.fastq
    SRS044742.denovo_duplicates_marked.trimmed.2.fastq
    SRS044742.denovo_duplicates_marked.trimmed.singleton.fastq
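
The download-and-extract step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `extract_archive` and `fetch_and_extract` and the `dest` parameter are illustrative and not part of MetaCompass. The URL is the one given above.

```python
import tarfile
import urllib.request
from pathlib import Path

SAMPLE_URL = "ftp://public-ftp.hmpdacc.org/Illumina/posterior_fornix/SRS044742.tar.bz2"


def extract_archive(archive, dest="."):
    """Unpack a .tar.bz2 archive into dest and return the sorted member names."""
    with tarfile.open(archive, "r:bz2") as tar:
        names = sorted(tar.getnames())
        tar.extractall(dest)
    return names


def fetch_and_extract(url=SAMPLE_URL, dest="."):
    """Download the archive (skipped if it is already present), then unpack it."""
    archive = Path(dest) / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return extract_archive(archive, dest)
```

Calling `fetch_and_extract()` in the working directory should leave the `SRS044742/` folder listed above next to the downloaded archive.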

Run:

 python3 go_metacompass.py -P SRS044742/SRS044742.denovo_duplicates_marked.trimmed.1.fastq,SRS044742/SRS044742.denovo_duplicates_marked.trimmed.2.fastq -U SRS044742/SRS044742.denovo_duplicates_marked.trimmed.singleton.fastq -o SRS044742_2018 -k
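
When several samples follow the same naming scheme, the command can be assembled programmatically. This is a sketch only: the function name `metacompass_command` and its parameters are hypothetical, and the flag meanings in the comments are inferred from the command above (`-P` for the two paired files, `-U` for the singleton file, `-o` for the output folder, `-k` for keeping intermediate files).

```python
import shlex
from pathlib import Path


def metacompass_command(sample_dir, out_dir, keep_intermediate=True):
    """Build the go_metacompass.py argument list for one extracted sample."""
    sample_dir = Path(sample_dir)
    prefix = sample_dir / f"{sample_dir.name}.denovo_duplicates_marked.trimmed"
    cmd = [
        "python3", "go_metacompass.py",
        "-P", f"{prefix}.1.fastq,{prefix}.2.fastq",  # paired reads, comma-separated
        "-U", f"{prefix}.singleton.fastq",           # singleton reads
        "-o", str(out_dir),                          # output folder
    ]
    if keep_intermediate:
        cmd.append("-k")  # add intermediate files to the final output
    return cmd


cmd = metacompass_command("SRS044742", "SRS044742_2018")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `go_metacompass.py`.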

Notice that this time we added a new parameter, "-k", which adds the intermediate files to the final output. You will see messages like the following while MetaCompass runs:

/cbcb/software/Linux-x86_64/packages/ncbi-blast-2.4.0+/bin/blastn
/cbcb/project2-scratch/treangen/kmer/kmer-mask
/cbcb/sw/RedHat-7-x86_64/users/treangen/local/mash/1.1.1/bin/mash
/cbcb/sw/RedHat-7-x86_64/common/local/Python3/common/3.6.0/bin/snakemake
Provided cores: 12
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	1	assemble_unmapped
	1	bam_sort
	1	bowtie2_map
	1	build_contigs
	1	create_tsv
	1	fastq2fasta
	1	join_contigs
	1	kmer_mask
	1	merge_reads
	1	pilon_contigs
	1	pilon_map
	1	reference_recruitment
	1	sam_to_bam
	14
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	merge_reads
Selected jobs (1):
	merge_reads
Resources after job selection: {'_cores': 11, '_nodes': 9223372036854775806}

---merge fastq reads
Reason: Missing output files: SRS044742_2018/SRS044742.merged.fq

Releasing 1 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 12.
1 of 14 steps (7%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	kmer_mask
Selected jobs (1):
	kmer_mask
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---kmer-mask fastq
Reason: Missing output files: SRS044742_2018/SRS044742.marker.match.1.fastq; Input files updated by another job: SRS044742_2018/SRS044742.merged.fq

Releasing 12 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 13.
2 of 14 steps (14%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	fastq2fasta
Selected jobs (1):
	fastq2fasta
Resources after job selection: {'_cores': 11, '_nodes': 9223372036854775806}

---Converting fastq to fasta.
Reason: Missing output files: SRS044742_2018/SRS044742.fasta; Input files updated by another job: SRS044742_2018/SRS044742.marker.match.1.fastq

Releasing 1 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 11.
3 of 14 steps (21%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	reference_recruitment
Selected jobs (1):
	reference_recruitment
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---reference recruitment.
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/mc.refseq.fna; Input files updated by another job: SRS044742_2018/SRS044742.fasta

Releasing 12 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 8.
4 of 14 steps (29%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	bowtie2_map
Selected jobs (1):
	bowtie2_map
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Build index .
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/SRS044742.sam; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/mc.refseq.fna, SRS044742_2018/SRS044742.merged.fq

Releasing 12 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 10.
5 of 14 steps (36%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	build_contigs
Selected jobs (1):
	build_contigs
Resources after job selection: {'_cores': 11, '_nodes': 9223372036854775806}

---Build contigs .
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/contigs.fasta; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/mc.refseq.fna, SRS044742_2018/SRS044742.0.assembly.out/SRS044742.sam

Skipped removing non-empty directory SRS044742_2018/SRS044742.0.assembly.out
Releasing 1 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 6.
6 of 14 steps (43%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	pilon_map
Selected jobs (1):
	pilon_map
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Map reads for pilon polishing.
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc.sam.unmapped.2.fq, SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc.sam, SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc_unpaired.sam, SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc.sam.unmapped.1.fq; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/contigs.fasta

Releasing 12 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 7.
7 of 14 steps (50%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (2):
	sam_to_bam
	assemble_unmapped
Selected jobs (1):
	sam_to_bam
Resources after job selection: {'_cores': 11, '_nodes': 9223372036854775806}

---Convert sam to bam .
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc.sam.bam; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc_unpaired.sam, SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc.sam

Releasing 1 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 9.
8 of 14 steps (57%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (2):
	bam_sort
	assemble_unmapped
Selected jobs (1):
	bam_sort
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Sort bam .
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/sorted.bam, SRS044742_2018/SRS044742.0.assembly.out/sorted2.bam; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc.sam.bam

Releasing 12 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 5.
9 of 14 steps (64%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (2):
	pilon_contigs
	assemble_unmapped
Selected jobs (1):
	assemble_unmapped
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Assemble unmapped reads .
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/SRS044742.megahit/final.contigs.fa; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc.sam.unmapped.2.fq, SRS044742_2018/SRS044742.0.assembly.out/SRS044742.mc.sam.unmapped.1.fq

Releasing 12 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 3.
10 of 14 steps (71%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	pilon_contigs
Selected jobs (1):
	pilon_contigs
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

---Pilon polish contigs .
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/contigs.pilon.fasta; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/sorted.bam, SRS044742_2018/SRS044742.0.assembly.out/sorted2.bam, SRS044742_2018/SRS044742.0.assembly.out/contigs.fasta

Releasing 12 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 2.
11 of 14 steps (79%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	join_contigs
Selected jobs (1):
	join_contigs
Resources after job selection: {'_cores': 11, '_nodes': 9223372036854775806}

---concanenate reference-guided and de novo contigs
Reason: Missing output files: SRS044742_2018/SRS044742.0.assembly.out/contigs.final.fasta; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/contigs.pilon.fasta, SRS044742_2018/SRS044742.0.assembly.out/SRS044742.megahit/final.contigs.fa

Releasing 1 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 0.
12 of 14 steps (86%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	create_tsv
Selected jobs (1):
	create_tsv
Resources after job selection: {'_cores': 11, '_nodes': 9223372036854775806}

---information reference-guided and de novo contigs
Reason: Missing output files: SRS044742_2018/metacompass_summary.tsv; Input files updated by another job: SRS044742_2018/SRS044742.0.assembly.out/mc.refseq.fna, SRS044742_2018/SRS044742.0.assembly.out/SRS044742.megahit/final.contigs.fa, SRS044742_2018/SRS044742.0.assembly.out/contigs.final.fasta, SRS044742_2018/SRS044742.0.assembly.out/contigs.pilon.fasta, SRS044742_2018/SRS044742.0.assembly.out/contigs.fasta

Releasing 1 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 4.
13 of 14 steps (93%) done
Resources before job selection: {'_cores': 12, '_nodes': 9223372036854775807}
Ready jobs (1):
	all
Selected jobs (1):
	all
Resources after job selection: {'_cores': 11, '_nodes': 9223372036854775806}

localrule all:
    input: SRS044742_2018/metacompass_summary.tsv
    jobid: 1
    reason: Input files updated by another job: SRS044742_2018/metacompass_summary.tsv

Releasing 1 _cores (now 12).
Releasing 1 _nodes (now 9223372036854775807).
Finished job 1.
14 of 14 steps (100%) done
unlocking
removing lock
removing lock
removed all locks
mv: cannot stat ‘SRS044742_2018/SRS044742.0.assembly.out/*merged.fq.mash*’: No such file or directory
checking for dependencies (Bowtie2, Blast, kmermask, Snakemake, etc)
Bowtie2--->[OK]
Blast+--->[OK]
kmer-mask--->[OK]
mash--->[OK]
Snakemake--->[OK]
MetaCompass finished succesfully!
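
The log above reports progress with lines such as "7 of 14 steps (50%) done". If the messages are saved to a file, a small script can check whether every step completed. This is a sketch that assumes the progress lines keep exactly the format shown above; the function names are illustrative.

```python
import re

# Matches Snakemake progress lines such as "7 of 14 steps (50%) done".
PROGRESS = re.compile(r"^(\d+) of (\d+) steps \((\d+)%\) done$")


def last_progress(log_lines):
    """Return (done, total) from the last progress line, or None if absent."""
    result = None
    for line in log_lines:
        match = PROGRESS.match(line.strip())
        if match:
            result = (int(match.group(1)), int(match.group(2)))
    return result


def finished_ok(log_lines):
    """True if the last progress line reports that all steps are done."""
    progress = last_progress(log_lines)
    return progress is not None and progress[0] == progress[1]


sample = [
    "Finished job 4.",
    "13 of 14 steps (93%) done",
    "Finished job 1.",
    "14 of 14 steps (100%) done",
]
print(last_progress(sample))  # (14, 14)
print(finished_ok(sample))    # True
```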

The output will be in the specified folder, SRS044742_2018:

ls SRS044742_2018/*

SRS044742_2018/metacompass.final.ctg.fa

SRS044742_2018/intermediate_files:
assembly_output
mapped_reads
megahit_output
pilon_output
reference_selection_output
unmapped_reads

SRS044742_2018/metacompass_logs:
SRS044742.0.bowtie2map.log
SRS044742.0.kmermask.log
SRS044742.0.megahit.log
SRS044742.0.pilon.map.log
SRS044742.0.reference_recruitement.log

SRS044742_2018/metacompass_output:
metacompass_assembly_stats.tsv
metacompass.final.ctg.fa
metacompass_summary.tsv
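
As a quick sanity check on the final assembly, the contigs in `metacompass.final.ctg.fa` can be counted with a few lines of Python. This sketch assumes a standard FASTA file in which each record starts with a `>` header line; the function name `fasta_stats` is illustrative and not part of MetaCompass.

```python
def fasta_stats(path):
    """Return (number_of_sequences, total_bases) for a FASTA file."""
    n_seqs = 0
    n_bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                n_seqs += 1  # one header per sequence
            else:
                n_bases += len(line)  # sequence lines may be wrapped
    return n_seqs, n_bases
```

For the run above, it would be called as `fasta_stats("SRS044742_2018/metacompass_output/metacompass.final.ctg.fa")`.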