A question regarding native assembly tool prior to scaffolding with ARCS #175

goors-syntezza · 2024-11-20T09:25:43Z

Hi All,

I'm not sure this is the right place to ask, but hopefully one of you can advise me. I'm working on the assembly of a diploid plant genome with an estimated size of ~9 Gbp, from over one billion 100 bp paired-end reads with an insert size of about 900 bp (stLFR reads).

I tried to perform the initial assembly with SOAPdenovo2 but failed; it looks to me as if the tool cannot handle this amount of data. I also tried SPAdes (latest version), but it crashed very quickly after filling almost 2 TB of disk workspace (despite the machine having 750 GB of RAM). As for Celera (the 'WGS genome assembler'), its documentation is scarce.

I wonder whether you could kindly offer some insight, and perhaps suggest a tool or strategy, to complete this initial assembly stage.

Thank you in advance,
Goor

lcoombe · 2024-11-20T16:37:27Z

Hi Goor,

For the initial de novo short read assembly, have you given ABySS (https://github.com/bcgsc/abyss) a try? ABySS uses a Bloom filter-based de Bruijn graph approach, which allows for lower RAM usage. We have assembled short read datasets for multiple ~20 Gbp spruce genomes using ABySS without any issues.
ABySS won't utilize any linked read information, but you could use it to get your initial assembly, and then run Tigmint/ARCS to correct and scaffold the baseline assembly using the linked read information.

Hope that helps!
Lauren

goors-syntezza · 2024-12-22T19:58:24Z

Hi Lauren,

Thank you for your detailed answer. As a test, I was able to assemble reads with ABySS for an organism with a ~1.5 Gbp genome.

After converting my stLFR reads to the BX:Z format, I ran the following command:

abyss-pe v=-v k=25 j=7 name=test B=50G in='s1_r1.fq.gz s1_r2.fq.gz' scaffolds
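For readers hitting the same issue, here is a minimal Python sketch of the barcode conversion step mentioned above. It assumes stLFR read names of the form `@<read id>#<b1>_<b2>_<b3>/<mate>`, with `0_0_0` marking reads that received no barcode; both the name pattern and the choice to drop the tag for `0_0_0` reads are assumptions to check against your own files. The rewritten header carries the barcode as a `BX:Z:` comment, the format discussed in this thread.

```python
"""Sketch: move stLFR barcodes from the read name into a BX:Z header comment.

ASSUMPTION: names look like '@<rid>#<b1>_<b2>_<b3>/<mate>' and '0_0_0'
means "no barcode". Inspect a few headers of your own data before using this.
"""
import gzip
import re
import sys
from typing import Iterable, Iterator, Optional

# Hypothetical stLFR name pattern: read id, '#', three barcode fields, optional /1 or /2.
STLFR_NAME = re.compile(r"^@(?P<rid>[^#\s]+)#(?P<bc>\d+_\d+_\d+)(?:/[12])?")


def convert_header(header: str) -> Optional[str]:
    """Return '@rid<TAB>BX:Z:bc', plain '@rid' for 0_0_0, or None if unmatched."""
    m = STLFR_NAME.match(header)
    if m is None:
        return None
    rid, bc = m.group("rid"), m.group("bc")
    if bc == "0_0_0":
        # Unbarcoded read: keep it, but without a BX tag, so it is not
        # lumped together with every other unbarcoded read as one "molecule".
        return f"@{rid}"
    return f"@{rid}\tBX:Z:{bc}"


def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite every 4th line (the FASTQ header); pass the rest through."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            new = convert_header(line.rstrip("\n"))
            if new is None:
                raise ValueError(f"unexpected header: {line.strip()}")
            yield new + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (placeholder names): python stlfr_to_bx.py s1_r1.fq.gz > s1_r1.bx.fq
    with gzip.open(sys.argv[1], "rt") as fh:
        sys.stdout.writelines(convert_lines(fh))
```

Both mates end up with the same read id and the same `BX:Z:` value, which is what the later pairing and barcode steps rely on.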

Next, I wanted to continue with Tigmint, so I ran:

tigmint-make --trace tigmint metrics draft=test-contigs reads="s1_r1 s1_r2"

For some reason, the Tigmint logs show that all reads are treated as single-end:

[M::process] 808742 single-end sequences; 0 paired-end sequences.

I guess I'm doing something wrong, but I'm not sure what. Any idea what the problem might be?

Thank you in advance,
Goor

lcoombe · 2024-12-22T20:57:25Z

Hi Goor,

For Tigmint, you need to have your input linked reads in a single, interleaved file, and supply that filename as indicated in the usage page: https://github.com/bcgsc/tigmint?tab=readme-ov-file#usage

For interleaving R1 and R2 short read files, I'd suggest using seqtk mergepe - it's a fast and very useful utility.
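For anyone curious what the interleaving step does, the Python sketch below alternates one record from R1 with the matching record from R2, so that a paired-aware tool reading a single file sees each pair as two consecutive records. It stops with an error if the files disagree on read count or mate names. It is only an illustration of the idea, not a replacement for `seqtk mergepe`; the script and file names are placeholders, and the name check is an extra safeguard that seqtk may not perform.

```python
"""Sketch of R1/R2 interleaving, similar in spirit to `seqtk mergepe`."""
import gzip
import sys
from itertools import zip_longest
from typing import Iterable, Iterator, List


def records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records."""
    rec: List[str] = []
    for line in lines:
        rec.append(line)
        if len(rec) == 4:
            yield rec
            rec = []
    if rec:
        raise ValueError("truncated FASTQ record")


def base_name(header: str) -> str:
    """Read name without '@', the header comment, and any /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(r1: Iterable[str], r2: Iterable[str]) -> Iterator[str]:
    """Yield R1 record 1, R2 record 1, R1 record 2, ... checking mate names."""
    for a, b in zip_longest(records(r1), records(r2)):
        if a is None or b is None:
            raise ValueError("R1 and R2 have different numbers of reads")
        if base_name(a[0]) != base_name(b[0]):
            raise ValueError(f"mate mismatch: {a[0].strip()} vs {b[0].strip()}")
        yield from a
        yield from b


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (placeholder names): python interleave.py reads_R1.fq.gz reads_R2.fq.gz > reads.fq
    with gzip.open(sys.argv[1], "rt") as f1, gzip.open(sys.argv[2], "rt") as f2:
        sys.stdout.writelines(interleave(f1, f2))
```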

Hope that helps!
Lauren

goors-syntezza · 2024-12-29T17:55:37Z

Hi Lauren,

Thank you for your reply. I was able to interleave the reads with seqtk mergepe as you suggested, and then ran Tigmint followed by ARCS. However, when I compared assembly statistics for the original ABySS contigs and the ARCS output, they were roughly the same. Since there was no improvement, I assume I did something wrong in the process.
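A quick way to see whether ARCS changed anything is to compare sequence count and N50 before and after scaffolding. ABySS ships `abyss-fac` for these statistics; the Python sketch below just makes the calculation explicit. The 500 bp minimum length is an arbitrary choice for illustration, and the file names in the usage comment are placeholders.

```python
"""Sketch: basic contiguity stats to compare draft contigs with ARCS scaffolds."""
import sys
from typing import Dict, Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence lengths from FASTA lines (multi-line records supported)."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def contiguity(lengths: List[int], min_len: int = 500) -> Dict[str, int]:
    """Count, total length, longest sequence and N50 of sequences >= min_len."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    n50 = running = 0
    for n in kept:
        running += n
        if 2 * running >= total:  # first length at which half the total is reached
            n50 = n
            break
    return {"n": len(kept), "total": total, "max": kept[0] if kept else 0, "N50": n50}


if __name__ == "__main__":
    # usage (placeholder names): python n50.py test-contigs.fa scaffolds.fa
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, contiguity(fasta_lengths(fh)))
```

If ARCS joined almost nothing, the sequence count and N50 of its output will be close to those of the input contigs, which matches the symptom described above.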

My main suspicion is that something went wrong with the barcodes. To the best of my understanding, once the stLFR barcodes have been converted into the BX header format, there is no need to run the Long Ranger basic command. Was I wrong?

Thank you in advance,
Goor

warrenlr · 2025-01-14T16:30:25Z

Hi Goor,

It appears that the barcode information isn’t being captured, though I can’t be 100% certain. I strongly recommend taking a subset of reads and inspecting their barcodes for consistency.
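Following that advice, here is a minimal Python sketch of such a check on the first N pairs of an interleaved FASTQ: it counts pairs with no `BX:Z:` tag, pairs whose mates carry different barcodes, and the most common barcodes. It assumes the tag is a whitespace-separated field in the read header, as produced by a conversion step like the one described earlier in the thread; the script name and the default of 100,000 pairs are arbitrary placeholders.

```python
"""Sketch: sanity-check BX:Z barcodes on the first N pairs of an interleaved FASTQ."""
import gzip
import re
import sys
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Optional

# ASSUMPTION: the barcode is a whitespace-separated 'BX:Z:<barcode>' field in the header.
BX = re.compile(r"\sBX:Z:(\S+)")


def barcode(header: str) -> Optional[str]:
    m = BX.search(header)
    return m.group(1) if m else None


def check_pairs(lines: Iterable[str], max_pairs: int = 100_000) -> Dict[str, object]:
    # Every 4th line is a header; consecutive headers are mates in an interleaved file.
    headers = [line.rstrip("\n")
               for i, line in enumerate(islice(lines, 8 * max_pairs)) if i % 4 == 0]
    per_barcode: Counter = Counter()
    missing = mismatched = 0
    for h1, h2 in zip(headers[0::2], headers[1::2]):
        b1, b2 = barcode(h1), barcode(h2)
        if b1 is None or b2 is None:
            missing += 1
        elif b1 != b2:
            mismatched += 1
        else:
            per_barcode[b1] += 1
    return {
        "pairs": len(headers) // 2,
        "missing_bx": missing,
        "mate_mismatch": mismatched,
        "distinct_barcodes": len(per_barcode),
        "top_barcodes": per_barcode.most_common(5),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (placeholder name): python check_bx.py reads_interleaved.fq.gz
    with gzip.open(sys.argv[1], "rt") as fh:
        for key, value in check_pairs(fh).items():
            print(f"{key}\t{value}")
```

If `missing_bx` is close to `pairs`, the barcodes never made it into the headers; if a single barcode dominates `top_barcodes`, unbarcoded reads may have been given a shared placeholder barcode.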

If it helps, carefully reviewing and following each step and data transformation outlined in the provided demo could provide valuable insights.

Unfortunately, we’re currently short-staffed and will be for the foreseeable future, so I apologize for not being able to offer additional support at this time.

Best regards,
Rene

lcoombe added the question label Nov 20, 2024
