A question regarding native assembly tool prior to scaffolding with ARCS #175
Comments
Hi Goor, For the initial de novo short read assembly, have you given ABySS (https://github.com/bcgsc/abyss) a try? ABySS uses a Bloom filter de Bruijn graph-based approach, which allows for lower RAM usage. We have assembled short read datasets for multiple 20 Gbp spruce genomes using ABySS without any issues. Hope that helps!
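To illustrate why a Bloom filter keeps memory low, here is a minimal Python sketch of a k-mer Bloom filter. It is a toy under stated assumptions, not ABySS's implementation: the class name, the bit-array size, and the use of salted BLAKE2b hashes are all chosen here for illustration. The point is that membership is recorded in a fixed-size bit array, so no k-mer sequence is stored, at the cost of occasional false positives.

```python
import hashlib


class KmerBloomFilter:
    """Toy Bloom filter for k-mers (hypothetical helper, not ABySS code)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Memory use is fixed up front: num_bits / 8 bytes, however many k-mers are added.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True means "probably present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]
```

For example, after adding every 4-mer of a read with `bf.add(...)`, the test `"ACGT" in bf` is answered from the bit array alone. A real assembler sizes the filter to the expected number of distinct k-mers so that the false-positive rate stays low.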
Hi Lauren, Thank you for your detailed answer. As a test, I was able to assemble the reads of an organism with a 1.5 Gbp genome using ABySS. After converting my stLFR reads to BX:Z format, I applied the following syntax:
Next I wanted to continue with Tigmint, so I used the syntax:
For some reason, when I looked at the Tigmint logs, I saw that all reads were treated as single-end: [M::process] 808742 single-end sequences; 0 paired-end sequences. I guess I'm doing something wrong, but I'm not sure what. I would be happy to get an idea of what is wrong. Thank you in advance,
Hi Goor, For Tigmint, you need to have your input linked reads in a single, interleaved file, and supply that filename as indicated in the usage page: https://github.com/bcgsc/tigmint?tab=readme-ov-file#usage For interleaving R1 and R2 short read files, I'd suggest using Hope that helps!
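For readers unsure what "interleaved" means here: each R1 record is immediately followed by its R2 mate in one file. The following is a minimal Python sketch of that transformation, not the tool recommended in this thread. The function names and the `/1` and `/2` suffix handling are assumptions made for illustration, and the sketch assumes plain four-line FASTQ records.

```python
import gzip
from itertools import islice


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file based on its extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def fastq_records(handle):
    """Yield FASTQ records as lists of four newline-terminated lines."""
    while True:
        record = [line.rstrip("\n") + "\n" for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def read_id(header):
    """'@name/1 BX:Z:...' -> 'name' (drops the comment and any /1 or /2 suffix)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def interleave(r1_path, r2_path, out_path):
    """Write R1 and R2 records alternately; return the number of pairs written."""
    pairs = 0
    with open_text(r1_path) as r1, open_text(r2_path) as r2, \
            open_text(out_path, "wt") as out:
        mates = fastq_records(r2)
        for rec1 in fastq_records(r1):
            rec2 = next(mates, None)
            if rec2 is None:
                raise ValueError("R2 has fewer reads than R1")
            if read_id(rec1[0]) != read_id(rec2[0]):
                raise ValueError(f"mate names differ: {rec1[0]!r} vs {rec2[0]!r}")
            out.writelines(rec1)  # header comments such as BX:Z: are kept as-is
            out.writelines(rec2)
            pairs += 1
        if next(mates, None) is not None:
            raise ValueError("R2 has more reads than R1")
    return pairs
```

The name check is the useful part: if the two input files are not in the same order, the function fails loudly instead of silently producing mismatched pairs.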
Hi Lauren, Thank you for your reply. I was able to use seqtk pmerge as you suggested, followed by running Tigmint and then ARCS. However, when I compared assembly statistics for the original ABySS contigs and the ARCS output, they were roughly the same. Since there was no improvement, I assume I did something wrong in the process. My main suspicion is that something went wrong with the barcodes. To the best of my understanding, after converting the stLFR reads' barcodes into the BX header format, there was no need to go through the LongRanger Basic command. Was I wrong? Thank you in advance,
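One common statistic for this kind of before-and-after comparison is N50: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The sketch below is a plain Python version under that standard definition. The function names are made up for illustration, and it does not apply the minimum-length cutoffs that some assembly statistics tools use, so its numbers may differ from theirs.

```python
def fasta_lengths(path):
    """Return the sequence length of every record in a FASTA file."""
    lengths = []
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the sequence at which the running total reaches half the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

Calling `n50(fasta_lengths(...))` on the contig file and on the scaffold file gives two directly comparable numbers. If scaffolding joined any sequences, the scaffold N50 and the sequence count should both change.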
Hi Goor, It appears that the barcode information isn’t being captured, though I can’t be 100% certain. I strongly recommend taking a subset of reads and inspecting their barcodes for consistency. If it helps, carefully reviewing and following each step and data transformation outlined in the provided demo could provide valuable insights. Unfortunately, we’re currently short-staffed and will be for the foreseeable future, so I apologize for not being able to offer additional support at this time. Best regards,
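The barcode check suggested above could be sketched in Python as follows. This is a rough sketch under assumptions: it expects an interleaved FASTQ in which each barcode sits in the header comment as a `BX:Z:` tag (the format mentioned earlier in this thread), and the function names and summary fields are invented for illustration.

```python
import re
from collections import Counter
from itertools import islice

# Assumed header layout: '@read_name BX:Z:<barcode>'
BX_TAG = re.compile(r"\bBX:Z:(\S+)")


def barcode_of(header):
    """Return the BX:Z: barcode in a FASTQ header, or None if there is none."""
    match = BX_TAG.search(header)
    return match.group(1) if match else None


def summarize_barcodes(lines, max_pairs=100000):
    """Inspect the first max_pairs read pairs of an interleaved FASTQ line stream."""
    headers = islice(lines, 0, None, 4)  # every fourth line is a header
    counts = Counter()
    pairs = missing = mismatched = 0
    # zip(headers, headers) consumes two consecutive headers per step (R1, then R2).
    for h1, h2 in islice(zip(headers, headers), max_pairs):
        bc1, bc2 = barcode_of(h1), barcode_of(h2)
        pairs += 1
        missing += (bc1 is None) + (bc2 is None)
        if bc1 != bc2:
            mismatched += 1
        counts.update(bc for bc in (bc1, bc2) if bc is not None)
    return {
        "pairs": pairs,
        "reads_without_barcode": missing,
        "mismatched_pairs": mismatched,
        "distinct_barcodes": len(counts),
        "most_common": counts.most_common(5),
    }
```

Passing an open file handle as `lines` is enough. Many reads without a barcode, mates whose barcodes differ, or a single barcode shared by nearly all reads would each suggest that the conversion step did not carry the barcodes through as intended.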
Hi All,
I'm not sure this is the right place to ask my questions, but hopefully one of you can advise me. I'm working on the assembly of a diploid plant genome with an estimated size of ~9 Gbp, from over one billion 100 bp paired-end reads with an insert size of about 900 bp (stLFR reads).
I tried to perform the initial assembly of the reads using SOAPdenovo2, but it failed (it looks to me like the tool is not able to handle this amount of data). I also tried SPAdes (latest version), but it crashed very quickly after filling almost 2 TB of disk workspace (despite having 750 GB of RAM). As for Celera ('WGS genome assembler'), the documentation is scarce.
I wonder whether you could kindly provide some insight, and perhaps suggest a tool or strategy, so that I can complete this initial assembly stage.
Thank you in advance,
Goor