Options / suggestions for how to simulate nCats data? #200

tfenne · 2023-11-29T21:42:29Z

Hi - I'm trying to simulate data similar to that generated by the nCATS protocol.

Specifically, I'd like to be able to specify one or more small regions (on the order of 1-50 bp) in which all reads start, rather than having start positions distributed randomly throughout the genome.

I don't see any options to constrain the read locations, so I'm thinking that what I'll have to do is:
i) Generate small FASTA files that start where I want reads to start and extend for 100-200kb
ii) Simulate a lot of reads from that file
iii) Filter the simulated reads to only those that start within the region I want
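
Step (i) could be sketched roughly as below. This is only an illustration using the Python standard library, not anything NanoSim provides: the helper names `read_fasta` and `window_record`, the 0-based `start` convention, and the `name:start-end` header style are all assumptions made for the example.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def window_record(name: str, seq: str, start: int, length: int) -> Tuple[str, str]:
    """Cut a window beginning at 0-based `start`, clipped to the contig end.

    The returned header uses 1-based inclusive coordinates (hypothetical style).
    """
    if not 0 <= start < len(seq):
        raise ValueError(f"start {start} is outside contig {name}")
    end = min(start + length, len(seq))
    return f"{name}:{start + 1}-{end}", seq[start:end]


# Usage on a toy in-memory reference; a real run would use length=100_000 to 200_000.
reference = dict(read_fasta(io.StringIO(">chr1 toy contig\nACGTACGT\nTTTT\n")))
header, subseq = window_record("chr1", reference["chr1"], start=4, length=100)
print(f">{header}\n{subseq}")
```

Each window would be written to its own small FASTA file and used as the reference for simulation, so that read positions are reported relative to the intended start site.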

I'm guessing (ii) and (iii) will be rather slow, and I'm wondering if you have better suggestions for how to proceed? Thanks!

SaberHQ · 2023-12-05T20:21:57Z

Thank you @tfenne for using NanoSim.

NanoSim currently does not have such a feature. It would be interesting to explore adding it in a future release. However, I cannot promise that we will work on it, nor give an approximate timeframe.

In the meantime, I suggest following the approach you outlined: generate a large number of reads, then filter them by location. NanoSim generates reads fairly quickly, and you should be able to produce millions of reads within a day.
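
The filtering step could look something like the sketch below. It assumes that each simulated read header has the form `<ref>_<start>_aligned_<index>_<strand>_<head>_<middle>_<tail>`, with `<start>` being a 0-based position on the reference; please check this against the headers in your own output before relying on it. The names `start_of` and `keep_reads` are hypothetical helpers, not part of NanoSim, and strand handling is deliberately left out.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# ASSUMPTION: header layout "<ref>_<start>_(aligned|unaligned)_...".
# Adjust this pattern if your simulated read names look different.
HEADER_RE = re.compile(r"^(?P<ref>.+)_(?P<start>\d+)_(?P<kind>aligned|unaligned)_")


def start_of(header: str) -> Optional[int]:
    """Return the start position encoded in a header, or None if unusable."""
    match = HEADER_RE.match(header.lstrip(">@"))
    if match is None or match.group("kind") != "aligned":
        return None  # unparseable header, or a read with no reference position
    return int(match.group("start"))


def keep_reads(
    records: Iterable[Tuple[str, str]], lo: int, hi: int
) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs whose start lies in the half-open range [lo, hi)."""
    for header, seq in records:
        start = start_of(header)
        if start is not None and lo <= start < hi:
            yield header, seq


# Usage on toy records; real input would come from the simulated FASTA/FASTQ.
reads = [
    ("win1_12_aligned_0_F_0_500_0", "ACGT"),
    ("win1_900_aligned_1_F_0_400_0", "GGCC"),
    ("win1_3_unaligned_2_F_0_300_0", "TTTT"),
]
kept = list(keep_reads(reads, lo=0, hi=50))
print(len(kept), "of", len(reads), "reads kept")
```

If the reference was cut down to a window that begins at the desired start site, `lo` and `hi` are coordinates within that window (for example 0 and 50), not genome coordinates.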

I will keep you updated on this.
Thanks, Saber.

SaberHQ added the feature request label Dec 5, 2023
