simulator.py taking very long to run and RAM usage above 768 GB #76

Closed
andrese52 opened this issue Dec 29, 2019 · 8 comments

@andrese52

I did the characterization with E. coli and an SRA run from NCBI, then used the generated profile in simulator.py. Everything works well when -med and -sd are not used. However, when I request a median of 8000 and an sd of 200, the simulation gets stuck and runs for a very long time. After a few hours it consumes all available RAM and the job is killed by our HPC scheduler.

See below the code being used:

simulator.py genome -n 2700 -med 8000 -sd 200 -r test-10kb.fasta -o genome-10kb -c nanosim_profile_new/ecoli --seed 974839895 -t 32

Any advice is greatly appreciated.

@cheny19
Collaborator

cheny19 commented Dec 30, 2019

Hi Andres,

The problem is with -sd. It is the standard deviation of the underlying log-normal distribution (i.e. on the log scale), not of the read-length distribution itself, so you need to convert your desired value accordingly (see the Wikipedia article on the log-normal distribution). If -sd is too large, the simulator generates some extremely long or short sequences, which are then discarded because they are longer than the genome or shorter than the minimum threshold.

Let me know if you have further questions.

Chen
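
A minimal sketch of the conversion Chen refers to, assuming -med is the target median read length and -sd is the sigma of the underlying normal (log-scale) distribution, as described above. It uses the standard log-normal identities median = exp(mu) and variance = (exp(sigma^2) - 1) * exp(2*mu + sigma^2); the helper name below is hypothetical, not part of NanoSim.

import math

def lognormal_sigma(median, target_sd):
    """Convert a target read-length median and standard deviation
    into the sigma of the underlying normal (log-scale) distribution."""
    # With mu = ln(median) and t = exp(sigma^2), the log-normal variance
    # identity gives (target_sd / median)^2 = t^2 - t, a quadratic in t.
    ratio2 = (target_sd / median) ** 2
    t = (1.0 + math.sqrt(1.0 + 4.0 * ratio2)) / 2.0
    return math.sqrt(math.log(t))

# Example: a median of 8000 bp with a read-length sd of 4000 bp
# corresponds to a log-scale sigma of roughly 0.43.
print(lognormal_sigma(8000, 4000))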

@andrese52
Author

Hi Chen,
Yes, could you please provide a working example for such cases? The default examples in the README.md do not include -med or -sd.

Say we want a median of 8000; what -sd would you suggest for simulating a 10 kb genome?

Thank you
Andres

@cheny19
Collaborator

cheny19 commented Jan 22, 2020

Sorry for the late reply. The standard deviation is independent of genome size; it depends purely on how much you want the read lengths to spread. I'd suggest starting with an -sd of 1.05 or 1.1.
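
For illustration only, the original command from this thread with the suggested starting value plugged in (everything else unchanged from the earlier comment):

simulator.py genome -n 2700 -med 8000 -sd 1.05 -r test-10kb.fasta -o genome-10kb -c nanosim_profile_new/ecoli --seed 974839895 -t 32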

cheny19 closed this as completed Jun 9, 2020
@HLHsieh

HLHsieh commented Mar 30, 2023

Hi @cheny19,

I had a similar issue. Compared to the default settings, simulator.py takes very long to run with -med 20000 -sd 4. I am trying to simulate reads with median = 20 kb and std = 10 kb. I would appreciate any advice.

Many thanks,
Hsin

@kmnip
Collaborator

kmnip commented Mar 31, 2023

@HLHsieh
Can you please report your exact command?

@HLHsieh

HLHsieh commented Mar 31, 2023

@kmnip

I executed the following

~/bin/NanoSim/src/simulator.py genome -rg ~/mock_genome/D4Z4_p1.fasta -c ~/bin/NanoSim/pre-trained_models/human_NA12878_DNA_FAB49712_guppy/training -t 20 -n 2000000 -o D4Z4_p1_NanoSim_100x -med 20000 -sd 4 --seed 100 -b guppy

My goal is to simulate reads with a length distribution of median = 20 kb and std = 10 kb.

I also tried the same command with the default median and std, and it ran smoothly.

~/bin/NanoSim/src/simulator.py genome -rg ~/mock_genome/D4Z4_p1.fasta -c ~/bin/NanoSim/pre-trained_models/human_NA12878_DNA_FAB49712_guppy/training -t 20 -n 2000000 -o D4Z4_p1_NanoSim_100x --seed 100 -b guppy

Please advise. Thanks!
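
Applying the log-normal conversion sketched earlier in the thread (an assumption based on Chen's explanation, not a confirmed NanoSim recipe), a median of 20 kb with a read-length standard deviation of 10 kb corresponds to a log-scale sigma of roughly 0.43, not 4; with -sd 4 the distribution yields many extreme lengths that get discarded, which is consistent with the long runtime reported here. A self-contained check:

import math

median, target_sd = 20000, 10000
t = (1.0 + math.sqrt(1.0 + 4.0 * (target_sd / median) ** 2)) / 2.0  # t = exp(sigma^2)
sigma = math.sqrt(math.log(t))
print(round(sigma, 2))  # ~0.43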

@HLHsieh

HLHsieh commented Jul 10, 2024

Hi @kmnip,

I would like to follow up on this issue. Any suggestions would be appreciated.

P.S. My NanoSim version is 3.1.0.

Best,
Hsin

@kmnip
Collaborator

kmnip commented Jul 11, 2024

@HLHsieh Let's continue in your other thread:
#210
