-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stuck at simulation stage #193
Comments
Thanks for reporting this @kainth-amoldeep @kmnip I do not think it is related to the infinite loop bug you mentioned (#184) because, in the transcriptome mode, NanoSim first selects a reference transcriptome based on the expression profiles and then based on the 2 dimensional KDE distributions, it picks an aligned length for the simulation. Therefore, max and min lengths are not used. They are only used in generating unaligned reads I believe. That being said, dear @kainth-amoldeep would you mind trying again not using the min and max parameters and see if it works or not? In addition, you said you ensured that the transcript ids in the expression file start with "ENS". Can you also double-check if they are the same in your reference transcriptome file? Did you calculate the expression profiles using the module provided by NanoSim? I mean did you run the NanoSim -quantify module? |
@SaberHQ Good points! @kainth-amoldeep I suggest you can also try our development code on the master branch. We have had a number of bugfixes since v3.0.0. |
Hi, |
Hi, I am trying to use NanoSim 3.0.0. (downloaded source code from (https://github.com/bcgsc/NanoSim/releases/tag/v3.0.0)) for simulating ONT reads. I created my own profile using:
./read_analysis.py transcriptome -i ont_run.fq -rg ucsc_hg38.fasta -rt hg38_transcriptome_gff.fa -annot hg38_transcriptome.gff3 -t 38
. It seems to run fine with the following output:2023-05-30 15:21:23: Read pre-process and unaligned reads analysis
2023-05-30 15:21:39: Alignment with minimap2 to reference transcriptome
2023-05-30 15:24:33: Alignment with minimap2 to reference genome
2023-05-30 16:00:19: Processing transcriptome alignment file: sam
2023-05-30 16:00:44: Processing genome alignment file: sam
2023-05-30 16:01:14: Parse the annotation file (GTF/GFF3)
2023-05-30 16:01:47: Modeling Intron Retention
2023-05-30 16:01:47: Reading intron coordinates from GFF file
2023-05-30 16:02:30: Read primary genome alignment for each read
2023-05-30 16:07:03: Read primary transcriptome alignment for each read
2023-05-30 16:09:49: Calculating probabilities for each intron retention event
2023-05-30 16:16:23: Aligned reads analysis
2023-05-30 16:16:40: Computing KDEds processed >> 1215001
2023-05-30 16:16:40: Computing 2D KDE for transcriptome ref length
2023-05-30 16:16:41: Computing KDE for aligned region
2023-05-30 16:16:41: Computing KDE for aligned reads
2023-05-30 16:16:42: Computing KDE for unaligned length
2023-05-30 16:16:42: Computing KDE for ht ratio
2023-05-30 16:16:43: Unaligned reads analysis
2023-05-30 16:16:43: match and error models
2023-05-30 16:19:13: Model fitting
2023-05-30 16:19:18: Mismatch fitting start
Mismatch parameters: 0.10299333509559308 0.787357964367962 0.2182370542984321 Residual 6.509615946814762e-05
2023-05-30 16:19:48: Mismatch fitting done
2023-05-30 16:19:48: Insertion fitting start
WARNING! Insertion parameters may not be optimal! 0.8374719781176181 0.9192989587536269 0.008942175282069826 0.9880211696155594 Residual 0.0030656834500610852
2023-05-30 16:21:14: Insertion fitting done
2023-05-30 16:21:14: Deletion fitting start
WARNING! Deletion parameters may not be optimal! 0.9569706479182295 1.032499061802694 0.2866297773531245 0.8752291450180834 Residual 0.0005721754355394459
2023-05-30 16:22:00: Deletion fitting done
2023-05-30 16:22:00: Finished!
But when I try to run simulator using
./simulator.py transcriptome -rt hg38_transcriptome_gff.fa -rg ucsc_hg38.fasta -c training -e expression_abundance.tsv -n 20000 -max 10000 -min 200 -b guppy -r cDNA_1D --fastq -o simulated_1D
it gets stuck at:2023-05-31 09:48:19: Read in reference
2023-05-31 09:48:21: Read in expression profile
2023-05-31 09:48:21: Read in reference genome and create .fai index file
2023-05-31 09:48:21: Read in IR markov model
2023-05-31 09:48:21: Read in GFF3 annotation file
2023-05-31 09:48:41: Read error profile
2023-05-31 09:48:41: Read KDF of unaligned reads
2023-05-31 09:48:41: Read KDF of aligned reads
2023-05-31 09:48:41: Read chimeric simulation information
2023-05-31 09:48:41: Start simulation of aligned reads
I have ensured that my expression_abundance.tsv file have transcript_id with "ENS".
I would really appreciate if you can help me with this. Thanks.
The text was updated successfully, but these errors were encountered: