details on usage #6

Open
apbiomol opened this issue Apr 22, 2024 · 15 comments

Comments

@apbiomol

Hi there,

I am trying to use your program, but the usage described in your paper is not very detailed.
Can you provide an example of a configuration file? What is the cfg file?
Below is my config file for dRNAmain.py, but it raised the following error:

SyntaxError in line 14 of dRNAmain.py:
invalid syntax

config file

EXA = "test1"
cfgfile = "cfg file for basecall"
ref="/path-to-file/FungiDB-52_FgraminearumPH-1_AnnotatedTranscripts.fasta"
fast5="/path-to-folder/S0R1/fast5_pass"

@Tomcxf
Owner

Tomcxf commented Apr 23, 2024

Hi apbiomol !
The cfg file is a configuration file published by ONT, which is used for basecalling.
You can find these files in install_path_of_guppy/barcoding.
The cfg you need to specify depends on the type of chip (flow cell) you used.
Thanks!

@apbiomol
Author

Sorry to bother you, but can you provide me with an example of a cfg file? My samples were sequenced by a company, and I do not have that information. I have the following information.

Flow Cell Type FLO-MIN106
Kit SQK-RNA002
...
...
MinKNOW 21.10.8
MinKNOW Core 4.4.3
Bream 6.3.5
Guppy 5.0.17

@Tomcxf
Owner

Tomcxf commented Apr 23, 2024

All right.
First, note that the cfg is located in the directory where Guppy is installed.
So you should first download Guppy from ONT and install it from your terminal. Then find the cfg in /barcoding.
Based on your information, I guess the cfg you need is 'rna_r9.4.1_70bps_hac'.
Either rna_r9.4.1_70bps_hac.cfg (high accuracy but slow) or rna_r9.4.1_70bps_fast.cfg (fast but lower accuracy) is OK!
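
If you are not sure which subfolder of your Guppy install holds the cfg files, a small sketch like the one below can search for them. Note that find_cfg_files is a hypothetical helper (not part of this project), and the install root /opt/ont/guppy is only an assumed example; replace it with your own path.

```python
from pathlib import Path


def find_cfg_files(guppy_root: str, pattern: str = "rna_r9.4.1_*.cfg") -> list[str]:
    """Recursively list basecalling cfg files under a Guppy install directory."""
    root = Path(guppy_root)
    if not root.is_dir():
        # Nothing to search if the assumed install root does not exist.
        return []
    return sorted(str(path) for path in root.rglob(pattern))


# Assumed install root; adjust to wherever Guppy lives on your machine.
for cfg in find_cfg_files("/opt/ont/guppy"):
    print(cfg)
```

The full path of the file it prints is what goes into cfgfile in the config file.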

@apbiomol
Author

Thanks for your information. I downloaded and located the config file.
My "configForMainTail.yaml" file looks like this.

EXA ="test1"
cfgfile ="/opt/ont/guppy/data/rna_r9.4.1_70bps_hac.cfg"
ref="/WorkDB/oxford_nanopore_DRS/nanocompore_test/AnnotatedTranscripts.fasta"
fast5="/WorkDB/RNA_seq_oxford_nanopore/20210708_ONT_DRS_rawdata/S0R1/fast5_pass/"

and ran the snakemake command "snakemake -s dRNAmain.py"

However, I still got this error:


SyntaxError in line 14 of dRNAmain.py:
invalid syntax

Did I do something wrong? The path for fast5 should point to a folder, right? For ref and cfgfile, I put the paths to the files.
Thanks for your help!

@Tomcxf
Owner

Tomcxf commented Apr 23, 2024

Sorry for the error :(
I have checked the directory, since it is a little bit messy.
The updated dRNAXXX.py is located in /script, not in the home directory.
By the way, the format of the config file has been updated to

EXA: "test1"
cfgfile: "cfg file for basecall"

You can use dRNAmain and tail from /script, while using the other scripts from Home.
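
To spot old-style lines quickly, here is a minimal sketch that flags anything that is not a simple key: value entry. It is only a rough check written for this thread (find_non_yaml_lines is a hypothetical helper, and it does not replace a real YAML parser); the other keys, such as ref and fast5, are assumed to follow the same key: "value" pattern.

```python
import re

# Matches a simple top-level 'key: value' entry such as EXA: "test1".
KEY_VALUE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*:\s+\S")


def find_non_yaml_lines(text: str) -> list[str]:
    """Return non-empty, non-comment lines that are not simple 'key: value' entries."""
    bad = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not KEY_VALUE.match(stripped):
            bad.append(stripped)
    return bad


old_style = 'EXA ="test1"\ncfgfile ="/opt/ont/guppy/data/rna_r9.4.1_70bps_hac.cfg"\n'
new_style = 'EXA: "test1"\ncfgfile: "/opt/ont/guppy/data/rna_r9.4.1_70bps_hac.cfg"\n'

print(find_non_yaml_lines(old_style))  # both old-style lines are reported
print(find_non_yaml_lines(new_style))  # nothing is reported
```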

@apbiomol
Author

Thanks for your help. It looks like the config issue has been resolved.

But, I encountered another error as below :(

NameError in line 29 of /WorkDB/wonyong/oxford_nanopore_DRS/FastdRNA/dRNAmain.py:
name 'directory' is not defined
File "/WorkDB/wonyong/oxford_nanopore_DRS/FastdRNA/dRNAmain.py", line 29, in

line 29 is .... directory("./{example}/analysis/slow5/midbolw5_dir")

I downloaded binaries for f5c and nanopolish. Should I put my directory for f5c in line 29?

Sorry to keep bothering you, but I'd like to get this analysis done using this program.

@Tomcxf
Owner

Tomcxf commented Apr 24, 2024

That might not be an issue caused by f5c.
It might be a bug in snakemake, since I ran the test module successfully.
Sometimes snakemake hits a bug when creating a directory.
I think you can try the two methods below:

  1. just run your command again if you hit this error
  2. create the directory yourself - ./{example}/analysis/slow5/midbolw5_dir - changing {example} to the value of EXA you defined in the config file

BTW, don't feel sorry for 'bothering' :) Your valuable suggestions help make our project better!

@apbiomol
Author

apbiomol commented Apr 24, 2024

Your solution did not work for me; it raised the same error message :(
What else could be causing this issue?

@Tomcxf
Owner

Tomcxf commented Apr 24, 2024

Can you add -n to your command, like snakemake -s dRNAmain.py -n
It makes snakemake do a test run (dry run) to check the code without executing it.
Is there any error when you run it? On my PC, all of the code is OK.
By the way, you can try the dRNAmain.py on the Homepage (I updated it recently), or delete all uses of directory in dRNAmain.py.
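
One more thing worth checking, as an assumption that is not confirmed in this thread: the directory() output flag does not exist in older Snakemake releases, so a NameError on 'directory' could simply mean the installed version is too old. The sketch below prints the installed version using only the standard library; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "snakemake") -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this with the same Python environment that provides the snakemake command.
print("snakemake version:", installed_version("snakemake"))
```

If the version turns out to be old, upgrading Snakemake may be simpler than deleting directory from the rules.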

@apbiomol
Author

Thanks for your help.

While running dRNAmain.py in Homepage, I got a new error message.

WildcardError in line 23 of /WorkDB/wonyong/oxford_nanopore_DRS/FastdRNA/dRNAmain_homepage.py:
Wildcards in input files cannot be determined from output files:
'fast5'

My fast5 files for one sample are in a directory, so I provide the path in my config file as below.
Is this how you do it?

fast5: "/WorkDB/RNA_seq_oxford_nanopore/20220401_ONT_DRS_rawdata/KO1-G0R1/fast5_pass/"

@Tomcxf
Owner

Tomcxf commented Apr 25, 2024

Well... that is strange...
Based on your error, I uploaded a file called debug_dRNAmain.py to both HomePage and scriptPage.
The two files are different, so you can try both of them to see whether either works.
You don't need to change the information in the config file.

@apbiomol
Author

Hello Tomcxf,

I made progress using your debug script. I got the following messages at the prompt, as you can see below.
What can you tell from this report? And, where can I find outputs? Thanks for your time.

rule slow5_f2s:
input: /WorkDB/RNA_seq_oxford_nanopore/20220401_ONT_DRS_rawdata/KO1-G0R1/fast5_pass/
output: test1/analysis/slow5/midbolw5_dir
jobid: 7
wildcards: example=test1

rule slow5_merge:
input: test1/analysis/slow5/midbolw5_dir
output: test1/analysis/slow5/file.blow5
jobid: 2
wildcards: example=test1

rule slow5_split:
input: test1/analysis/slow5/file.blow5
output: test1/analysis/slow5/bolw5_dir
jobid: 10
wildcards: example=test1

rule slow5_convert:
input: test1/analysis/slow5/bolw5_dir
output: test1/analysis/fast5
jobid: 9
wildcards: example=test1

rule basecall:
input: test1/analysis/fast5, /opt/ont/guppy/data/rna_r9.4.1_70bps_hac.cfg
output: test1/analysis/basecall
jobid: 6
wildcards: example=test1

rule management:
input: test1/analysis/basecall
output: test1/analysis/test1.fastq
jobid: 5
wildcards: example=test1

rule mapping:
input: /WorkDB/wonyong/oxford_nanopore_DRS/nanocompore_test/FungiDB-52_FgraminearumPH-1_AnnotatedTranscripts.fasta, test1/analysis/test1.fastq
output: test1/analysis/mapping/test1_transcript.sam
jobid: 8
wildcards: example=test1

rule sam_sort:
input: test1/analysis/mapping/test1_transcript.sam
output: test1/analysis/mapping/test1_transcript.bam
jobid: 4
wildcards: example=test1

rule nanoplot_visual:
input: test1/analysis/basecall
output: test1/analysis/nanoplot
jobid: 1
wildcards: example=test1

rule transcript_count:
input: test1/analysis/mapping/test1_transcript.bam
output: test1/analysis/count/test1_transcript_counts.csv
jobid: 3
wildcards: example=test1

localrule all:
input: test1/analysis/test1.fastq, test1/analysis/nanoplot, test1/analysis/mapping/test1_transcript.bam, test1/analysis/count/test1_transcript_counts.csv, test1/analysis/slow5/file.blow5
jobid: 0

Job counts:
count jobs
1 all
1 basecall
1 management
1 mapping
1 nanoplot_visual
1 sam_sort
1 slow5_convert
1 slow5_f2s
1 slow5_merge
1 slow5_split
1 transcript_count
11

@Tomcxf
Owner

Tomcxf commented Apr 25, 2024

OK, this is the output of your command snakemake -s dRNAXXX.py -n
The -n means that snakemake didn't actually run the code; it only tested whether the workflow is correct.
The output you supplied means that the .py code can run well.
So now you can run snakemake -s dRNAXXX.py to actually run your pipeline and generate the output.
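
If you want to script the two steps, a minimal sketch could look like this: it does the dry run first and starts the real run only when the dry run exits cleanly. The helper names are hypothetical, and the commands are just the ones used in this thread.

```python
import subprocess


def snakemake_command(snakefile: str, dry_run: bool) -> list[str]:
    """Build the snakemake command line used in this thread, optionally with -n."""
    cmd = ["snakemake", "-s", snakefile]
    if dry_run:
        cmd.append("-n")  # -n: dry run, nothing is executed
    return cmd


def dry_run_then_run(snakefile: str) -> int:
    """Do a dry run; start the real run only if the dry run succeeded."""
    check = subprocess.run(snakemake_command(snakefile, dry_run=True))
    if check.returncode != 0:
        return check.returncode
    return subprocess.run(snakemake_command(snakefile, dry_run=False)).returncode


print(" ".join(snakemake_command("dRNAmain.py", dry_run=True)))
```

Some newer Snakemake releases also require a --cores value for the real run; it is left out here to match the commands above.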

@apbiomol
Author

It started to work! Thanks for your help.
It seems like fast5 files were converted to slow5 files, and then slow5 files were merged.
However, the program failed to find files in the output directory "midbolw5_dir".

What could be the cause of this error?


[list_all_items] Looking for '*.slow5' files in test1/analysis/slow5/midbolw5_dir
[merge_main] 278 files found - took 0.001s
[merge_main] Allocating new read group numbers - took 0.026s
[slow5_open_with::ERROR] Error opening file 'test1/analysis/slow5/midbolw5_dir/FAS36110_pass_c05666f7_70.blow5': No such file or directory. At src/slow5.c:359
[slow5_open_with::ERROR] Exiting on error. At src/slow5.c:359

@Tomcxf
Owner

Tomcxf commented Apr 25, 2024

It seems that the error happened in slow5_split, since slow5tools is looking for a blow5 file.
What is weird is that it looks for 'test1/analysis/slow5/midbolw5_dir/FAS36110_pass_c05666f7_70.blow5', but I assigned the path "{example}/analysis/slow5/file.blow5", so the software shouldn't look for blow5 files in midbolw5_dir. You can try the following:

  1. edit
rule slow5_merge:
    input:
        "{example}/analysis/slow5/midbolw5_dir"
    output:
        "{example}/analysis/slow5/file.blow5"
    shell:
        "slow5tools merge {input} -o {output} -t 8 | rm -rf {input}"

to

rule slow5_merge:
    input:
        "{example}/analysis/slow5/midbolw5_dir"
    output:
        "{example}/analysis/slow5/file.blow5"
    shell:
        "slow5tools merge {input} -o {output} -t 8 "
  2. use script/debug_dRNAmain.py to see whether it works

BTW, could you send me one of your fast5 files? It may help me solve this better.
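
A possible explanation for the missing-file error, offered as an assumption rather than a confirmed diagnosis: in a shell, a | b starts both commands at the same time, so rm -rf {input} can delete the directory while the merge is still reading it. The sketch below imitates that with stand-in commands (sleep and cat instead of slow5tools; all names are hypothetical) and compares | with &&, which runs the cleanup only after the first command has succeeded. It assumes a POSIX shell.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def merge_then_cleanup(joiner: str) -> str:
    """Run '<fake merge> JOINER rm -rf <input dir>' and return what the merge wrote."""
    with tempfile.TemporaryDirectory() as tmp:
        in_dir = Path(tmp) / "input_dir"  # hypothetical stand-in for the slow5 directory
        in_dir.mkdir()
        (in_dir / "part.txt").write_text("signal\n")
        out_file = Path(tmp) / "merged.txt"

        src = shlex.quote(str(in_dir / "part.txt"))
        dst = shlex.quote(str(out_file))
        # Stand-in for the merge step: it needs a moment before it reads its input.
        fake_merge = f"(sleep 0.5; cat {src} > {dst} 2>/dev/null)"
        cleanup = f"rm -rf {shlex.quote(str(in_dir))}"

        subprocess.run(f"{fake_merge} {joiner} {cleanup}", shell=True, check=False)
        return out_file.read_text() if out_file.exists() else ""


print(repr(merge_then_cleanup("|")))   # cleanup races the read, so the input is gone
print(repr(merge_then_cleanup("&&")))  # the input is read first, then deleted
```

If that reading is right, replacing | with && in the shell line would keep the cleanup without the race; removing the rm entirely, as in the edit above, avoids it too.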
