Feature request - GenBank format #15
Hi Ben,
What GenBank format do you want? The .tbl file format for GenBank submission, or the .gbk flat file format that you download from GenBank? (or both?)
I should be able to do the latter quite easily now, using https://github.com/BioJulia/GenomicAnnotations.jl as it covers .gbk format at least.
Previously, the design of GenomicAnnotations.jl meant it couldn’t handle trans-spliced genes such as rps12, but it was updated 3 weeks ago and now it looks like it can.
When I get a moment I’ll give it a try.
Internally Chloe treats rps12 as two distinct genes (rps12A and rps12B), but the intention is to merge them for the output. Unlike most other tools, Chloe can in principle annotate the two half-introns that are trans-spliced.
Cheers
Ian
From: Ben Anderson ***@***.***>
Date: Wednesday, 28 August 2024 at 1:31 PM
To: ian-small/chloe ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [ian-small/chloe] Feature request - GenBank format (Issue #15)
Hi Ian,
Thanks for making this great tool available!
I've been able to get this running in a Singularity container on my machine, but I would like to have output in GenBank format (I can't find an option for that).
Generating a GFF3 file works, but I am having issues converting it to GenBank format (e.g. combining features). I've tried https://github.com/chapmanb/bcbb/blob/master/gff/Scripts/gff/gff_to_genbank.py but the output formatting doesn't parse well (unsure why) and the features are not combined properly (e.g. rps12 is broken into multiple features).
I've tried using https://chloe.plastid.org/annotate.html, but the GenBank output only includes CDS features (I want "gene" features and "intron" features too), and again rps12 is broken up.
Using https://chlorobox.mpimp-golm.mpg.de/geseq.html works well and properly combines features, including Chloe annotations.
I'd like to find a way to run Chloe on many fasta files and get GenBank files without having to manually upload them to GeSeq.
Is this a planned feature? Are there other tools that you would recommend?
Cheers,
Ben
Thanks Ian!
Hi Ben, I've just added code so that Chloe now uses GenomicAnnotations.jl (https://github.com/BioJulia/GenomicAnnotations.jl) for any output other than Chloe's internal .sff format.
Hi Ian, I'm trying to test your new version but I was unable to install it. Here's the error I got from Julia (quoted in full in the reply below):
Hi Ben, if you upgrade to the latest version of Julia it will fix that problem.
I’ll see if I can relax the dependency, though.
From: Ben Anderson ***@***.***>
Date: Friday, 22 November 2024 at 11:41 AM
To: ian-small/Chloe.jl ***@***.***>
Cc: Ian Small ***@***.***>, Comment ***@***.***>
Subject: Re: [ian-small/Chloe.jl] Feature request - GenBank format (Issue #15)
Hi Ian,
Thanks for that. One of the reasons I want the GenBank format is because I already have scripts for parsing those files and I want to be able to download data from GenBank and parse it as well as new assemblies. You're probably right that the other format would be easier to parse, and perhaps I should do that at some point, but then what would you recommend for downloaded GenBank formats?
Anyway, I'm trying to test your new version but I was unable to install it. Here's the error I got from Julia:
+ cd chloe
+ julia --project=. -e using Pkg; Pkg.instantiate()
Updating registry at `/tmp/registries/General.toml`
ERROR: Unsatisfiable requirements detected for package UUIDs [cf7118a7]:
UUIDs [cf7118a7] log:
├─possible versions are: 1.10.4 or uninstalled (package in sysimage!)
└─restricted to versions 1.11.0-1 by Chloe [ca11047d] — no versions left
└─Chloe [ca11047d] log:
├─possible versions are: 0.1.15 or uninstalled
└─Chloe [ca11047d] is fixed to version 0.1.15
Stacktrace:
[1] check_constraints(graph::Pkg.Resolve.Graph)
@ Pkg.Resolve /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/Resolve/graphtype.jl:998
[2] Pkg.Resolve.Graph(compat::Dict{Base.UUID, Dict{VersionNumber, Dict{Base.UUID, Pkg.Versions.VersionSpec}}}, compat_weak::Dict{Base.UUID, Dict{VersionNumber, Set{Base.UUID}}}, uuid_to_name::Dict{Base.UUID, String}, reqs::Dict{Base.UUID, Pkg.Versions.VersionSpec}, fixed::Dict{Base.UUID, Pkg.Resolve.Fixed}, verbose::Bool, julia_version::VersionNumber)
@ Pkg.Resolve /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/Resolve/graphtype.jl:345
[3] deps_graph(env::Pkg.Types.EnvCache, registries::Vector{Pkg.Registry.RegistryInstance}, uuid_to_name::Dict{Base.UUID, String}, reqs::Dict{Base.UUID, Pkg.Versions.VersionSpec}, fixed::Dict{Base.UUID, Pkg.Resolve.Fixed}, julia_version::VersionNumber, installed_only::Bool)
@ Pkg.Operations /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:587
[4] resolve_versions!(env::Pkg.Types.EnvCache, registries::Vector{Pkg.Registry.RegistryInstance}, pkgs::Vector{Pkg.Types.PackageSpec}, julia_version::VersionNumber, installed_only::Bool)
@ Pkg.Operations /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:407
[5] up(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}, level::Pkg.Types.UpgradeLevel; skip_writing_project::Bool, preserve::Nothing)
@ Pkg.Operations /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1538
[6] up
@ /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1520 [inlined]
[7] up(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; level::Pkg.Types.UpgradeLevel, mode::Pkg.Types.PackageMode, preserve::Nothing, update_registry::Bool, skip_writing_project::Bool, ***@***.***{})
@ Pkg.API /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/API.jl:351
[8] up
@ /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/API.jl:326 [inlined]
[9] up
@ /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/API.jl:164 [inlined]
[10] instantiate(ctx::Pkg.Types.Context; manifest::Nothing, update_registry::Bool, verbose::Bool, platform::Base.BinaryPlatforms.Platform, allow_build::Bool, allow_autoprecomp::Bool, ***@***.***{})
@ Pkg.API /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/API.jl:1797
[11] instantiate
@ /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/API.jl:1773 [inlined]
[12] instantiate(; ***@***.***{})
@ Pkg.API /julia-1.10.4/share/julia/stdlib/v1.10/Pkg/src/API.jl:1772
[13] top-level scope
@ none:1
FATAL: While performing build: while running engine: exit status 1
Thanks Ian. I tried updating Julia to v. 1.11.1 and it finished the install. I'm now hitting a different error when attempting to annotate. Here's my command (to a Singularity container with the install):
But it doesn't load the references
I can include more of the error message if helpful.
Hi Ben,
You’ll need to add Chloe again under the new version of Julia, since Julia keeps packages separate between versions. You may need to instantiate again too.
I’ve taken the compatibility requirement for 1.11 out, and now it runs fine under 1.10.4 on my machine, so if you still have 1.10 around and can’t get 1.11 to work you can try that.
Good luck
Ian
Hi Ian,
Update: adding the argument
Update 2: my parser (https://github.com/bmichanderson/scripts/blob/master/genbank_parse.py) doesn't like the GenBank formatted files.
You’d need to take that up with the author(s) of GenomicAnnotations.jl…
(or I can on your behalf, if you can tell me how the output deviates from the spec)
Thanks Ian. In the malformed-sequence ValueError, it appears that Biopython's "Scanner.py" (https://github.com/biopython/biopython/blob/master/Bio/GenBank/Scanner.py) raises an error when it detects improper indentation of the sequence. Adding a single space to each line of the sequence entry at the bottom of the file removes that error.
The header line is improper (missing at least one field). It needs:
I think it couldn't parse features with incorrectly formatted "join" and "complement" (?)
Actually, the errors were raised when a "join" was nested inside another "join" (these are related to rps12).
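The indentation workaround described above could be scripted along these lines. This is only a sketch: the function name `fix_origin_indent` is hypothetical, and it assumes the usual GenBank flat-file layout in which each sequence line after ORIGIN starts with the position number right-justified in the first nine columns, followed by a space and the bases. It is not part of Chloe or GenomicAnnotations.jl.

```python
import re


def fix_origin_indent(text: str) -> str:
    """Re-justify the position number on each sequence line to 9 columns.

    Assumption: sequence lines sit between the ORIGIN line and the
    terminating '//' line, and each one is '<number> <bases...>'.
    """
    out = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            out.append(line)
        elif line.startswith("//"):
            in_origin = False
            out.append(line)
        elif in_origin:
            match = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
            if match:
                # Right-justify the number in columns 1-9, then one space.
                out.append(f"{match.group(1):>9} {match.group(2)}")
            else:
                out.append(line)  # leave anything unexpected untouched
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Running each output file through a function like this before parsing would avoid hand-editing, although fixing the writer upstream is the cleaner solution.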
Note @kdyrhage's comment here as well: join(complement(...),complement(...)) is not equivalent to complement(join(...,...))
Yes, indeed, Chloe was joining the complemented fragments in the wrong order; my bad.
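The ordering point in the two comments above can be checked with a small sketch. It assumes the standard reading of the location operators: complement(...) means the reverse complement of the enclosed span, and join(...) concatenates its parts in the order listed. The helper names and the toy sequence are made up for illustration.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def span(seq: str, start: int, end: int) -> str:
    """Extract a 1-based, inclusive span, as GenBank locations are written."""
    return seq[start - 1:end]


def complement_of_join(seq: str, spans) -> str:
    """complement(join(a, b)): join first, then reverse-complement the whole."""
    return revcomp("".join(span(seq, s, e) for s, e in spans))


def join_of_complements(seq: str, spans) -> str:
    """join(complement(a), complement(b)): reverse-complement each part,
    then concatenate in the order the parts are listed."""
    return "".join(revcomp(span(seq, s, e)) for s, e in spans)


genome = "ATGCCGTAAGCT"   # toy sequence, not real data
parts = [(1, 3), (7, 9)]  # two fragments on the forward coordinates

same_order = join_of_complements(genome, parts)
reversed_order = join_of_complements(genome, parts[::-1])
whole = complement_of_join(genome, parts)
```

Here `whole` equals `reversed_order` but not `same_order`, so moving the complement inside the join only gives the same sequence if the fragments are also listed in the opposite order.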
It still seems to be having a problem. Here is the rps12 CDS feature from the re-built Chloe (after your recent push):
Here it is when downloaded from Chloe via GeSeq:
Try now; should be correct, if not as neat as the GeSeq version.
Looks good! rps12 in a different sample:
re-built Chloe:
Chloe via GeSeq:
It still complains a bit about the header (one too many fields now; when I removed "circular" and put more spaces between PLN and the date, it stopped complaining). Not a big deal to me, since I think I can adjust it pretty simply. Only the sequence indentation needs fixing now (presumably one for the GenomicAnnotations.jl people).
Ah, another minor issue. The CDS features are missing a "/gene=..." field, which affects my parsing script.
OK, added a 'gene' attribute to CDS/intron/tRNA/rRNA entries.
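A quick way to confirm the fix on a batch of files is to scan the feature table for entries that lack a /gene qualifier. The sketch below is a deliberately simple, standard-library-only check with hypothetical function names. It assumes the usual flat-file layout (feature keys indented five spaces, qualifiers on their own lines starting with "/") and ignores multi-line locations and wrapped qualifier values.

```python
def parse_features(text: str):
    """Collect feature keys, first-line locations, and qualifier lines."""
    features = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
        elif in_table and line and not line.startswith(" "):
            break  # reached the next top-level section, e.g. ORIGIN
        elif in_table and line.startswith("     ") and line[5:6].strip():
            # A new feature: key in column 6, location later on the line.
            key, _, location = line.strip().partition(" ")
            features.append(
                {"key": key, "location": location.strip(), "qualifiers": []}
            )
        elif in_table and features and line.strip().startswith("/"):
            features[-1]["qualifiers"].append(line.strip())
    return features


def features_missing_gene(text: str, keys=("CDS", "intron", "tRNA", "rRNA")):
    """Return (key, location) for selected features with no /gene qualifier."""
    return [
        (f["key"], f["location"])
        for f in parse_features(text)
        if f["key"] in keys
        and not any(q.startswith("/gene=") for q in f["qualifiers"])
    ]
```

An empty list from `features_missing_gene` would suggest every CDS, intron, tRNA, and rRNA entry in the file carries a gene name; a proper parser such as Biopython remains the better tool for anything beyond this kind of spot check.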
Thanks Ian, looks good!