Subclasses / metadata for fasta sequence type (protein / nucleotide) #19192

Open
bernt-matthias opened this issue Nov 23, 2024 · 0 comments

Comments

@bernt-matthias
Contributor

It could be handy to give tools a way to find out whether a FASTA dataset contains nucleotide or amino acid sequences.

  • Add metadata
  • Implement subtypes

The problem is nontrivial, perhaps impossible, if one tries to cover all corner cases, but I guess it should not be too hard to come up with a solution that works in most cases. For instance, check whether a large majority of the characters belong to the corresponding alphabet, e.g. 90% or more?
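
A minimal sketch of that majority check, to make the idea concrete. Everything here is an assumption for illustration: the function name `guess_sequence_type`, the two alphabets, and the 0.9 default threshold are hypothetical and not part of any existing datatype code. Because A, C, G, T and N are also valid amino acid letters, the nucleotide test runs first.

```python
from collections import Counter
from typing import Iterable

# Hypothetical alphabets for illustration; a real implementation might
# also accept IUPAC ambiguity codes, gaps ("-") or stop symbols ("*").
NUCLEOTIDE_CHARS = set("ACGTUN")
PROTEIN_CHARS = set("ACDEFGHIKLMNPQRSTVWY")


def guess_sequence_type(lines: Iterable[str], threshold: float = 0.9) -> str:
    """Return "nucleotide", "protein" or "unknown" for FASTA lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        counts.update(line.upper())

    total = sum(counts.values())
    if total == 0:
        return "unknown"

    # Nucleotide letters are a subset of protein letters, so test them first.
    nucleotide_fraction = sum(counts[c] for c in NUCLEOTIDE_CHARS) / total
    if nucleotide_fraction >= threshold:
        return "nucleotide"

    protein_fraction = sum(counts[c] for c in PROTEIN_CHARS) / total
    if protein_fraction >= threshold:
        return "protein"

    return "unknown"
```

For a large dataset one would probably only inspect the first few records rather than the whole file; a short peptide made up only of A, C, G and T would still be misclassified, which is one of the corner cases mentioned above.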

One could also just store the alphabet as metadata and leave the logic to the tool?
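
A rough sketch of that alternative, assuming the metadata value is simply the sorted set of characters seen in the sequence lines. The function names `collect_alphabet` and `looks_like_nucleotide` are hypothetical, and how the value would actually be attached to a dataset is left open here.

```python
from typing import Iterable


def collect_alphabet(lines: Iterable[str]) -> str:
    """Return the distinct sequence characters as a sorted string."""
    alphabet = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        alphabet.update(line.upper())
    return "".join(sorted(alphabet))


def looks_like_nucleotide(alphabet: str) -> bool:
    """Example of tool-side logic: accept only A, C, G, T, U and N."""
    return bool(alphabet) and set(alphabet) <= set("ACGTUN")
```

With this split the stored value stays a plain fact about the file, and each tool can apply a rule as strict or as lenient as it needs.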
