Request: use updated codon usage tables #62
Comments
Good points, thanks for the report. When you use a TaxID, Chisel uses the codon-usage-tables library to get that dictionary, so I guess the fix should happen at the level of this other library. From what I remember, I chose Kazusa because it was easy to query and extensive. Do you know if there is any way to quickly get a codon table from CoCoPUTs without downloading the 19 GB files they have on their website? In case this helps unblock you, Chisel's codon optimizers also accept an arbitrary codon table as a dictionary (see here), so you could download CoCoPUTs, extract the table you want, and feed it to the constraint.
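For anyone landing here later, a minimal sketch of that workaround, assuming `CodonOptimize` accepts the table through a `codon_usage_table` keyword as a dict mapping each amino acid to its codon frequencies (the numbers below are made up for illustration, not taken from CoCoPUTs):

```python
from dnachisel import DnaOptimizationProblem, EnforceTranslation, CodonOptimize

# Illustrative table in the assumed format:
# {amino_acid: {codon: relative frequency within that amino acid}}
custom_table = {
    "M": {"ATG": 1.00},
    "A": {"GCA": 0.21, "GCC": 0.27, "GCG": 0.36, "GCT": 0.16},
    "*": {"TAA": 0.64, "TAG": 0.07, "TGA": 0.29},
}

problem = DnaOptimizationProblem(
    sequence="ATGGCAGCGTAA",  # toy CDS: Met-Ala-Ala-stop
    constraints=[EnforceTranslation()],
    objectives=[CodonOptimize(codon_usage_table=custom_table)],
)
problem.optimize()
print(problem.sequence)
```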
Thanks for the quick reply! I have only used the CoCoPUTs database briefly (actually, last time I used it, I downloaded a codon usage table to use as a "custom dictionary" with DNA Chisel, as you suggested). It really looks like there is no direct way of downloading the codon usage tables (much less a way to automatically query their database). I guess that would require talking directly to the person who manages it... An alternative would be to generate .csv files from the huge 19 GB file, but that would require storing them somewhere and also updating them frequently. I do not see a straightforward implementation of this, but keep it in mind for future updates, as I think having updated codon usage tables would really enhance the usefulness of DNA Chisel! Best,
Here is a direct way to extract codon usage data for a specific TaxID from the CoCoPUTs database: changing the taxid parameter in the URL below returns codon usage data in a simple, easily parsable text format, along with statistics on how many individual codons and CDSs were used to calculate the table.
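A rough sketch of how querying by TaxID could look from Python; the endpoint URL is a placeholder (the actual URL from this comment is not reproduced here), and the `taxid` query parameter is an assumption based on the description above:

```python
import requests

# Hypothetical stand-in for the CoCoPUTs endpoint mentioned above;
# swap in the real URL and keep a taxid query parameter.
ENDPOINT = "https://example.org/cocoputs/codon-usage"

response = requests.get(ENDPOINT, params={"taxid": 4932}, timeout=30)  # 4932 = S. cerevisiae
response.raise_for_status()
print(response.text[:500])  # plain-text table plus the codon/CDS count statistics
```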
Thank you @gjerman, that is useful. Is it possible to retrieve the percentages instead of the counts? As a related note, @Zulko, do the frequencies need to add up exactly to 1? For example, the stop codon frequencies in the link do not add up to exactly 1.
This is not a documented API endpoint, but the endpoint they use in their own web app. There are several URL parameters that can be set, but I have not found one that will return the percentages directly. Here is the original URL that I simplified by removing parameters that did not obviously change the results.
Thanks for the details; if I search for a species with the web interface, there is an option to return a table, and that one has percentages as well. But maybe those are calculated from the counts.
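If the percentages are indeed derived from the counts, the conversion is just dividing each codon's count by the total count of its amino acid. A small sketch (the counts are made up; Biopython's standard codon table is only used to group codons by amino acid):

```python
from Bio.Data import CodonTable

# Made-up counts for a few codons; a real CoCoPUTs table covers all 64.
counts = {"GCT": 1200, "GCC": 2100, "GCA": 900, "GCG": 2600,
          "TAA": 640, "TAG": 70, "TGA": 290}

# Map each codon to its amino acid, grouping stop codons under "*".
standard = CodonTable.standard_dna_table
codon_to_aa = dict(standard.forward_table)
codon_to_aa.update({codon: "*" for codon in standard.stop_codons})

# Total count per amino acid, then normalize each codon by that total.
totals = {}
for codon, n in counts.items():
    totals[codon_to_aa[codon]] = totals.get(codon_to_aa[codon], 0) + n

frequencies = {
    aa: {codon: counts[codon] / totals[aa]
         for codon in counts if codon_to_aa[codon] == aa}
    for aa in totals
}

print(frequencies["A"])                # e.g. {'GCT': 0.176..., 'GCC': 0.308..., ...}
print(sum(frequencies["A"].values()))  # 1.0, up to floating-point rounding
```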
@veghp No need for the frequencies to add up exactly to one in DNA Chisel; it only cares about which codon is the most frequent (for best-codon optimization) or about the difference between a given codon variant's frequency in the sequence and in the table. Great that there is a way to query CoCoPUTs; it would be nice to have a choice between Kazusa and CoCoPUTs in the codon-usage-tables repo (not sure if they cover the exact same species).
I am trying to obtain the codon tables from Python, but I am getting this error:
I think it has something to do with cookies, as I was getting the same message when I tried to access that URL from a computer for the first time, without having visited the CoCoPUTs webpage beforehand. Once I had visited it, I could access any codon table from my browser by changing the Taxonomy ID. Is there a way of doing the same from within Python? I have seen that
I can confirm your issue. This is the best answer I could find: https://stackoverflow.com/questions/19098518/how-to-download-a-file-perl-cgi-backend-using-python-requests?noredirect=1&lq=1 A better solution seems to be to download the whole dataset instead, and subset that for the query parameters.
This is an issue with the session cookie. Here is an improved example where I fetch a session cookie and subsequently use it when fetching the codon data.
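Since the snippet itself is not reproduced above, here is a rough sketch of the approach described: a `requests.Session` first visits the landing page so the server sets its session cookie, and the same session then fetches the codon data. Both URLs are placeholders rather than the real CoCoPUTs addresses:

```python
import requests

LANDING_PAGE = "https://example.org/cocoputs/"           # hypothetical landing page
DATA_ENDPOINT = "https://example.org/cocoputs/data.cgi"  # hypothetical data endpoint

def fetch_codon_usage(taxid):
    """Fetch codon usage data for a TaxID, reusing the session cookie."""
    with requests.Session() as session:
        # Visiting the landing page makes the server set its session
        # cookie on this Session object.
        session.get(LANDING_PAGE, timeout=30)
        # The cookie is then sent automatically with the data request,
        # which otherwise fails (as reported above).
        response = session.get(DATA_ENDPOINT, params={"taxid": taxid}, timeout=30)
        response.raise_for_status()
        return response.text

print(fetch_codon_usage(9606)[:200])  # 9606 = human
```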
That works for me, thanks.
A quick question on the codon_usage_table specification within DNA Chisel. In the documentation you mention that we should be giving the RSCU table (relative usage of each codon) rather than straight frequencies (which add up to 1 for each amino acid). How does this affect the AvoidRareCodons specification? Does it directly use the frequencies specified? If so, then we probably shouldn't be providing the RSCU table but a straight codon frequency table, right? Thanks
The documentation you refer to is: https://edinburgh-genome-foundry.github.io/DnaChisel/ref/builtin_specifications.html?highlight=codon_usage_table#avoidrarecodons
DNA Chisel uses this repository for codon tables: https://github.com/Edinburgh-Genome-Foundry/codon-usage-tables
An example table: https://github.com/Edinburgh-Genome-Foundry/codon-usage-tables/blob/master/codon_usage_data/tables/e_coli_316407.csv
As you can see, the frequencies add up to 1 separately for each amino acid. You are right, we need codon frequencies rather than RSCU; the relative synonymous codon usage (RSCU) is not what its name suggests (formula and an actual table). Thanks, this will be removed from the documentation in the next release.
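To make the distinction concrete, here is a sketch of `AvoidRareCodons` used with a plain frequency table rather than RSCU; it assumes the specification takes a `min_frequency` argument and a `codon_usage_table` dict, and the frequencies shown are illustrative only:

```python
from dnachisel import DnaOptimizationProblem, EnforceTranslation, AvoidRareCodons

# Plain frequencies: within each amino acid the values sum to 1 (not RSCU).
frequency_table = {
    "M": {"ATG": 1.00},
    "A": {"GCA": 0.21, "GCC": 0.27, "GCG": 0.36, "GCT": 0.16},
    "*": {"TAA": 0.64, "TAG": 0.07, "TGA": 0.29},
}

problem = DnaOptimizationProblem(
    sequence="ATGGCTGCGTAA",  # toy CDS: Met-Ala-Ala-stop
    constraints=[
        EnforceTranslation(),
        # Forbid codons whose table frequency is below 20%.
        AvoidRareCodons(min_frequency=0.20, codon_usage_table=frequency_table),
    ],
)
problem.resolve_constraints()
print(problem.sequence)  # GCT (0.16) should be replaced by a more frequent Ala codon
```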
Hello,
I have a request that should enhance the utility of DNA Chisel for codon optimization.
When selecting codon usage tables for species not included in DNA Chisel, Kazusa codon tables are used. These tables are extremely outdated, generated from GenBank data only up to 2007. For some species, only a single CDS was used, and some codons have a usage of 0 (see here for examples).
I suggest that more up-to-date codon tables be included in DNA Chisel. For example, CoCoPUTs is an up-to-date database whose codon usage tables are regenerated with every new GenBank or RefSeq release, every three months. I guess it should not be hard to retrieve the usage tables for each species (there is a huge file with the data, but I assume they could also be retrieved "on the fly").
I am in no way associated with the people who run CoCoPUTs, but I think it is the most up-to-date resource for codon usage tables.