Can't map from NCBI Uniprot proteins #17
Comments
As we haven't exposed the identifiers.org mappings for NCBI yet, I think disabling the NCBI source shouldn't cause any problems. In general, it seems that NCBI accepts several different kinds of identifiers (not just from UniProt), e.g. http://www.ncbi.nlm.nih.gov/protein/CAA71118.1 from GenBank. This might make including this data source in the IMS more difficult.
The namespaces overlap, but without CURIEs... not so nice, NCBI. So we have no guarantee that other NCBI Protein URIs won't match the UniProt regular expression, in which case the IMS could wrongly map them back to UniProt if they are used as input.
I am checking that this pattern is not used by any of the RDF in the cache, but it could still be a useful URI pattern to support, as NCBI services are widely used in bioinformatics.
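To make the regular-expression concern concrete, here is a minimal sketch that checks whether the identifier in an NCBI Protein URI has the shape of a UniProt accession. It assumes UniProt's documented accession format; the regular expression the IMS actually uses is not shown in this thread and may differ, and the function name `looks_like_uniprot` is hypothetical.

```python
import re

# Assumption: UniProt's documented accession format. The pattern the IMS
# really applies is not quoted in this issue and may be different.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
NCBI_PROTEIN_PREFIX = "http://www.ncbi.nlm.nih.gov/protein/"


def looks_like_uniprot(ncbi_uri: str) -> bool:
    """Return True if the URI's identifier has the UniProt accession shape."""
    if not ncbi_uri.startswith(NCBI_PROTEIN_PREFIX):
        return False
    identifier = ncbi_uri[len(NCBI_PROTEIN_PREFIX):]
    # fullmatch requires the whole identifier to fit the pattern.
    return UNIPROT_ACCESSION.fullmatch(identifier) is not None
```

With this pattern the two URIs mentioned in this issue are told apart: the P62158 one has the UniProt shape, the GenBank-style CAA71118.1 one does not. That says nothing about other NCBI identifier families, which is exactly the open risk.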
I think the problem here is that NCBI doesn't want to link to UniProt or other external resources directly, but to their own instance of a protein, which they just happen to have taken from UniProt. I think that we should probably not go along and make sure we end up at UniProt, unless somebody explicitly searches for an NCBI link. But that means we should not depend on NCBI to resolve that issue. (Maybe this is what @stain meant?)
Looking up the IMS mapping for URIs like
http://www.ncbi.nlm.nih.gov/protein/P62158
does not return URIs like
http://purl.uniprot.org/uniprot/P62158
or its transitive mapping
http://bio2rdf.org/drugbank:BE0000418
even though the reverse lookups from the UniProt or DrugBank identifier work fine and
include the NCBI Protein pattern as part of the UniProt mapping.
This is caused by the UniProt and NCBI Protein sources both claiming to handle the
http://www.ncbi.nlm.nih.gov/protein/$id
pattern. We do not currently have any NCBI Protein mappings, so perhaps a workaround is to disable the NCBI Protein source?
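The conflict described above can be sketched with a toy pattern registry. This is not IMS code: the `PATTERNS` table, the `sources_claiming` function and the `disabled` parameter are all hypothetical, and only the URI patterns themselves come from this issue.

```python
from typing import Dict, FrozenSet, List

# Hypothetical registry: source name -> URI patterns it claims,
# with $id as the identifier placeholder.
PATTERNS: Dict[str, List[str]] = {
    "UniProt": [
        "http://purl.uniprot.org/uniprot/$id",
        "http://www.ncbi.nlm.nih.gov/protein/$id",
    ],
    "NCBI Protein": ["http://www.ncbi.nlm.nih.gov/protein/$id"],
}


def sources_claiming(
    uri: str,
    patterns: Dict[str, List[str]],
    disabled: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return the sorted names of enabled sources with a pattern matching uri."""
    claimants = []
    for source, templates in patterns.items():
        if source in disabled:
            continue
        for template in templates:
            prefix = template.split("$id")[0]
            # A match needs the prefix plus a non-empty identifier.
            if uri.startswith(prefix) and len(uri) > len(prefix):
                claimants.append(source)
                break
    return sorted(claimants)
```

In this sketch the NCBI URI for P62158 is claimed by both sources, so a lookup cannot tell which one owns it. Passing `disabled=frozenset({"NCBI Protein"})` leaves UniProt as the only claimant, which mirrors the proposed workaround.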