Can't map from NCBI Uniprot proteins #17

Open
stain opened this issue Oct 12, 2015 · 3 comments

Comments

@stain
Contributor

stain commented Oct 12, 2015

Looking up IMS mappings for URIs like
http://www.ncbi.nlm.nih.gov/protein/P62158
does not return URIs like
http://purl.uniprot.org/uniprot/P62158
or the transitive mapping
http://bio2rdf.org/drugbank:BE0000418
even though reverse lookups of the Uniprot or DrugBank identifier work fine and
include the NCBI protein pattern as part of the Uniprot mapping.

This happens because the sources for both Uniprot and NCBI Protein claim to handle the http://www.ncbi.nlm.nih.gov/protein/$id pattern.
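As a rough illustration of the conflict (not the IMS's actual code), the following sketch uses a hypothetical registry that maps each data source to the URI patterns it claims, and reports any pattern claimed by more than one source. The `SOURCE_PATTERNS` structure, the function name, and the `http://purl.uniprot.org/uniprot/$id` pattern form are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical registry: data source name -> URI patterns it claims.
# The structure is illustrative only; it is not how the IMS stores its sources.
SOURCE_PATTERNS = {
    "Uniprot": [
        "http://purl.uniprot.org/uniprot/$id",
        "http://www.ncbi.nlm.nih.gov/protein/$id",
    ],
    "NCBI Protein": [
        "http://www.ncbi.nlm.nih.gov/protein/$id",
    ],
}


def find_shared_patterns(source_patterns):
    """Return {pattern: sorted source names} for patterns claimed by several sources."""
    claimed_by = defaultdict(set)
    for source, patterns in source_patterns.items():
        for pattern in patterns:
            claimed_by[pattern].add(source)
    return {p: sorted(s) for p, s in claimed_by.items() if len(s) > 1}


conflicts = find_shared_patterns(SOURCE_PATTERNS)
for pattern, sources in conflicts.items():
    print(pattern, "is claimed by", ", ".join(sources))
```

A check like this could run when sources are loaded, so that an ambiguous pattern is flagged before a lookup silently resolves to the wrong source.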

We do not currently have any NCBI Protein mappings, so perhaps a workaround is to disable the NCBI Protein source?

@danidi
Contributor

danidi commented Oct 12, 2015

As we haven't exposed the identifiers.org mappings for NCBI yet, I don't think disabling the NCBI source should cause issues. In general, NCBI seems to allow several different kinds of identifiers (not just from Uniprot), e.g. http://www.ncbi.nlm.nih.gov/protein/CAA71118.1 from GenBank. This might make including this data source in the IMS more difficult.

@stain
Contributor Author

stain commented Oct 14, 2015

Namespace overlapping, but without CURIEs.. not so nice, NCBI.

So we have no guarantee that other NCBI protein URIs of this kind do not match the Uniprot regular expression, and so they could be wrongly mapped back to Uniprot by the IMS if used as input.
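To make the concern concrete, here is a minimal sketch of such a check. It assumes the regular expression is UniProt's published accession format; the expression the IMS actually uses may differ, and the helper name is made up for the example.

```python
import re

# UniProt's published accession format. Whether the IMS uses exactly this
# expression is an assumption of this sketch.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
NCBI_PROTEIN_PREFIX = "http://www.ncbi.nlm.nih.gov/protein/"


def looks_like_uniprot(uri):
    """True if the local id of an NCBI protein URI has the shape of a UniProt accession."""
    if not uri.startswith(NCBI_PROTEIN_PREFIX):
        return False
    local_id = uri[len(NCBI_PROTEIN_PREFIX):]
    return UNIPROT_ACCESSION.fullmatch(local_id) is not None


for uri in (
    NCBI_PROTEIN_PREFIX + "P62158",      # Uniprot accession from the report
    NCBI_PROTEIN_PREFIX + "CAA71118.1",  # GenBank identifier mentioned above
):
    print(uri, looks_like_uniprot(uri))
```

The GenBank example happens not to match, but the check only tests the shape of the identifier, so any NCBI identifier that shares the accession format would still be treated as a Uniprot one.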

http://www.ncbi.nlm.nih.gov/protein/$id is not used in any of the linksets, so it might also be OK to disable it from the Uniprot datasource, and thus remove it from the results when looking up a uniprot identifier. Would this be a meaningful resolution, or do we want to keep http://www.ncbi.nlm.nih.gov/protein/$id as an outgoing link from /mapURI?

I am checking to ensure this pattern is not used by any of the RDF in the cache - but it could still be a useful URI pattern to support as NCBI services are widely used in bioinformatics.

@Chris-Evelo

I think the problem here is that NCBI doesn't want to link to UniProt or other external resources directly, but to their own instance of a protein, which they just happen to have taken from UniProt. I think that we should probably not go along and make sure we end up at UniProt, unless somebody explicitly searches for an NCBI link. But that means we should not depend on NCBI to resolve that issue. (Maybe this is what @stain meant?)
