Entity reconciliation between schemas and ontologies #72
Comments
This is a general problem. Older, less efficient software for mapping, XML, RDF, etc. has evolved to be much better now in 2021, but the right choice depends on the actual use case, on whether you want to involve the Semantic Web at all, and on whether you need to publish or republish, as is often the case. For instance, a lot of schemas, such as http://www.wipo.int/standards/XMLSchema/ST96/, are not actually vocabularies in the traditional sense but real schemas for a particular niche set of domains, where no attempt to map to Linked Open Vocabularies or anything else was part of the effort (Closed World vs. Open World). I think your immediate mapping needs, Schema <-> Schema or even Any <-> Many, might be best accomplished with a tool and server that is used quite a bit for exactly that: Altova MapForce / Server / XMLSpy.

XML
As a history lesson on how far we have come, this page goes over a broad set of tools and software, some no longer used or available: https://www.w3.org/wiki/XML_Schema_software

RDF
Gosh, there have been so many over the decades, depending on the needs. Practically, though, mapping existing DBs to RDF was very common in academia and enterprise. Here's the dated 2009 state of the art: https://www.w3.org/wiki/Rdb2RdfXG/StateOfTheArt

Linked Open Data
Nowadays, many of the mappings are embedded directly in Wikidata itself through the various SKOS-related properties, as I've done with Schema.org and other ontologies that I have loosely mapped into it. One example: https://www.wikidata.org/wiki/Q26907166 Yes, manually, though a general Excel or LibreOffice "lookup" function or OpenRefine can help.

Semantic Web
Browsing and developing an ontology is a totally different need from mapping or linking ontologies. |
@agnescameron Someone just mentioned to me (offlist) that you would likely be better served by asking folks within the W3C DXWG where interoperability and mapping are exactly their focus: https://www.w3.org/2017/dxwg/wiki/Main_Page |
@thadguidry thanks for this! I hadn't encountered DXWG before, but they seem really ideal for this instance. the XML schema software history is also great. |
@agnescameron: I don't think I fully understand your use case yet, but did I mention Cocoda (https://coli-conc.gbv.de/cocoda/) in the meeting? I have not worked with it myself, but it uses the reconciliation API (acting as a client instead of OpenRefine) to create mappings between different ontologies. |
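For anyone who wants to try the reconciliation API as a client directly, here is a minimal sketch of how a batch of column headers could be turned into a query batch. It assumes the `queries` batch format from the Reconciliation Service API drafts (a JSON object keyed `q0`, `q1`, ... with `query`, `type` and `limit` fields). The function names and the example type ID are hypothetical, and no particular service endpoint (Cocoda's or otherwise) is assumed.

```python
import json
from urllib.parse import urlencode


def build_recon_queries(headers, type_id=None, limit=3):
    """Build a reconciliation query batch, one query per column header."""
    queries = {}
    for i, header in enumerate(headers):
        query = {"query": header, "limit": limit}
        if type_id is not None:
            # Restrict candidates to one type, if the service supports it.
            query["type"] = type_id
        queries[f"q{i}"] = query
    return queries


def encode_form_body(queries):
    """Encode the batch as a form body with a single `queries` parameter."""
    return urlencode({"queries": json.dumps(queries)})


# Hypothetical column headers and type ID, for illustration only.
batch = build_recon_queries(["pub_date", "kind_code"], type_id="ExampleType")
body = encode_form_body(batch)
```

The resulting `body` would be POSTed to whichever reconciliation endpoint is being tested; the response handling is left out because it depends on the service.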
@fsteeg Cocoda seems really well-suited to this use-case; will test it out and report back. thanks! |
Oooo... nice find, Fabian. Didn't know about Cocoda.
Ah, and here's some of the background research that went into and is
layered into the Cocoda tool.
https://coli-conc.gbv.de/hub/
Interesting. Played with it just now... it's nice that you can stay on the
Scope column to keep that context in view and just click back into the
search input box and click on the next suggestion and then see the Scope.
But they should allow a dual view... seeing the list of results in a real
panel (not dropdown interface) AND showing Scope.
Something I'd like us to eventually have in OpenRefine to make recon
easier. Dedicated Recon panel/subpanels.
Thad
|
This came up in this month's call, and I wanted to give a full explanation of the use case I was describing (as it's 'a bit meta'), which can be shaped into more of a feature request / mailing list object through discussion. I originally brought this up in relation to the discussion of type hierarchies in #68 -- my impression is that the main distinction between these cases is that here, the entities being resolved are themselves types, as specified by an ontology.
What we're trying to achieve:
Taking a range of datasets, produced by different people working in a similar context (in this case, innovation data), and reconciling the dataset schemas against a common ontology. This could be based on just the string information of the column headers, but more ideally on a combination of the column header and the data type, or on the relationships between different columns within the schema.
The goal of the work we're doing is to build a graph of relationships between datasets, allowing merging/querying operations across a range of diverse data sources. There's also a version of this where the entities within the dataset also get reconciled (which looks a lot more like the traditional reconciliation API), but it would be interesting to know what's possible with an ontology alone.
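To make the simplest version concrete (string information of the column headers only), here is a rough sketch that ranks ontology type names against a header by string similarity. It is only an illustration under stated assumptions: the term list is a small hand-picked sample using names mentioned in this issue, `normalize` and `suggest_types` are hypothetical helpers, and `difflib` similarity stands in for whatever matching a real reconciliation service would do.

```python
import difflib
import re

# Sample of type names mentioned in this issue; the real ontology is much larger.
ONTOLOGY_TERMS = [
    "PatentPublicationIdentification",
    "PublicationLanguageCode",
    "PatentDocumentKindCode",
    "PublicationDate",
]


def normalize(name):
    """Split CamelCase, turn separators into spaces, and lowercase."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return re.sub(r"[^A-Za-z0-9]+", " ", spaced).strip().lower()


def suggest_types(header, terms=ONTOLOGY_TERMS, limit=3, cutoff=0.5):
    """Return up to `limit` (term, score) pairs, best match first."""
    target = normalize(header)
    scored = [
        (term, difflib.SequenceMatcher(None, target, normalize(term)).ratio())
        for term in terms
    ]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [(t, round(s, 2)) for t, s in ranked if s >= cutoff][:limit]


# A snake_case header and a CamelCase type name normalize to the same string.
print(suggest_types("publication_date"))
```

Header-only matching like this breaks down quickly for abbreviated or ambiguous headers, which is why combining it with the data type or the relationships between columns would be preferable.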
For example: I know 3 different researchers, all of whom use patent identifiers in their datasets in a different format. The WIPO standards ontology specifies patent identifier formatting as part of a hierarchical ontology.
If I wanted to specify what identification scheme was being used in each instance, I could: each PatentPublicationIdentification has a PatentPublicationIdentificationType, which is composed of a sequence of up to 5 different objects including PublicationLanguageCode, PatentDocumentKindCode and PublicationDate. 2 of the 5 are optional, and many of these also have further possible type specifications (e.g. PublicationLanguageCode can have a different ExtendedISOLanguageCodeType depending on when the identifier was specified).
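As a sketch of the kind of metadata this would produce, the snippet below records which ontology type a column maps to and which of its components the column actually carries. Everything here is hypothetical apart from the three component names quoted above: the `ColumnAnnotation` class, the dataset labels and the column headers are made up for illustration, and the two remaining components (and which two are optional) are deliberately left out rather than guessed.

```python
from dataclasses import dataclass, field

# Component names quoted in this issue; the other two are not listed here.
KNOWN_COMPONENTS = (
    "PublicationLanguageCode",
    "PatentDocumentKindCode",
    "PublicationDate",
)


@dataclass
class ColumnAnnotation:
    """Links one dataset column to an ontology type and its components."""
    dataset: str        # hypothetical dataset label
    column: str         # column header as it appears in the dataset
    ontology_type: str  # e.g. "PatentPublicationIdentificationType"
    components: list = field(default_factory=list)

    def unknown_components(self):
        """Components not among the names quoted in the issue."""
        return [c for c in self.components if c not in KNOWN_COMPONENTS]


def shared_components(a, b):
    """Components two columns have in common, i.e. what a crosswalk can use."""
    return [c for c in a.components if c in b.components]


# Two made-up researcher datasets using different identifier layouts.
first = ColumnAnnotation(
    "researcher_a", "patent_id", "PatentPublicationIdentificationType",
    ["PatentDocumentKindCode", "PublicationDate"],
)
second = ColumnAnnotation(
    "researcher_b", "pub_no", "PatentPublicationIdentificationType",
    ["PublicationLanguageCode", "PatentDocumentKindCode"],
)
print(shared_components(first, second))
```

Annotations like these could serve as the nodes and edges of the dataset graph described earlier, with shared components indicating where two columns can be joined.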
While it's possible to go through this process manually (either by going through the WIPO schema, or using guides to patent ID construction), crosswalking column types like this can be a real pain, especially for newer researchers not versed in the foibles of different notation.
It's possible that the entity reconciliation API is not the place for this problem, but it would be interesting to know what would work well -- so many ontologies get specified but then under-used when it comes to actually linking published schemas to their corresponding types. Are there existing workflows that anyone's familiar with for producing this kind of metadata (I'll add them to the census if so)?