Support Wikidata reconciliation by distance #3663

Closed
VojtechDostal opened this issue Feb 22, 2021 · 9 comments
Labels
reconciliation, Type: Feature Request

Comments

@VojtechDostal

Reconciliation by string matching is useful in many cases, but it is currently (to my knowledge) impossible to find the items closest to a given location.

Proposed solution

Use case: I have a list of buildings with coordinates (lat, lon). I'd like to find the item(s) closest to those coordinates. Additionally, I'd like to be able to filter results by class (subclass of: building) and suggest only those. High-confidence matches (very close and with corresponding names) could be auto-matched.

Alternatives considered

I don't know of any alternative way/hack to load the closest item to given coordinates. However, the Wikidata SPARQL service has a distance service and I think there is also a special API call for exactly this.
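A rough sketch of the SPARQL route mentioned above, assuming the `wikibase:around` geospatial service of the Wikidata Query Service. The function name `build_nearby_query`, the default radius, and the default class QID (`Q41176`, which should be "building" but is worth double-checking) are illustrative assumptions, not something taken from this thread. The function only builds the query text; sending it to an endpoint is left out.

```python
def build_nearby_query(lat, lon, radius_km=1.0, class_qid="Q41176", limit=10):
    """Return a SPARQL query string for items of a class near (lat, lon).

    Hypothetical helper: the defaults are assumptions for illustration.
    """
    # WKT points are written longitude first, then latitude.
    center = f"Point({lon} {lat})"
    return f"""
SELECT ?item ?itemLabel ?dist WHERE {{
  SERVICE wikibase:around {{
    ?item wdt:P625 ?location .
    bd:serviceParam wikibase:center "{center}"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
    bd:serviceParam wikibase:distance ?dist .
  }}
  ?item wdt:P31/wdt:P279* wd:{class_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?dist
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    print(build_nearby_query(50.0875, 14.4213))
```

The `wdt:P31/wdt:P279*` path restricts results to instances of the class or any of its subclasses, which corresponds to the "subclass of: building" filter in the use case.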

@VojtechDostal added the Type: Feature Request and Status: Pending Review labels on Feb 22, 2021
@tfmorris
Member

That's a good suggestion, but it needs to be implemented in a reconciliation service, not OpenRefine. It could be an enhancement to https://github.com/wetneb/openrefine-wikibase or it could be a specialized reconciliation service.

OpenRefine would send columns like Name, SubclassOf, Latitude, and Longitude (perhaps Altitude?) and the service could use those in its scoring algorithm. The one thing that we don't currently have a good way to specify is session-wide configuration parameters, e.g. direct subclass vs. arbitrarily nested subclass, or a maximum radius threshold/cutoff. Other than that, this is totally doable with the current infrastructure.
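To make the scoring idea concrete, here is a minimal sketch of how a service might combine name similarity with distance. It is not the algorithm of any existing reconciliation service: the function names, the dict layout, the 50/50 weighting, and the 1 km cutoff are all assumptions chosen for illustration, and the cutoff stands in for the session-wide radius parameter discussed above.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def score_candidate(query, candidate, max_radius_km=1.0, name_weight=0.5):
    """Return a score in [0, 100]; candidates beyond the cutoff score 0.

    `query` and `candidate` are dicts with "name", "lat" and "lon" keys
    (a hypothetical layout, not a reconciliation API format).
    """
    dist = haversine_km(query["lat"], query["lon"], candidate["lat"], candidate["lon"])
    if dist > max_radius_km:
        return 0.0
    # Linear falloff: 1.0 at the query point, 0.0 at the cutoff radius.
    distance_score = 1.0 - dist / max_radius_km
    name_score = SequenceMatcher(
        None, query["name"].lower(), candidate["name"].lower()
    ).ratio()
    return 100.0 * (name_weight * name_score + (1 - name_weight) * distance_score)


if __name__ == "__main__":
    row = {"name": "Old Town Hall", "lat": 50.0870, "lon": 14.4208}
    item = {"name": "Old Town Hall", "lat": 50.0871, "lon": 14.4207}
    print(round(score_candidate(row, item), 1))
```

A service could auto-match only above a high threshold, so that both a close distance and a similar name are required.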

@VojtechDostal
Author

Sorry, I did not realize that the reconciliation service is developed separately. I guess we can close it here and I'll post the issue there?

@tfmorris
Member

@wetneb may be able to transfer the issue (not sure if that works between organizations). Why don't we wait for him before you go to the effort of recreating it?

@gitonthescene
Contributor

gitonthescene commented Feb 28, 2021

@VojtechDostal - I’m assuming you have a list of buildings and you want to find out which are closest to which. One workaround is to round the latitude/longitude coordinates (to whatever precision you need) and do a self cell.cross() to find which rows fall in the same quadrant, then calculate the distance between matching rows. This has worked well for me in the past. You could do the same cell.cross() against a separate project with a list of candidates to match against. Note that points near each other but on opposite sides of a quadrant edge will not match, so it’s best to do this a second time with the centers of the quadrants shifted.
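The bucketing idea can be sketched outside OpenRefine as follows. This is a plain-Python illustration of the rounding-plus-shifted-grid approach, not the cell.cross() workflow itself; the function names, the 0.01 degree cell size, and the half-cell shift are assumptions.

```python
import math
from collections import defaultdict


def cell_key(lat, lon, cell_deg, offset=0.0):
    """Grid cell containing (lat, lon), with the grid shifted by `offset` degrees."""
    return (
        math.floor((lat + offset) / cell_deg),
        math.floor((lon + offset) / cell_deg),
    )


def candidate_pairs(points, candidates, cell_deg=0.01):
    """Return (point_id, candidate_id) pairs that share a cell.

    Two passes are made: one on the plain grid and one on a grid shifted
    by half a cell, so that neighbours split by a cell edge in the first
    pass can still be paired in the second.
    """
    pairs = set()
    for offset in (0.0, cell_deg / 2):
        buckets = defaultdict(list)
        for cid, (lat, lon) in candidates.items():
            buckets[cell_key(lat, lon, cell_deg, offset)].append(cid)
        for pid, (lat, lon) in points.items():
            for cid in buckets.get(cell_key(lat, lon, cell_deg, offset), []):
                pairs.add((pid, cid))
    return pairs


if __name__ == "__main__":
    points = {"a": (50.0099, 14.0052)}
    candidates = {"x": (50.0101, 14.0053), "y": (50.5, 14.5)}
    print(candidate_pairs(points, candidates))
```

Only the surviving pairs need an exact distance calculation. Two passes can still miss a pair that is split by a latitude edge in one grid and a longitude edge in the other; comparing each point against its neighbouring cells as well would be more thorough, at the cost of more comparisons.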

@thadguidry
Member

@VojtechDostal I've had good luck with the Bing Maps API, which is pretty generous with a free developer key (< 125,000 transactions): I just construct the URL that I need in a new OpenRefine column and use Fetch URLs.
https://docs.microsoft.com/en-us/bingmaps/rest-services/locations/find-a-location-by-point

But if you already do work on Google Cloud, you might already have many credits you can carry over to use on the Google Maps platform as credit: https://developers.google.com/maps/documentation/geocoding/usage-and-billing

@tfmorris
Member

We're getting a little off track here. The ask was to look things up in a reconciliation service (e.g. Wikidata), not a mapping service or another OpenRefine project.

@VojtechDostal It doesn't look like it's possible to transfer this issue to a repo in another org/user. Sorry for getting your hopes up! I'm going to close this one and you can recreate it in https://github.com/wetneb/openrefine-wikibase (presuming that it's Wikidata or another Wikibase-based database that you're interested in reconciling against).

@gitonthescene
Contributor

As you said, this is more a question of implementing such a feature in a reconciliation service. In practical terms, you’re going to want to limit the items you’re doing the distance calculation for. My suggestion above was focused on how to do such limiting. It could be used to build such a reconciliation service or, as presented, as a workaround given that no such service has been identified.

@wetneb
Member

wetneb commented Mar 3, 2021

I have opened an issue on the reconciliation service: wetneb/openrefine-wikibase#101

@gitonthescene
Contributor

Linking related issue: #1966

@antoine2711 added the reconciliation label on Apr 12, 2022
@tfmorris removed the Status: Pending Review label on Oct 6, 2022
6 participants