Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for spatial query features #1581

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ullingerc
Copy link
Collaborator

No description provided.

Copy link

codecov bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.07%. Comparing base (f856919) to head (126bef7).
Report is 11 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1581      +/-   ##
==========================================
+ Coverage   89.00%   89.07%   +0.07%     
==========================================
  Files         368      371       +3     
  Lines       33888    34437     +549     
  Branches     3828     3899      +71     
==========================================
+ Hits        30161    30675     +514     
- Misses       2473     2484      +11     
- Partials     1254     1278      +24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hannahbast
Copy link
Member

@ullingerc Thanks a lot, this is great. I have a comment and a question:

  1. I think it's better to move this kind of documentation to https://github.com/ad-freiburg/qlever/wiki and link to it (with a short explanation) from the main README.md of https://github.com/ad-freiburg/qlever. The reason is that documentation is more frequently updated (or simply corrected) than code, and we don't want a commit for every such update. For the same reason, the QLever CLI is in a separate repository https://github.com/ad-freiburg/qlever-control.

  2. You mention that QLever does not yet provide ad-hoc computation of the spatial predicates contains, intersects, and so on. I wonder: since the S2 library is already integrated now, wouldn't it be fairly straightforward to use it for this?

@ullingerc
Copy link
Collaborator Author

@hannahbast Thank you very much for your feedback, A few thoughts on this:

  1. Okay, I see. I totally agree that we should stick to the project's conventions. While I think that, personally, I would prefer a self-contained git repository that can for example be easily moved to a different platform, I agree that we should be consistent: I will change this as you suggested.

  2. Thanks for bringing this up. I also already thought about this. S2 provides some functions, also for example to calculate the centroid ad-hoc. But it would involve quite some infrastructure code (for parsing the different types of WKT strings, converting each of them to the adequate data structure in s2, the wrapping code for each of the functions, etc.). Therefore my suggestion is to postpone this for a few weeks, while I still have lots of other work to do. If I can find time, I will be glad to implement this. I think it would be very useful to have the ad-hoc methods as a sort of fallback for the precomputed ones. For example when importing a small dataset without OpenStreetMap this would be a practical shortcut.

@hannahbast
Copy link
Member

@ullingerc Thanks for the explanation! I have a follow-up question regarding the spatial join to understand this better:

Let's consider any one of the spatial functions, say geof:sfContains and let's consider the special case, where we use it in a FILTER and both arguments are variables and both variables are bound by parts of the remaining query. Then, instead of FILTER geof:sfContains(?x, ?y), we could equivalently write ?x <spatial-join:contains> ?y, where the latter is a magic predicate with the appropriate semantics.

That looks very similar to the magic functions <nearest-neighbors:k>, <nearest-neighbors:k:m>, and <max-distance-in-meters:m, which you have already implemented.

My question is: Would the implementation of such a magic <spatial-join:contains> predicate be analogous, or is there an additional complication?

@ullingerc
Copy link
Collaborator Author

@hannahbast Thanks a lot for your thoughts. There are a few things that come to my mind on this:

  • In principle, indeed the problems are similar, however the devil is in the details. Currently the spatial join implements two algorithms, both of which are suitable for all of the magic predicates. These are (1) a nested-loop baseline algorithm and (2) a fast algorithm based on the S2PointIndex class. The implementation of both algorithms is based on working only with GeoPoints. For contains, intersects etc. a solution only for points is obviously not very useful. Modifying both algorithms and the infrastructure code to work with any kind of geometry and to answer contains, intersects etc. would significantly increase the complexity of the implementation and thus would involve refactoring much of the code. An alternative would be to introduce a second version of the spatial join class for the contains, intersects, ... questions to avoid trouble with the existing functionality. This comes with the risk of duplicate code. Another possibility is a straightforward implementation as a normal SPARQL function geof:something(?a, ?b). This comes with the heavy runtime penalty of having to compare everything pairwise.
  • While an implementation via magic predicates is without a doubt a good step in the right direction, it would be even better to follow the GeoSPARQL standard's syntax. This would be a question concering the parser and/or query planner: how can we detect certain patterns in the query (like FILTER(geof:sfContains(?a, ?b)) or FILTER(geof:distance(?a, ?b) <= SomeConstant)) to invoke a spatial join implicitly - for both complying with the standard's syntax and using an efficient implementation?
  • In any case, even the parsing of more complex WKT literals and the construction as well as querying of appropriate S2 index structures for each of the geometric relations is a larger implementation task. It will also require further familiarization with the S2 library's details.

I suggest that we leave it open for this documentation PR and discuss this in further detail in a meeting, possibly also with @joka921.

@ullingerc
Copy link
Collaborator Author

@sparql-conformance
Copy link

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants