ql:contains-word now can show the score of the word match in the respective text #1397

Merged: 56 commits merged into ad-freiburg:master on Dec 16, 2024

@Flixtastic (Contributor) commented Jul 12, 2024:

The fulltext index of QLever has always been able to associate each occurrence of a word in a text with a score.
This PR adds the functionality to actually retrieve this score and to use it in the remainder of the query.
Currently, the score is bound to a variable whose name is automatically determined from the involved literals and variables. The easiest way to find out these variable names is to use SELECT * or to look at the runtime information tree.

@Flixtastic (Contributor Author) commented:

modified: src/index/IndexImpl.Text.cpp
Formatted with clang-format

modified: test/QueryPlannerTestHelpers.h
Updated expected result width in TextIndexScanForWord

modified: test/engine/TextIndexScanForWordTest.cpp
Added include to print variables
Updated result widths expected in tests
Updated variables expected in ColumnMap

Problems:
When running clang-format 16 with the .clang-format file from the qlever folder on the .cpp files, the include of the corresponding .h file is moved down among all the other includes instead of staying at the top.

@joka921 (Member) left a comment:

This already looks very nice. I only have a few comments, and we already discussed most of them today.

src/engine/TextIndexScanForWord.cpp (outdated, resolved)
src/parser/sparqlParser/SparqlQleverVisitor.cpp (outdated, resolved)
test/engine/TextIndexScanForWordTest.cpp (resolved)

Flixtastic and others added 19 commits July 27, 2024 16:00
Commit doesn't contain all changes necessary for pull request yet.
…x. This is done by passing the contents of the words file and the docs file as strings and then building the text index as usual. A basic test exists (TODO: add more edge-case tests), and e2e testing is fixed.
…re still unstable because of the way nofContexts are counted. Implemented new, more refined tests.
@Flixtastic (Contributor Author) commented:

Everything should be fixed now. The tests are complete and the scores are shown. New test methods for building a text index are also implemented.

@Flixtastic requested a review from joka921 on December 4, 2024 20:45

@joka921 (Member) left a comment:

Thank you very much.
Since the last round was quite some time ago, I did another full review. In general, this code looks nice and clean; I only have several small nitpicks, most of which should be easy to fix.

src/engine/TextIndexScanForWord.cpp (outdated, resolved)
src/index/IndexImpl.h (outdated, resolved)
src/parser/data/Variable.cpp (outdated, resolved)
src/parser/data/Variable.h (outdated, resolved)
src/parser/data/Variable.cpp (outdated, resolved)
test/engine/TextIndexScanForWordTest.cpp (resolved)
test/engine/TextIndexScanForWordTest.cpp (resolved)
test/util/IndexTestHelpers.cpp (outdated, resolved)
test/util/IndexTestHelpers.cpp (outdated, resolved)
test/util/IndexTestHelpers.h (outdated, resolved)

codecov bot commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 95.87345% with 30 lines in your changes missing coverage. Please review.

Project coverage is 89.80%. Comparing base (0400f90) to head (deb1e37).
Report is 3 commits behind head on master.

| File with missing lines | Patch % | Lines |
|---|---|---|
| src/util/StringUtilsImpl.h | 90.00% | 0 Missing and 4 partials |
| src/engine/idTable/IdTable.h | 87.50% | 1 Missing and 2 partials |
| src/engine/sparqlExpressions/SparqlExpression.cpp | 25.00% | 3 Missing |
| src/util/JoinAlgorithms/JoinAlgorithms.h | 90.00% | 0 Missing and 3 partials |
| src/engine/Operation.cpp | 93.54% | 1 Missing and 1 partial |
| src/engine/QueryPlanner.cpp | 95.55% | 1 Missing and 1 partial |
| src/engine/idTable/CompressedExternalIdTable.h | 85.71% | 0 Missing and 2 partials |
| src/util/ChunkedForLoop.h | 80.00% | 0 Missing and 2 partials |
| src/util/ConcurrentCache.h | 84.61% | 0 Missing and 2 partials |
| src/engine/CartesianProductJoin.cpp | 95.00% | 0 Missing and 1 partial |

... and 6 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1397      +/-   ##
==========================================
+ Coverage   89.67%   89.80%   +0.12%     
==========================================
  Files         383      385       +2     
  Lines       36942    36995      +53     
  Branches     4174     4181       +7     
==========================================
+ Hits        33129    33223      +94     
+ Misses       2512     2477      -35     
+ Partials     1301     1295       -6     


joka921 and others added 22 commits December 12, 2024 18:54
…1665)

Each operation now has a `bool` that determines whether the results can be stored in the cache or not (whether it is actually stored depends on other circumstances, like the available cache size). That `bool` does not have to be fixed when the operation is created, but can be changed.

For example, this is useful for index scans that only return a subset of their full result (because of another constraining operation, like a join or a filter).
This is a first step towards making QLever compile with C++17.

If the compile-time flag `QLEVER_CPP_17` is set, use Eric Niebler's `range-v3` library as a drop-in replacement for `std::ranges`. In the code, we simply write `ql::ranges` instead of `std::ranges` in most places. Some places need special treatment. For example, where `std::ranges` was used as a C++20 concept, we now use the macros `CPP_template` and `CPP_and` (also from the `range-v3` library), which does the right thing for both C++20 and C++17.
… saving nofNonLiterals in the configuration json file.
…g saved in the TextMetaData and instead saving nofNonLiterals in the configuration json file."

This reverts commit 1adcecb.
…x. This is done by passing the contents of the words file and the docs file as strings and then building the text index as usual. A basic test exists (TODO: add more edge-case tests), and e2e testing is fixed.
…re still unstable because of the way nofContexts are counted. Implemented new, more refined tests.
…o the wordsFileContent and docsFileContent strings. Now you can clearly see which lines are added, and writing tests is cleaner.
…in the wordsFileContent and docsFileContent as the pair contentsOfWordsFileAndDocsFile
Signed-off-by: Johannes Kalmbach
… saving nofNonLiterals in the configuration json file.
…g saved in the TextMetaData and instead saving nofNonLiterals in the configuration json file."

This reverts commit 1adcecb.
… saving nofNonLiterals in the configuration json file.
src/index/IndexImpl.h (outdated, resolved)
…tIndex everywhere as well as num-non-literals to num-non-literals-text-index
@Flixtastic requested a review from joka921 on December 13, 2024 08:58

@joka921 (Member) left a comment:

One last run of the checks, and then we can finally merge this.

@sparql-conformance

@joka921 changed the title from "ql:contains-word now can show the respective word score" to "ql:contains-word now can show the score of the word match in the respective text" on Dec 16, 2024
@joka921 merged commit a97905e into ad-freiburg:master on Dec 16, 2024
24 checks passed