`ql:contains-word` now can show the score of the word match in the respective text #1397

Flixtastic · 2024-07-12T01:22:56Z

The fulltext index of QLever has always been able to associate the occurrence of a word in a text with a score.
This PR adds the functionality to actually retrieve this score and to use it in the remainder of the query.
Currently, the score is bound to a variable whose name is automatically determined from the involved literals and variables. The easiest way to get the names of these variables is to use `SELECT *` or to look at the runtime information tree.
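
To illustrate what "automatically determined from the involved literals and variables" could look like, here is a minimal sketch. It is not the code from this PR: the function names `appendEscaped` and `makeScoreVariableName`, the `?ql_score_` prefix (borrowed from the title of the issue referenced in this thread), and the escaping scheme are all assumptions made for illustration. The actual names should still be looked up via `SELECT *` or the runtime information tree.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper (not QLever's API): append `word` to `target`, replacing
// every character that is not an ASCII letter or digit by `_<decimal code>_`,
// so that the result only contains characters that are safe in a variable name.
inline void appendEscaped(std::string& target, std::string_view word) {
  for (unsigned char c : word) {
    if (std::isalnum(c)) {
      target.push_back(static_cast<char>(c));
    } else {
      target += '_';
      target += std::to_string(static_cast<int>(c));
      target += '_';
    }
  }
}

// Hypothetical: derive the name of the score variable from the name of the
// text variable (without the leading `?`) and the word that is searched for.
// The `?ql_score_` prefix and the overall layout are assumptions.
inline std::string makeScoreVariableName(std::string_view textVar,
                                         std::string_view word) {
  std::string name = "?ql_score_";
  appendEscaped(name, textVar);
  name += '_';
  appendEscaped(name, word);
  return name;
}
```

Under these assumptions, a text variable `?t` and the prefix search `astro*` would yield `?ql_score_t_astro_42_`, because `*` has the character code 42. The point of escaping is that words may contain characters that are not allowed in variable names.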

Flixtastic · 2024-07-12T14:54:15Z

modified: src/index/IndexImpl.Text.cpp
Formatted with clang-format

modified: test/QueryPlannerTestHelpers.h
Updated expected result width in TextIndexScanForWord

modified: test/engine/TextIndexScanForWordTest.cpp
Added include to print variables
Updated result widths expected in tests
Updated variables expected in ColumnMap

Problems:
When the .cpp files are formatted using clang-16 with the `.clang-format` file provided in the qlever folder, the include of the respective .h file gets moved down among all the other includes instead of staying at the top.

joka921

This already looks very nice. I only have a few comments, and we have already discussed most things today.

src/engine/TextIndexScanForWord.cpp

src/parser/sparqlParser/SparqlQleverVisitor.cpp

test/engine/TextIndexScanForWordTest.cpp

sonarqubecloud · 2024-07-12T17:06:13Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Flixtastic · 2024-12-04T14:10:15Z

Everything should be fixed now. Tests are complete and scores are shown. New test methods for building a text index are also implemented.

joka921

Thank you very much.
As the last round here was quite some time ago, I did another full review. In general, this code looks nice and clean; I only have several small nitpicks, most of which should be easy to fix.

src/engine/TextIndexScanForWord.cpp

src/index/IndexImpl.h

src/parser/data/Variable.cpp

src/parser/data/Variable.h

src/parser/data/Variable.cpp

test/engine/TextIndexScanForWordTest.cpp

test/util/IndexTestHelpers.cpp

test/util/IndexTestHelpers.h

codecov · 2024-12-05T09:38:33Z

Codecov Report

Attention: Patch coverage is 95.87345% with 30 lines in your changes missing coverage. Please review.

Project coverage is 89.80%. Comparing base (0400f90) to head (deb1e37).
Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/util/StringUtilsImpl.h | 90.00% | 0 Missing and 4 partials ⚠️ |
| src/engine/idTable/IdTable.h | 87.50% | 1 Missing and 2 partials ⚠️ |
| src/engine/sparqlExpressions/SparqlExpression.cpp | 25.00% | 3 Missing ⚠️ |
| src/util/JoinAlgorithms/JoinAlgorithms.h | 90.00% | 0 Missing and 3 partials ⚠️ |
| src/engine/Operation.cpp | 93.54% | 1 Missing and 1 partial ⚠️ |
| src/engine/QueryPlanner.cpp | 95.55% | 1 Missing and 1 partial ⚠️ |
| src/engine/idTable/CompressedExternalIdTable.h | 85.71% | 0 Missing and 2 partials ⚠️ |
| src/util/ChunkedForLoop.h | 80.00% | 0 Missing and 2 partials ⚠️ |
| src/util/ConcurrentCache.h | 84.61% | 0 Missing and 2 partials ⚠️ |
| src/engine/CartesianProductJoin.cpp | 95.00% | 0 Missing and 1 partial ⚠️ |
... and 6 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1397      +/-   ##
==========================================
+ Coverage   89.67%   89.80%   +0.12%     
==========================================
  Files         383      385       +2     
  Lines       36942    36995      +53     
  Branches     4174     4181       +7     
==========================================
+ Hits        33129    33223      +94     
+ Misses       2512     2477      -35     
+ Partials     1301     1295       -6

…1665) Each operation now has a `bool` that determines whether the results can be stored in the cache or not (whether it is actually stored depends on other circumstances, like the available cache size). That `bool` does not have to be fixed when the operation is created, but can be changed. For example, this is useful for index scans that only return a subset of their full result (because of another constraining operation, like a join or a filter).

This is a first step towards making QLever compile with C++17. If the compile-time flag `QLEVER_CPP_17` is set, use Eric Niebler's `range-v3` library as a drop-in replacement for `std::ranges`. In the code, we simply write `ql::ranges` instead of `std::ranges` in most places. Some places need special treatment. For example, where `std::ranges` was used as a C++20 concept, we now use the macros `CPP_template` and `CPP_and` (also from the `range-v3` library), which do the right thing for both C++20 and C++17.

src/index/IndexImpl.h

joka921

One last run of the checks and then we can finally merge this.

sparql-conformance · 2024-12-13T15:35:34Z

Conformance check passed ✅

No test result changes.

Details: https://qlever.cs.uni-freiburg.de/sparql-conformance-ui?cur=deb1e375c1ed984202490a4073e0d10c1dbdfb9d&prev=4237e0d4af70e6e400f4357f61756eb5873fe98a

sonarqubecloud · 2024-12-13T15:56:53Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Flixtastic added 2 commits July 12, 2024 03:12

ql:contains-word now can show the respective word-score.

ea9d39c

Fixed tests and formatted files.

30736ef

joka921 requested changes Jul 12, 2024

src/engine/TextIndexScanForWord.cpp (outdated)

src/parser/sparqlParser/SparqlQleverVisitor.cpp (outdated)

test/engine/TextIndexScanForWordTest.cpp

joka921 mentioned this pull request Jul 16, 2024

How ?ql_score_ is computed in the latest text index implementation? #1393

Open

Flixtastic and others added 19 commits July 27, 2024 16:00

New formatting for Word Score Variables. Changed where necessary and …

e752db8

…adapted unit tests. Missing e2e tests.

Merge branch 'ad-freiburg:master' into master

4ef4d93

Merge branch 'ad-freiburg:master' into master

d52063f

Merge branch 'master' of github.com:Flixtastic/qlever.

c6fe0c6

Commit doesn't contain all changes necessary for pull request yet.

Added getWordSCoreVariable for std::string_view

d0b9ee8

Merge branch 'ad-freiburg:master' into master

2eade97

Merge branch 'ad-freiburg:master' into master

595cb57

Merge branch 'ad-freiburg:master' into master

b4c8c3b

Merge branch 'ad-freiburg:master' into master

72e5d64

Merge branch 'ad-freiburg:master' into master

d8f9df4

Made it possible to construct query execution contexts with text inde…

29511c6

…x. This is done through passing the words and docsfile as string, and then building the text index as normal. Basic Test is existent (TODO make more edge case tests) and e2e testing is fixed.

Merge branch 'ad-freiburg:master' into master

3855978

Reduced usage of column copying in TextIndexScanForWord.cpp

6021401

Merge branch 'ad-freiburg:master' into master

d9701ae

Merge branch 'ad-freiburg:master' into master

5f0ce01

Merge branch 'ad-freiburg:master' into master

e2c47cf

Merge branch 'ad-freiburg:master' into master

e6a0cf7

Changed the counting of nofNonLiterals to nofLiterals. Some methods a…

ed9fbda

…re still unstable because of the way nofContexts are counted. Implemented new more refined tests.

Merge branch 'ad-freiburg:master' into master

5ad3d8f

Flixtastic requested a review from joka921 December 4, 2024 20:45

joka921 requested changes Dec 5, 2024

Merge branch 'ad-freiburg:master' into master

af6bd64

joka921 and others added 22 commits December 12, 2024 18:54

Reverting the nofLiterals being saved in the TextMetaData and instead…

1adcecb

… saving nofNonLiterals in the configuration json file.

Revert to first sync and then reapply "Reverting the nofLiterals bein…

f5eefab

…g saved in the TextMetaData and instead saving nofNonLiterals in the configuration json file." This reverts commit 1adcecb.

ql:contains-word now can show the respective word-score.

583a67a

Fixed tests and formatted files.

e4cb2ed

New formatting for Word Score Variables. Changed where necessary and …

3ce304d

…adapted unit tests. Missing e2e tests.

Added getWordSCoreVariable for std::string_view

eb8e83a

Made it possible to construct query execution contexts with text inde…

cd4789a

…x. This is done through passing the words and docsfile as string, and then building the text index as normal. Basic Test is existent (TODO make more edge case tests) and e2e testing is fixed.

Changed the counting of nofNonLiterals to nofLiterals. Some methods a…

fdba417

…re still unstable because of the way nofContexts are counted. Implemented new more refined tests.

renamed nofLiterals to nofLiteralsInTextIndex

6686325

Removed redundant method getWordScoreVariable

0faf3d0

added method appendEscapedWord to escape special chars in Variables

eafd594

Added two function in the TextIndexScanTestHelpers.h to add content t…

fd01a97

…o the wordsFileContent and docsFileContent strings. Now you can clearly see what lines are added and can writing tests is cleaner

Added tests for Scores. Also commented tests and refined them

65842f4

Changed the getQec function and the respective makeTestIndex to take …

baa10cf

…in the wordsFileContent and docsFileContent as pair contentsOfWordsFileAndDocsFile

Fix the multiple definition error.

6bb80d3

Signed-off-by: Johannes Kalmbach <[email protected]>

Reverting the nofLiterals being saved in the TextMetaData and instead…

d093d85

… saving nofNonLiterals in the configuration json file.

Revert to first sync and then reapply "Reverting the nofLiterals bein…

716e828

…g saved in the TextMetaData and instead saving nofNonLiterals in the configuration json file." This reverts commit 1adcecb.

Merge remote-tracking branch 'origin/master'

613a2c4

Reverting the nofLiterals being saved in the TextMetaData and instead…

2e32bd3

… saving nofNonLiterals in the configuration json file.

Changed some naming to better describe functions

e93f944

joka921 requested changes Dec 13, 2024

src/index/IndexImpl.h (outdated)

Changed the ambiguous naming of nofNonLiterals to nofNonLiteralsInTex…

deb1e37

…tIndex everywhere aswell as num-non-literals to nom-non-literals-text-index

Flixtastic requested a review from joka921 December 13, 2024 08:58

joka921 approved these changes Dec 13, 2024

joka921 changed the title ~~ql:contains-word now can show the respective word score~~ ql:contains-word now can show the score of the word match in the respective text Dec 16, 2024

joka921 merged commit a97905e into ad-freiburg:master Dec 16, 2024
24 checks passed
