
Fix time out of queries #83

Open
Nudin opened this issue Aug 27, 2024 · 2 comments

Nudin commented Aug 27, 2024

The background component of machtsinn runs several SPARQL queries; sadly, several of them nowadays time out. Therefore, many languages no longer receive any updates. (See the bottom of the statistics page for when each query last succeeded.)

Possible ways to fix/improve this:

  • Optimize the queries – maybe ask some experts?
  • Split the queries up even further. (I tried splitting de into a query covering only the nouns and one covering everything else. But most lexemes are nouns, so the noun query also fails most of the time. Splitting the nouns by genus might work for de.)
  • Is there some way to get a higher timeout? There once were plans for a higher timeout for tools like this – but I'm not up to date.
  • Add a limit to the queries. That would mean we would at least get some new matches, but not all of them. We need to check whether this breaks any of the logic (for example pruning).
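
The limit idea could also be extended into paging, so no lexemes are permanently excluded. A minimal sketch (the query text and helper are hypothetical, not machtsinn's actual code; real OFFSET paging would also need a stable ORDER BY):

```python
# Hypothetical sketch: split one slow SPARQL query into LIMIT/OFFSET
# pages so each request stays under the endpoint timeout.

def paged_queries(base_query: str, page_size: int, pages: int) -> list[str]:
    """Return `pages` copies of the query, each with its own LIMIT/OFFSET."""
    return [
        f"{base_query}\nLIMIT {page_size} OFFSET {i * page_size}"
        for i in range(pages)
    ]

demo = "SELECT ?lexeme ?lemma WHERE { ?lexeme wikibase:lemma ?lemma }"
for q in paged_queries(demo, 1000, 3):
    print(q.splitlines()[-1])  # LIMIT 1000 OFFSET 0 / 1000 / 2000
```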
@Jerkiller
Contributor

I agree that this is the most impactful issue to date.

Let me add some thoughts:

  • Default query timing out: I think that asking for "all the other languages" in one query is now practically impossible.
    • The lexicographical namespace has moved forward over the years, and the number of lexemes has grown a lot (not only for German); new languages have grown a lot too.
    • We could divide it into many single-language queries, at least for the languages with the most senseless lexemes (ru, et, ml, es, la, el, an, eu, id, ja, fa, uk, sk, cs, nn). What do you think?
  • Single-language queries (de) timing out: that's a big issue... I agree with the strategies you proposed, even if they may not be optimal.
    • I tried optimizing the query without much success... I will probably make a few more attempts before asking for help!
    • Partitioning: good idea. I also tried partitioning by the initial lemma letter, which may work well for German: it yields a fixed number of partitions that are small and balanced in size. But in general it cannot be applied to all languages.
    • Limiting: not the best solution, because some lexemes are excluded. But in any case, as users match lexemes with senses, the excluded lexemes will eventually surface in the query results, because the senseless lexemes will become fewer and fewer.
    • Increasing timeouts: I have heard about orbopengraph, and I have used QLever, but in that case queries need some tweaking.
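
The lemma-initial partitioning could be sketched like this (the query template, and Q188 as the German language item, are my assumptions for illustration, not the tool's actual query):

```python
# Sketch: generate one SPARQL query per initial lemma letter using a
# STRSTARTS filter, so each partition stays small enough to finish.
import string

TEMPLATE = """SELECT ?lexeme ?lemma WHERE {{
  ?lexeme dct:language wd:Q188 ;
          wikibase:lemma ?lemma .
  FILTER(STRSTARTS(LCASE(?lemma), "{letter}"))
}}"""

def partitioned_queries(letters: str = string.ascii_lowercase) -> list[str]:
    """One partition per initial letter; roughly balanced for German."""
    return [TEMPLATE.format(letter=letter) for letter in letters]

queries = partitioned_queries()
print(len(queries))  # 26 partitions
```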

In my script I used a different approach, which is surely slower but may be of interest.

  • Process 1 searches for all the senseless lexemes in a language and writes them to a file/table.
  • Process 2 reads the senseless-lexeme file row by row, searches for each senseless lexeme among item labels (and aliases too), and writes the possible matches to another data structure.
  • Process 3 is a dialog that asks the user whether a match is valid or not.
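
The three processes above could be sketched like this (the data shapes, names, and matching logic are my assumptions, not the actual script):

```python
# Minimal sketch of the three-step pipeline: collect senseless lexemes,
# match their lemmas against item labels, then let the user confirm.

def step1_collect_senseless(lexemes):
    """Process 1: keep only lexemes that have no senses yet."""
    return [lx for lx in lexemes if not lx.get("senses")]

def step2_find_matches(senseless, items_by_label):
    """Process 2: look up each senseless lemma among item labels/aliases."""
    matches = []
    for lx in senseless:
        for qid in items_by_label.get(lx["lemma"].lower(), []):
            matches.append((lx["id"], qid))
    return matches

def step3_review(matches, decide=lambda m: True):
    """Process 3: keep only the matches the user confirms."""
    return [m for m in matches if decide(m)]

lexemes = [{"id": "L1", "lemma": "Haus", "senses": []},
           {"id": "L2", "lemma": "Baum", "senses": ["S1"]}]
labels = {"haus": ["Q3947"]}
confirmed = step3_review(step2_find_matches(step1_collect_senseless(lexemes), labels))
print(confirmed)  # [('L1', 'Q3947')]
```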


Nudin commented Sep 5, 2024

The new query-main.wikidata.org endpoint seems to be faster than the previous default, so I switched to it. I then found that the default.sparql query didn't work due to encoding issues in the database. I fixed those, and now the default query runs again. 🥳

Only the da, de and sv queries still fail. We can partition them, strip down the filter, or replace them with the apparently more efficient queries used for en/fr/etc. We should look into where those differ.
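
If the two endpoints keep diverging in speed, a simple fallback wrapper could try query-main first and fall back to the old default on timeout. A hedged sketch (the endpoint list and injected `fetch` callable are assumptions so the logic stays testable, not machtsinn's actual code):

```python
# Sketch: build a SPARQL GET URL and try endpoints in order of preference.
import urllib.parse

ENDPOINTS = [
    "https://query-main.wikidata.org/sparql",  # newer, apparently faster
    "https://query.wikidata.org/sparql",       # previous default
]

def build_url(query: str, endpoint: str) -> str:
    """Compose a GET URL asking for JSON results."""
    return endpoint + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})

def query_with_fallback(query: str, fetch):
    """Try each endpoint in order; `fetch` should raise TimeoutError on timeout."""
    last_error = None
    for endpoint in ENDPOINTS:
        try:
            return fetch(build_url(query, endpoint))
        except TimeoutError as err:
            last_error = err
    raise last_error
```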
