Translations #26

airon90 · 2022-01-20T20:19:40Z

Can you support other languages? You can get the correct labels from Wikidata.

Mte90 · 2022-02-09T11:28:18Z

Yes, it would help a lot. Since you are already running queries, I guess a language picker that changes something in the endpoint the game uses would be enough.

tuukka · 2022-04-05T23:29:45Z

As all the game data is currently loaded from a single file at start, I think the best approach might be to provide language-specific versions of this file.

Approach 0: Instead of having a language-specific file, fetch the data of the Wikidata item each time a card is shown to see if Wikidata (at the moment) contains the desired translations. I'm not sure which endpoints can be accessed directly by the game in the browser, but e.g. these would seem to work: https://www.wikidata.org/wiki/Special:EntityData/Q42.json and https://query.wikidata.org/bigdata/ldf?subject=wd:Q42
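A minimal sketch of what Approach 0 could look like, assuming the game fetches the Special:EntityData JSON for each card. The function names are hypothetical, and the sketch only covers building the URL and reading a translation out of an already-fetched payload, not the fetch itself. It also assumes the sitelink key is simply the language code followed by "wiki", which does not hold for every language code.

```python
from typing import Optional

# URL pattern taken from the example link above.
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def entity_data_url(qid: str) -> str:
    """Build the EntityData URL for a Wikidata item such as 'Q42'."""
    return ENTITY_DATA_URL.format(qid=qid)


def extract_translation(payload: dict, qid: str, lang: str) -> Optional[dict]:
    """Pull label, description and Wikipedia title for `lang` out of the JSON.

    Returns None when the item has no label in the desired language, so the
    caller can fall back to the original card text.
    """
    entity = payload.get("entities", {}).get(qid, {})
    label = entity.get("labels", {}).get(lang, {}).get("value")
    if label is None:
        return None
    description = entity.get("descriptions", {}).get(lang, {}).get("value")
    # Assumption: the sitelink key is "<lang>wiki" (e.g. "rowiki").
    title = entity.get("sitelinks", {}).get(f"{lang}wiki", {}).get("title")
    return {"label": label, "description": description, "wikipedia_title": title}
```

The description and Wikipedia title may come back as None even when a label exists, so a caller would still need to decide whether a partially translated card is acceptable.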

Approach 1: For each card (Wikidata item) in the original data file, replace the original label, description and Wikipedia article title (in English) by ones in the desired language from the same Wikidata item. However, they might not be available or they might be unsuitable (contain the answer or have a mistake).
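A rough sketch of the replacement step in Approach 1, under some assumptions: the card field names (year, label, description, wikipedia_title) are hypothetical and may not match the real data file, and "contains the answer" is approximated by checking whether the translated text mentions the card's year. Mistakes in the translation cannot be detected this way.

```python
import re
from typing import Optional


def leaks_year(text: Optional[str], year: int) -> bool:
    """True if the text mentions the card's year as a standalone number."""
    if not text:
        return False
    return re.search(rf"(?<!\d){abs(year)}(?!\d)", text) is not None


def localize_card(card: dict, translation: Optional[dict]) -> Optional[dict]:
    """Return a translated copy of the card, or None if it should be skipped.

    A card is skipped when the translation is missing, has no label, or
    gives away the answer by mentioning the year.
    """
    if translation is None or not translation.get("label"):
        return None
    year = card["year"]  # hypothetical field name for the answer
    if leaks_year(translation["label"], year):
        return None
    if leaks_year(translation.get("description"), year):
        return None
    # Copy the card so the original English data stays untouched.
    return {
        **card,
        "label": translation["label"],
        "description": translation.get("description") or "",
        "wikipedia_title": translation.get("wikipedia_title") or "",
    }
```

Skipping a card rather than falling back to English keeps the deck in one language, at the cost of a smaller deck for less translated languages.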

Approach 2: Generate a new set of cards appropriate in the desired language e.g. by tweaking https://github.com/tom-james-watson/wikitrivia-generator.

EDIT: Approach 3: Generate a new set of cards dynamically from the frontend by calling a suitable SPARQL endpoint such as QLever. https://qlever.cs.uni-freiburg.de/wikidata/

nicolaes · 2022-05-24T19:10:23Z

I like Approach 2 the most. Approaches 0 and 1 are for me:

Pro: long-term and low-maintenance
Con: may hinder quick-fix tweaks in the database

I'll try Approach 2 in Romanian to see how it goes.

Edit: I take back liking Approach 2 after seeing the 73GB data source. I will still give it a try, but don't have high hopes.

tuukka · 2022-05-24T20:52:09Z

@nicolaes 👍 Perhaps together we can find the people needed to make this happen. To make Approach 2 easier, I found some initial discussion on reimplementing it based on queries against a SPARQL endpoint. In my experience, the official SPARQL endpoint does not have the performance needed, but QLever (and/or Virtuoso) might be able to answer all the queries we need. Here's a quick test that finds about 9000 results that might be suitable for Romanian cards: https://qlever.cs.uni-freiburg.de/wikidata/30kMrq?exec=true

See also: tom-james-watson/wikitrivia-generator#6 and tom-james-watson/wikitrivia-generator#8

nicolaes · 2022-05-25T17:57:24Z

@tuukka Thanks for the idea. I appreciate the effort to put together the Romanian version.
The quick test of 9000 entries is very relevant; the current English database has 10k entries.

I don't know SPARQL, so I am playing around with the link you provided.
My plan is to find a reasonably fast query that provides at least 5000 results, then put it together with the wikitrivia app.

nicolaes · 2022-05-26T09:21:29Z

I gave QLever a few tries, then I dropped it.
I ran a query with all year types (created, discovered, invented, born etc.) and I lost the backend connectivity, probably because of a lack of optimization. Here is the code: https://qlever.cs.uni-freiburg.de/wikidata/aFFkcp

I made progress on processing the raw data source, and now have ~1000 usable entries for Romanian.
I'm not yet sure if Approaches 0 and 1 are viable, but it might be worth trying them out.
My steps to get the Romanian entities were:

downloading the Wikidata dump (73GB)
parsing it with wikibase-dump-filter - 150k entries in 9h (should be faster for more popular languages)
adapting the wikitrivia-generator parser (translate filter words, change en to ro, adjust viewcounts) - 250 entries / hour

Since I don't have many cards, I will account for the scenario where there are no relevant cards left to show.
Then I will put this live and see if Romanians actually use it.

tuukka · 2022-05-26T15:11:02Z

@nicolaes I hadn't thought of the possibility of creating a set of cards dynamically based on a SPARQL query. I've added it as "Approach 3" in my original list. At a glance, an advantage would be that the data would update automatically, but a disadvantage would be that two games couldn't be guaranteed to be played with the same set of cards.

I have reported the QLever crash to its developers - I hope it's something they can easily fix as QLever is very performant in general.

Do you know why you got just 10% of the number of cards compared to English? For example, is it because the Romanian labels are missing, the filter words match more often, or the viewcounts are lower?

tuukka · 2022-06-04T12:23:44Z

Update: here's a query for QLever that returns all suitable Wikidata items and their required attributes sorted by sitelinks count (pageviews is not available for queries). You can change "en" to any other language code: https://qlever.cs.uni-freiburg.de/wikidata/OycBUK
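To illustrate the "change the language code" idea, here is a hedged sketch of a query builder. This is not the query behind the link above: it is a simplified stand-in that only looks at date of birth (P569), the function name is hypothetical, and it has not been tuned or tested against QLever. It does show the general shape: filter labels and descriptions by language and sort by the sitelinks count.

```python
import re

# Illustrative query only: covers date of birth (P569) and nothing else.
QUERY_TEMPLATE = """\
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
SELECT ?item ?label ?description ?date ?sitelinks WHERE {{
  ?item wdt:P569 ?date .
  ?item wikibase:sitelinks ?sitelinks .
  ?item rdfs:label ?label .
  ?item schema:description ?description .
  FILTER (LANG(?label) = "{lang}")
  FILTER (LANG(?description) = "{lang}")
}}
ORDER BY DESC(?sitelinks)
LIMIT {limit}
"""

# Accept only plain language codes such as "en", "ro" or "pt-br",
# so that user input cannot alter the structure of the query.
LANG_CODE = re.compile(r"[a-z]{2,3}(-[a-z0-9]+)*")


def build_card_query(lang: str, limit: int = 10000) -> str:
    """Return the SPARQL text for the given language code and row limit."""
    if not LANG_CODE.fullmatch(lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return QUERY_TEMPLATE.format(lang=lang, limit=limit)
```

Validating the language code matters if the value ever comes from a language picker in the frontend, since it is pasted straight into the query text.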

tom-james-watson · 2022-06-04T14:32:49Z

Some really interesting discussion here!

@nicolaes - yeah, unfortunately the wikitrivia-generator process as it stands is slow. I think SPARQL is definitely the future. Also, the example @tuukka has worked on shows how easy the SPARQL approach would make it to internationalize.

The discussion of how to work out the details of the SPARQL approach should be kept to tom-james-watson/wikitrivia-generator#6.

nicolaes · 2022-06-04T20:04:02Z

@tuukka sorry for the late reply, I messed up my notifications.
I appreciate the time you invested in the SPARQL query. I was able to download the 10k sample you prepared without any QLever issues.

About the low count of Romanian entities: it's because not all pages are translated and I didn't adjust the view count thresholds correctly (e.g. I reduced it by 40x compared to English, while there are 60x fewer Romanian speakers).
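The threshold arithmetic can be sketched as follows, assuming pageviews scale roughly linearly with the number of speakers (a simplification) and using a made-up English threshold of 1200 views purely for illustration. The function names are hypothetical.

```python
def scaled_threshold(base_threshold: float, reduction_factor: float) -> float:
    """Divide the English view count threshold by a reduction factor."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return base_threshold / reduction_factor


def relative_strictness(applied_factor: float, speaker_ratio: float) -> float:
    """How much stricter the applied threshold is than a proportional one."""
    if applied_factor <= 0:
        raise ValueError("applied_factor must be positive")
    return speaker_ratio / applied_factor


# Hypothetical English threshold, not the value used by wikitrivia-generator.
english_threshold = 1200
applied = scaled_threshold(english_threshold, 40)       # threshold actually used
proportional = scaled_threshold(english_threshold, 60)  # threshold matching speaker ratio
print(applied, proportional, relative_strictness(40, 60))
```

Under these assumptions, reducing by 40x instead of 60x leaves the Romanian threshold 1.5 times stricter than a proportional one, which would exclude some pages that a proportional threshold would keep.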

PS: top hit from SPARQL query in Romanian is the wiki of Russia 🤔

tuukka mentioned this issue May 26, 2022

Some queries DoS the backend, frontend hangs ad-freiburg/qlever#673

Closed

tuukka mentioned this issue May 29, 2022

Augment Wikidata graph with ranking information better than the sitelinks count ad-freiburg/qlever#676

Open

tuukka mentioned this issue Jun 4, 2022

Investigate using SPARQL to source cards instead of scraping dumps tom-james-watson/wikitrivia-generator#6

Open
