Augment Wikidata graph with ranking information better than the sitelinks count #676

tuukka · 2022-05-29T13:33:47Z

Would it be easy to add some ranking information (triples) to the Wikidata endpoint? This has been discussed for years elsewhere (T143424, T174981), but I'm not aware of a query endpoint that provides this yet. Here are two open-source rankings that I could find:

QRank (pageviews): https://qrank.wmcloud.org/
Danker (PageRank): https://danker.s3.amazonaws.com/index.html
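
To illustrate what "adding ranking triples" could look like, here is a minimal sketch that turns a ranking file into N-Triples lines. It assumes a CSV with the columns `Entity` and `QRank`; the predicate IRI under `example.org` is hypothetical (no such property exists in Wikidata or QLever), and the sample scores are made up.

```python
import csv
import io

# Hypothetical predicate IRI, used only for illustration.
QRANK_PREDICATE = "<http://example.org/ranking/qrank>"
WD_ENTITY = "http://www.wikidata.org/entity/"
XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"


def ranking_csv_to_ntriples(csv_text: str) -> list[str]:
    """Turn rows of (Entity, QRank) into N-Triples lines.

    Assumes a header row with the columns "Entity" and "QRank".
    Rows whose entity is not a well-formed item ID (e.g. Q42) are skipped.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entity = row["Entity"].strip()
        if not (entity.startswith("Q") and entity[1:].isdigit()):
            continue
        score = int(row["QRank"])
        triples.append(
            f"<{WD_ENTITY}{entity}> {QRANK_PREDICATE} "
            f'"{score}"^^{XSD_INTEGER} .'
        )
    return triples


# Made-up sample data, not real ranking values.
sample = "Entity,QRank\nQ42,12345\nQ5,67890\nbad,1\n"
for line in ranking_csv_to_ntriples(sample):
    print(line)
```

The resulting lines could be appended to the input of an index build, so that a query can order by the new predicate in the same way it orders by sitelinks today.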

hannahbast · 2022-05-29T13:59:18Z

Adding triples for ranking would be rather easy, but I have a question:

We always use ^schema:about/wikibase:sitelinks for ranking. This counts the number of Wikimedia pages of an entity and is a very good proxy for popularity (and a much better proxy than, for example, the number of triples an entity is involved in). For example, here is a list of all people in Wikidata ranked by the number of sitelinks: https://qlever.cs.uni-freiburg.de/wikidata/kfJfrG

Have you tried ^schema:about/wikibase:sitelinks or is there anything that you don't like about it?
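
For readers unfamiliar with the property path, here is a rough sketch of a query builder that ranks instances of a class by sitelinks count. The function name and parameters are illustrative assumptions, and the query is not necessarily the one behind the short link above.

```python
PREFIXES = """\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX schema: <http://schema.org/>
"""


def sitelinks_ranking_query(class_id: str = "Q5", limit: int = 100) -> str:
    """Build a SPARQL query ranking instances of class_id by sitelinks.

    Q5 is the Wikidata item for "human"; P31 is "instance of".
    """
    if not (class_id.startswith("Q") and class_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {class_id!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        PREFIXES
        + "SELECT ?item ?sitelinks WHERE {\n"
        + f"  ?item wdt:P31 wd:{class_id} .\n"
        # Follow schema:about backwards, then read the sitelinks count.
        + "  ?item ^schema:about/wikibase:sitelinks ?sitelinks .\n"
        + "}\n"
        + "ORDER BY DESC(?sitelinks)\n"
        + f"LIMIT {limit}\n"
    )


print(sitelinks_ranking_query())
```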

tuukka · 2022-05-29T14:18:05Z

I am using the sitelinks count but I see it as just one metric:

sitelinks measures how "global" the notability of and interest in a topic is among Wikimedia contributors
pageviews measures how many readers a topic has among the general public
PageRank measures the "centrality" and connectedness of the topic in the Wikimedia graph

My current use case is reimplementing wikitrivia-generator, which is currently heavy and slow:

First it needs a full Wikidata dump (more or less solved with a large Wikidata query in QLever).
Then it makes pageviews API calls one-by-one, which takes days.

See more on the pain here: tom-james-watson/wikitrivia#26 (comment)

hannahbast · 2022-05-29T14:31:05Z

@tuukka Do you have a demo of what the wikitrivia-generator does? Without yet fully understanding what you want, a viable approach might be:

Get the appropriate subset from Wikidata via a CONSTRUCT query
Build a QLever instance for that subset
Ask queries to that instance

Don't be afraid of building and running a QLever instance. Using the qlever script, it's as simple as running the following in a directory with a TTL file (which could be obtained via a CONSTRUCT query):

. qlever      # Configure
qlever index  # Build index
qlever start  # Start the server
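
The first step (obtaining the TTL file via a CONSTRUCT query) could be sketched as follows. This is an illustration under assumptions: the endpoint URL, the `text/turtle` Accept header, the example subset (English labels of humans), and the file name `subset.ttl` are all placeholders rather than confirmed settings.

```python
import urllib.parse
import urllib.request

# Assumed endpoint URL; adjust to the SPARQL endpoint you actually use.
ENDPOINT = "https://qlever.cs.uni-freiburg.de/api/wikidata"

# Example subset only: humans and their English labels.
CONSTRUCT_QUERY = """\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
  ?item wdt:P31 wd:Q5 .
  ?item rdfs:label ?label .
} WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item rdfs:label ?label .
  FILTER (LANG(?label) = "en")
}
LIMIT 1000
"""


def build_construct_request(
    query: str, endpoint: str = ENDPOINT
) -> urllib.request.Request:
    """Prepare a form-encoded SPARQL POST request asking for Turtle."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "text/turtle"},
        method="POST",
    )


def fetch_subset(path: str = "subset.ttl") -> None:
    """Send the request and save the response body (performs network I/O)."""
    request = build_construct_request(CONSTRUCT_QUERY)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `fetch_subset()` would write the subset to disk, after which the three commands above can be run in the same directory.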

tuukka · 2022-05-29T14:59:10Z

Here's the original game: https://wikitrivia.tomjwatson.com/

Here's the game data file as produced by wikitrivia-generator (in English, with items that were once generated and never updated, as it's too much hassle): https://wikitrivia-data.tomjwatson.com/items.json

So far, some people seem to have been able to fork the script and run it in their own language with more or less success: Basque, Romanian.

Ideally, the player could pick any language supported by Wikidata, and the game would issue a suitable SPARQL query to get a fresh set of up-to-date items for that language, with no other backend infrastructure needed.

You are right, it is also possible to implement this query without using the official QLever instance for now, and this issue could be tagged wishlist :-)

hannahbast · 2022-05-29T19:41:00Z

Thanks for the explanation, now I understand. For this kind of application, querying a Wikidata SPARQL endpoint from time to time seems to be the method of choice.

But then isn't a query like https://qlever.cs.uni-freiburg.de/wikidata/m76Lrg doing exactly what you need? It works for any language and takes 20 to 30 seconds.
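
A language-parameterized query of this kind might be built roughly as follows. This is a hypothetical sketch, not the query behind the link above: the function name, the selected variables, and the deliberately conservative language-code check are assumptions.

```python
import re

# Conservative check for lowercase language codes such as "en" or "zh-hans";
# some valid Wikidata language codes may be rejected by this pattern.
LANG_CODE = re.compile(r"[a-z]{2,3}(-[a-z0-9]+)*")


def items_query(lang: str, limit: int = 1000) -> str:
    """Build a query for items with a label in lang, ranked by sitelinks."""
    if not LANG_CODE.fullmatch(lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
        "PREFIX schema: <http://schema.org/>\n"
        "SELECT ?item ?label ?sitelinks WHERE {\n"
        "  ?item ^schema:about/wikibase:sitelinks ?sitelinks .\n"
        "  ?item rdfs:label ?label .\n"
        f'  FILTER (LANG(?label) = "{lang}")\n'
        "}\n"
        "ORDER BY DESC(?sitelinks)\n"
        f"LIMIT {limit}\n"
    )


print(items_query("fi", limit=10))
```

Validating the language code before interpolating it keeps player-supplied input from altering the structure of the query.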
