Reconciliation fails when specifying type work (Q386724) - sometimes giving 502 response #131
This could be because the type hierarchy is too big for us to fetch all subclasses of work (Q386724).
Hmm, there are only 266? See https://w.wiki/4eyu
Those are only the direct subclasses, not the indirect ones. To get the indirect ones you need a transitive property path (wdt:P279* rather than plain wdt:P279).
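For reference, a rough sketch of the kind of query involved (not necessarily the exact one behind the w.wiki link), using the standard wd:/wdt: prefixes of the Wikidata Query Service:

```sparql
# All subclasses of work (Q386724), direct and indirect.
# With plain wdt:P279 instead of the transitive wdt:P279*, only the
# few hundred direct subclasses come back; the transitive version is
# the one that can time out on large hierarchies.
SELECT ?cls WHERE {
  ?cls wdt:P279* wd:Q386724 .
}
```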
Ok, yeah, that seems to correspond to the behaviour I'm getting: fetching all subchildren of intellectual work returns 41917 results, while all subchildren of work times out.
This has been working fine for at least the past 10 months though - could it be that some extra types were just added, so the query times out now? Is there a way this query could be formulated the other way around, as mentioned in the query optimisation page, so it doesn't time out? Something like getting all items that match the title, then traversing forwards from their type to see whether it is a subclass of work?
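A rough sketch of what such a reversed check might look like, using the article title from the example further down this thread (purely illustrative; it is not how the service currently builds its queries):

```sparql
# Hypothetical reversed check: start from items matching the title, then walk
# up their P31/P279* chain and ask whether that leads to work (Q386724).
ASK {
  ?item rdfs:label "Parton distributions of the proton"@en .
  ?item wdt:P31/wdt:P279* wd:Q386724 .
}
```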
If there is, I have not found it yet! I think this is one of the weakest points of this service and I personally do not see a way out of this issue without changing its architecture quite dramatically.
Correct me if I'm wrong, I just had a quick look, but at the moment it looks like you're caching type-subclass lists, then checking those for each item. I guess in general this speeds things up a lot, unless the type has too many subclasses for the query to work? Would it be possible in some cases (maybe if searching for all subclasses of a type results in a timeout) to directly query whether the item is an instance or subclass of a particular type? For example, from the query optimisation page, one of the example searches times out, but adding the hint to reverse the traversal order returns a result in 200 ms.
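Along the lines of the hints documented on that page, a hedged sketch of a per-type check with the traversal direction forced (the gearing hint is Blazegraph-specific, and the QIDs here are just the ones from this thread):

```sparql
# Ask whether scholarly article (Q13442814) falls under work (Q386724),
# hinting the engine to walk the P279* path from the subject side ("forward")
# instead of first enumerating every subclass of work.
ASK {
  wd:Q13442814 wdt:P279* wd:Q386724 .
  hint:Prior hint:gearing "forward" .
}
```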
But I'm new to this so please excuse any naivety!
It looks like subclass type checking of work is a known issue.
Absolutely, it can make sense to query for membership on a per-instance basis, or at least on a per-direct-type basis (because we already fetch the P31 values outside SPARQL). This does mean making one SPARQL request per reconciliation query, which is likely to slow down query resolution quite a lot in general (and could potentially be a problem for the SPARQL endpoint itself?). So I would be cautious about doing that for any type, but it could be a sensible fallback for types where the initial subclass fetching fails. In general, as the type hierarchy grows and gets messier, there is no chance we can do this on the fly, I think. There could be a few options.
Yeah, that makes sense! As an immediate remedy in the direction of point 1, would it be possible to catch this error in particular and provide a more descriptive error like "query timed out checking types - please choose a more restrictive type"? As something of a middle ground between the current approach and type-checking each item, according to this issue, checking whether a particular type is a subclass of another is quite fast. Would it make sense to build the cache this way ("is type A a subclass of type B") instead of "what are all the subclasses of type B"?
Yes, ideally this would be used as a fallback for the cases where there are too many subclasses.
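As an illustration of that pairwise idea, here is a sketch that takes a couple of direct P31 types (scholarly article, Q13442814, and article, Q191067, both from the example below) and keeps only those that fall under work (Q386724); results like these could be cached per (type, target) pair rather than materialising the full subclass list of the target:

```sparql
# Which of these direct types are (transitively) subclasses of work (Q386724)?
SELECT ?type WHERE {
  VALUES ?type { wd:Q13442814 wd:Q191067 }   # illustrative direct P31 types
  ?type wdt:P279* wd:Q386724 .
}
```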
Hi. I'm coming from an issue I found here (diegodlh/zotero-cita#149) using the Cita extension in Zotero to connect the citations of scholarly works with Wikidata.
There is more detail in the above issue, but I'll reproduce the important part here:
If I try to find this item, everything works fine if I specify the actual type (scholarly article):
curl -X POST -F 'queries={"q0":{"query":"Parton distributions of the proton","type":"Q13442814"}}' https://wikidata.reconci.link/en/api
Working up the type hierarchy, it also works if I specify article (Q191067), written work (Q47461344), creative work (Q17537576) or intellectual work (Q15621286), but doesn't work if I specify the type work (Q386724).
In this case I get the following error message.
I'm not sure if this is the right place to be reporting this, or if the issue might lie further up the chain. But I was hoping you might have some insight / be able to help investigate further.
Thanks!