
JSONDecodeError when retrieving german nouns #549

Open
2 tasks done
shybyte opened this issue Jan 10, 2025 · 4 comments
Assignees
Labels
-priority- High priority bug Something isn't working help wanted Extra attention is needed

Comments

@shybyte

shybyte commented Jan 10, 2025

Behavior

Using Scribe-Data v4.1.0, when I try to retrieve German nouns, after some time I get a json.decoder.JSONDecodeError:

scribe-data g --language German --data-type nouns       
Updating data for language(s): German; data type(s): Nouns
Data updated:   0%|                                                          | 0/1 [00:00<?, ?process/s]Querying and formatting German nouns
Data updated:   0%|                                                          | 0/1 [01:02<?, ?process/s]
Traceback (most recent call last):
  File "/home/shybyte/.local/bin/scribe-data", line 8, in <module>
    sys.exit(main())
  File "/home/shybyte/.local/lib/python3.10/site-packages/scribe_data/cli/main.py", line 304, in main
    get_data(
  File "/home/shybyte/.local/lib/python3.10/site-packages/scribe_data/cli/get.py", line 153, in get_data
    query_data(
  File "/home/shybyte/.local/lib/python3.10/site-packages/scribe_data/wikidata/query_data.py", line 236, in query_data
    results = sparql.query().convert()
  File "/home/shybyte/.local/lib/python3.10/site-packages/SPARQLWrapper/Wrapper.py", line 1196, in convert
    return self._convertJSON()
  File "/home/shybyte/.local/lib/python3.10/site-packages/SPARQLWrapper/Wrapper.py", line 1059, in _convertJSON
    json_str = json.loads(self.response.read().decode("utf-8"))
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 655738 column 9 (char 15065703)

It works fine for other data types, for example verbs.

This might be related to the closed issue #124.
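For context, the traceback above points at a malformed JSON body from the query service rather than at Scribe-Data itself: `SPARQLWrapper` reads the raw response and passes it straight to `json.loads`. A minimal sketch (using only the standard library, with a hypothetical truncated payload standing in for a cut-off Wikidata response) reproduces the same error shape:

```python
import json

# Hypothetical truncated JSON payload, as might result from the Wikidata
# Query Service cutting off a large noun response mid-object: the next
# property name and the closing braces are missing.
truncated = '{"results": {"bindings": [{"lexeme": {"value": "L123"}, '

try:
    json.loads(truncated)
except json.JSONDecodeError as err:
    # Same failure shape as in the traceback above.
    print(f"JSONDecodeError: {err.msg}: line {err.lineno} column {err.colno}")
    error = err
```

This is consistent with the error appearing only for nouns: the noun query returns a much larger result set, making a truncated or otherwise malformed response more likely.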

@shybyte shybyte added the bug Something isn't working label Jan 10, 2025
@andrewtavis andrewtavis added the help wanted Extra attention is needed label Jan 10, 2025
@andrewtavis
Member

I can confirm that this is on the current version of main as well. Thanks for opening this, @shybyte!

CC @axif0

@andrewtavis andrewtavis added the -priority- High priority label Jan 10, 2025
@andrewtavis
Member

Just tried editing the query: with plural and gender removed from the response, it does finish. Splitting it into three queries times out, but two does work 🤔 The problem is that the query_data_type_1.sparql, query_data_type_2.sparql, etc. method doesn't appear to work, as the original nouns JSON response is overwritten.

@andrewtavis
Member

49df8d8 is the current state of it. @axif0, could you take a look at why the nouns.json file is being overwritten when we try to split noun queries the same way we do for verbs? Beyond that, we should consider figuring out the exact error that's being thrown and returning an explicit error message that suggests people download a Wikidata lexeme dump and run the get process that way :)

That would be the quick fix, and from there we can look into #156, which is likely where we'd need to go to get the queries to always work.
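A minimal sketch of the explicit-error-message idea, under the assumption that the failure surfaces as a `json.JSONDecodeError` from the conversion step. The helper name `convert_with_hint` is hypothetical, and `convert` stands in for SPARQLWrapper's `sparql.query().convert()`:

```python
import json


def convert_with_hint(convert):
    """Run a SPARQL-to-JSON conversion callable, turning an opaque
    JSONDecodeError into an explicit, actionable error message."""
    try:
        return convert()
    except json.JSONDecodeError as err:
        raise SystemExit(
            "The Wikidata Query Service returned a malformed (likely "
            f"truncated) JSON response ({err.msg}: line {err.lineno}). "
            "Consider downloading a Wikidata lexeme dump and running "
            "the get process against it instead."
        ) from err
```

The call site in query_data would then pass `lambda: sparql.query().convert()` (or similar), so successful queries are returned unchanged while decode failures exit with the dump suggestion.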

@andrewtavis
Member

Feel free to prioritize as you see fit, @axif0 :) No need to jump over to this if you're still working on your blog post or on another issue. We should be able to finalize this in the coming days :)

@andrewtavis andrewtavis moved this from Todo to In Progress in Scribe Board Jan 10, 2025