add huge_tree=True to the XMLParser used for responses. #55
Without `huge_tree=True`, lxml parsing apparently fails on certain moderately large responses (apparently those over about 9.5 MB). Because the parser also sets `recover=True`, this happens silently from Sickle's point of view. I only noticed because the failure also loses the resumption token and therefore ends the crawl, at which point I started to wonder why I had far fewer records than I should have had.

Alternatively, if one wanted to get fancy, one could add the XMLParser as an optional parameter passed to Sickle and from there down to the OAIResponse. This would let people customize the XML parsing behaviour they want. For this PR, however, I opted for the simplest fix.
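
For illustration, here is a minimal sketch of what the description above amounts to, assuming lxml is installed. The names `HUGE_TREE_PARSER`, `parse_response`, and `resumption_token` are hypothetical helpers written for this example and are not Sickle's actual code; the optional `parser` argument only hints at the "pass your own parser" alternative.

```python
from lxml import etree

# Assumption: a recovering parser like the one described above,
# with huge_tree=True added to lift libxml2's size limits.
HUGE_TREE_PARSER = etree.XMLParser(
    remove_blank_text=True,
    recover=True,
    huge_tree=True,
)

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def parse_response(body, parser=HUGE_TREE_PARSER):
    """Parse an OAI-PMH response body (bytes) with a swappable parser.

    Hypothetical helper: callers may pass their own etree.XMLParser.
    """
    return etree.XML(body, parser=parser)


def resumption_token(tree):
    """Return the resumption token text, or None if it is absent or empty."""
    element = tree.find(".//{%s}resumptionToken" % OAI_NS)
    if element is None or not element.text:
        return None
    return element.text.strip()
```

Because `recover=True` hides parse errors, a missing token is the only visible symptom, so a caller could check `resumption_token(...)` against the expected record count before treating a crawl as complete.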