Some HTML files produce an extra entry in the plain_text key of the JSON containing the full text, in addition to the entries with the text of each paragraph, i.e., the same paragraph appears both as its own entry and as part of this extra entry.
This behavior only manifests when using Readability.js. It does not happen with the Python-based parser.
I am attaching one HTML file that shows this behavior:
readabilipy -V
0.2.0
readabilipy -i ef94fca40c96ebf85c2217855fe6382364b75da0d8029be5ee395f607886bd9e.html -o tmp.json
The first entry in the plain_text field of tmp.json has the full text; the other entries hold the per-paragraph text.
readabilipy -i ef94fca40c96ebf85c2217855fe6382364b75da0d8029be5ee395f607886bd9e.html -o tmp2.json -p
tmp2.json does not have the extra entry in the plain_text field
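For anyone who wants to check their own output for the same symptom, here is a minimal sketch that flags plain_text entries whose text contains the text of several other entries. It assumes each entry is either a plain string or a dict with a "text" key; the function names (load_plain_text, find_aggregate_entries) are hypothetical helpers, not part of readabilipy.

```python
import json


def load_plain_text(path):
    """Load the plain_text list from a JSON file written by the CLI."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["plain_text"]


def entry_text(entry):
    """Return the text of one entry (assumed: a str, or a dict with a "text" key)."""
    if isinstance(entry, dict):
        return entry.get("text", "")
    return str(entry)


def find_aggregate_entries(plain_text, min_contained=2):
    """Return indices of entries whose text contains at least
    `min_contained` other, distinct, non-empty entries."""
    texts = [entry_text(entry).strip() for entry in plain_text]
    suspects = []
    for i, candidate in enumerate(texts):
        contained = [
            j
            for j, other in enumerate(texts)
            if j != i and other and other != candidate and other in candidate
        ]
        if len(contained) >= min_contained:
            suspects.append(i)
    return suspects


# Small in-memory sample mimicking the reported symptom.
sample = [
    {"text": "First paragraph. Second paragraph."},
    {"text": "First paragraph."},
    {"text": "Second paragraph."},
]
print(find_aggregate_entries(sample))  # prints [0]
```

Running find_aggregate_entries(load_plain_text("tmp.json")) should then point at the first entry if the output matches what is described above, and return an empty list for tmp2.json.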
I am wondering if this would disappear by using the latest Readability.js instead of the embedded version. Any chance that pull request #95 is going to be incorporated soon? It would be great to avoid reporting issues already fixed in the latest Readability.js.
Thanks!
ef94fca40c96ebf85c2217855fe6382364b75da0d8029be5ee395f607886bd9e.html.gz