Releases · HazyResearch/pdftotree

13 Oct 21:52

github-actions

v0.5.0

20bbd8d

v0.5.0 Latest

0.5.0 - 2020-10-13

Added

Support for Python 3.8. (#86, @HiromuHota)

Changed

Switch the output format from "HTML-like" to hOCR. (#62, @HiromuHota)
Loosen Keras' version restriction, which is now unnecessarily strict. (#68, @HiromuHota)
Greedily extract contents from PDF even if it looks scanned. (#71, @HiromuHota)
Upgrade Keras to 2.4.0 or later (and TensorFlow 2.2 or later). (#86, @HiromuHota)
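
Because the output is now hOCR, downstream code can read it with any HTML parser. The following is a minimal sketch, assuming the standard hOCR conventions (an `ocrx_word` class and a `bbox x0 y0 x1 y1` property in the `title` attribute). The sample markup is hand-written for illustration and is not actual pdftotree output; `parse_bbox`, `HocrWordParser`, and `extract_words` are hypothetical helper names, not part of pdftotree.

```python
from html.parser import HTMLParser

# Hand-written hOCR fragment for illustration only; NOT real pdftotree output.
SAMPLE_HOCR = """
<html><body>
<div class="ocr_page" title="bbox 0 0 612 792">
  <span class="ocr_line" title="bbox 72 90 300 104">
    <span class="ocrx_word" title="bbox 72 90 120 104">Hello</span>
    <span class="ocrx_word" title="bbox 126 90 180 104">world</span>
  </span>
</div>
</body></html>
"""


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    # hOCR properties are separated by ';', e.g. "bbox 1 2 3 4; x_wconf 95".
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            # Assumes integer coordinates, as in the hOCR specification.
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for every element with class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: assumes word spans contain no nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr):
    parser = HocrWordParser()
    parser.feed(hocr)
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE_HOCR):
        print(text, bbox)
```

Only the standard library is used here, so the sketch makes no assumptions about pdftotree's own API. Real output may carry additional hOCR classes or properties that this parser ignores.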

Removed

Remove "favor_figures" option and extract everything. (#77, @HiromuHota)
Remove "dry_run" option. (#89, @HiromuHota)

Fixed

Fix a bug where the HTML file was not created at the given path. (#64, @HiromuHota)
Extract LTChar objects even if they are not children of an LTTextLine. (#79, @HiromuHota)

21 Sep 18:37

lukehsiao

v0.4.1

c9ac213

v0.4.1

This release marks the end of development for the v0.4.x version of pdftotree. Going forward, we plan to change pdftotree to conform to hOCR with v0.5. For this process, we welcome @HiromuHota as a new maintainer.

If you would like to give feedback on this refactor, we invite you to comment in #62.
