Releases: HazyResearch/pdftotree
v0.5.0
0.5.0 - 2020-10-13
Added
- Support for Python 3.8. (#86, @HiromuHota)
Changed
- Switch the output format from "HTML-like" to hOCR. (#62, @HiromuHota)
- Loosen Keras' version restriction, which is now unnecessarily strict. (#68, @HiromuHota)
- Greedily extract contents from PDF even if it looks scanned. (#71, @HiromuHota)
- Upgrade Keras to 2.4.0 or later (and TensorFlow 2.2 or later). (#86, @HiromuHota)
Removed
- Remove "favor_figures" option and extract everything. (#77, @HiromuHota)
- Remove "dry_run" option. (#89, @HiromuHota)
Fixed
- Fix a bug where an HTML file was not created at the given path. (#64, @HiromuHota)
- Extract LTChar even if they are not children of LTTextLine. (#79, @HiromuHota)
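For readers adapting downstream code to the hOCR output introduced in this release, the following is a minimal sketch of reading word bounding boxes from an hOCR document using only the Python standard library. It does not call pdftotree itself. The `SAMPLE` markup is handwritten for illustration and may differ from what pdftotree actually emits, and the names `HocrWordParser`, `parse_bbox`, and `extract_words` are hypothetical helpers, not part of any library. The sketch assumes words are `span` elements with class `ocrx_word`, carry a `bbox x0 y0 x1 y1` property in their `title` attribute, and contain no nested `span` elements.

```python
from html.parser import HTMLParser

# Handwritten hOCR fragment for illustration only (an assumption, not real pdftotree output).
SAMPLE = """
<div class="ocr_page" title="bbox 0 0 612 792">
  <span class="ocr_line" title="bbox 10 20 120 40">
    <span class="ocrx_word" title="bbox 10 20 50 40">Hello</span>
    <span class="ocrx_word" title="bbox 60 20 120 40; x_wconf 95">world</span>
  </span>
</div>
"""


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None if absent."""
    # hOCR properties in a title attribute are separated by semicolons.
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            # Parsed as floats to tolerate non-integer coordinates.
            return tuple(float(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for each span with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._in_word = True
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans contain no nested spans.
        if tag == "span" and self._in_word:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


def extract_words(hocr):
    """Parse an hOCR string and return a list of (text, bbox) tuples."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

Calling `extract_words(SAMPLE)` yields one `(text, bbox)` tuple per word. A dedicated HTML or hOCR library may be a better fit for real documents with nested markup.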
v0.4.1
This release marks the end of development for the v0.4.x version of pdftotree. Going forward, we plan to change pdftotree to conform to hOCR with v0.5. For this process, we welcome @HiromuHota as a new maintainer.
If you would like to give feedback on this refactor, we invite you to comment in #62.