Releases: scribeocr/scribe.js
v0.7.0
What's Changed
- Major rework of PDF export implementation.
- Writing to PDF is faster and uses less memory.
- Documents that used to crash due to memory errors now run almost instantly.
- For many inputs, output PDF file sizes are now much smaller.
- Fixed memory leaks within OCR module.
- Misc bug fixes.
Full Changelog: v0.6.1...v0.7.0
v0.6.1
v0.5.1
v0.5.0
What's Changed
- Added `config` argument to `recognize`, which allows for passing arguments to Tesseract.js (#22)
- Added support for parsing PDF text at various orientations (90/180/270 degrees).
- Minor improvements to OCR quality.
- Various improvements to imports of HOCR and native PDF text.
- Added `saveAs` utility function for saving files.
- Added `opt.kerning` option that can be used to enable or disable kerning.
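The `config` argument and the `opt.kerning` option might be used roughly as follows. This is a minimal sketch, not the library's documented usage: the helper names `buildRecognizeOptions` and `setKerning` are hypothetical, the shape of `config` (a plain object keyed by Tesseract parameter names) and the boolean type of `opt.kerning` are assumptions, and `scribe` is passed in as a parameter so the sketch does not depend on how the package is imported.

```javascript
// Hypothetical helper: wraps Tesseract parameters in the `config` argument
// that `recognize` accepts as of v0.5.0. The object shape is an assumption.
function buildRecognizeOptions(tesseractParams) {
  // Copy so later changes to the caller's object do not leak into the options.
  return { config: { ...tesseractParams } };
}

// Hypothetical helper: toggles kerning through `opt.kerning`.
// Assumes the option is a boolean stored on `scribe.opt`.
function setKerning(scribe, enabled) {
  scribe.opt.kerning = Boolean(enabled);
  return scribe.opt.kerning;
}

// Usage sketch (assumes `scribe` is the imported scribe.js module):
//   setKerning(scribe, false);
//   await scribe.recognize(
//     buildRecognizeOptions({ tessedit_char_whitelist: '0123456789' }),
//   );
```

Passing the module in as a parameter keeps the helpers easy to exercise with a stand-in object; in real code the assignments could just as well be written inline.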
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
- Implemented parallel processing by default for Node.js version
- To restore the previous behavior (1 worker), set `scribe.opt.workerN = 1` before calling any functions.
- Non-default behavior for extracting text from PDF files is now handled by setting the properties of `scribe.opt.usePDFText`.
- Added Nimbus Mono font (similar to Courier)
- Improvements to text extraction from PDF files.
- Improvements to text positioning.
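Restoring the single-worker behavior in Node.js could look like the sketch below. The helper name `useSingleWorker` is hypothetical, and `scribe` is passed in as a parameter rather than imported, so nothing here assumes a particular package name or import style.

```javascript
// Hypothetical helper: restores the pre-0.4.1 behavior of using 1 worker.
// Per the release note, this should run before any scribe function is called.
function useSingleWorker(scribe) {
  scribe.opt.workerN = 1;
  return scribe;
}

// Usage sketch (assumes `scribe` is the imported scribe.js module):
//   useSingleWorker(scribe);
//   ...then call scribe functions as usual.
```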
Full Changelog: v0.3.1...v0.4.1
Note: This post combines changes for 0.4.0 and 0.4.1 since the former was only the most recent version for a few hours.
v0.3.1
v0.3.0
What's Changed
- Improvements to parsing existing text from PDF files
- Various improvements to OCR text and bounding box quality
- Fixed memory leak
- Various minor changes
Full Changelog: v0.2.8...v0.3.0
v0.2.8
- Improved performance of "Quality" recognition mode.
- Many documents should run up to 10-15% faster in quality mode.
- Updated Scribe Tesseract build to improve recognition accuracy.
- Accuracy for data tables and other complex layouts has been noticeably improved.
- See benchmark repo for examples and accuracy metrics.
- Improved image pre-processing.
- Updated Vanilla Tesseract build to support debugging features and image upscaling.
- Other minor changes
Full Changelog: v0.2.7...v0.2.8
v0.2.7
- Fixed bug preventing existing text in some PDFs from being detected (025456a)
- Increased resolution at which PDFs are rendered (0dd8801)
- Added `calcSuppFontInfo` option that calculates font metrics for the fonts in text-native PDFs (4b2b43e)
- This is useful for niche applications that require highly-accurate visual coordinates from text-native PDFs.
- Various other minor updates
Full Changelog: v0.2.6...v0.2.7
v0.2.6
- Restored compatibility with Webpack
Full Changelog: v0.2.5...v0.2.6