Releases: scribeocr/scribe.js

v0.7.0

07 Jan 08:38

What's Changed

  • Major rework of PDF export implementation.
    • Writing to PDF is faster and uses less memory.
      • Documents that used to crash due to memory errors now run almost instantly.
    • For many inputs, output PDF file sizes are now much smaller.
  • Fixed memory leaks within OCR module.
  • Misc bug fixes.

Full Changelog: v0.6.1...v0.7.0

v0.6.1

17 Dec 05:25

What's Changed

  • Fixed Node.js support on Windows (#9)
  • Fixed platform-related installation issues (#27, #29)
  • Increased use of workers in Node.js version, enabling much better performance using a single process.

Full Changelog: v0.5.1...v0.6.1

v0.5.1

10 Dec 09:30

What's Changed

  • Fixed bug causing crashes when recognizing certain PDFs using Node.js (#26)
  • Minor updates

Full Changelog: v0.5.0...v0.5.1

v0.5.0

25 Nov 09:08

What's Changed

  • Added config argument to recognize, which allows for passing arguments to Tesseract.js (#22)
  • Added support for parsing PDF text at various orientations (90/180/270 degrees).
  • Minor improvements to OCR quality.
  • Various improvements to imports of HOCR and native PDF text.
  • Added saveAs utility function for saving files.
  • Added opt.kerning option that can be used to enable or disable kerning.

Full Changelog: v0.4.1...v0.5.0

v0.4.1

10 Nov 19:24

What's Changed

  • Implemented parallel processing by default for Node.js version
    • To restore the previous behavior (1 worker), set scribe.opt.workerN = 1 before calling any functions.
  • Non-default behavior for extracting text from PDF files is now handled by setting the properties of scribe.opt.usePDFText.
  • Added Nimbus Mono font (similar to Courier)
  • Improvements to text extraction from PDF files.
  • Improvements to text positioning.
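
As a minimal sketch of the single-worker setting described above: the `scribe` object below is a hypothetical stand-in so the snippet runs on its own. In a real project it would be the imported scribe.js module, and its `opt` object would already hold the library's defaults. Only `scribe.opt.workerN` is taken from these notes; everything else is an assumption for illustration.

```javascript
// Hypothetical stand-in for the imported scribe.js module (assumption),
// used here only so the sketch is self-contained.
const scribe = { opt: {} };

// Restore the previous behavior (1 worker), as described in the notes above.
// Set this before calling any other scribe function.
scribe.opt.workerN = 1;

console.log(`workers: ${scribe.opt.workerN}`);
```

Setting the option first matters because the notes say it must be done before calling any functions.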

Full Changelog: v0.3.1...v0.4.1

Note: This post combines the changes for 0.4.0 and 0.4.1, because 0.4.0 was the most recent version for only a few hours.

v0.3.1

31 Oct 08:38

What's Changed

  • Fixed memory leaks

Full Changelog: v0.3.0...v0.3.1

v0.3.0

31 Oct 03:59

What's Changed

  • Improvements to parsing existing text from PDF files
  • Various improvements to OCR text and bounding box quality
  • Fixed memory leak
  • Various minor changes

Full Changelog: v0.2.8...v0.3.0

v0.2.8

30 Sep 07:30
  • Improved performance of "Quality" recognition mode.
    • Many documents should run up to 10-15% faster in "Quality" mode.
  • Updated Scribe Tesseract build to improve recognition accuracy.
    • Accuracy for data tables and other complex layouts has been noticeably improved.
  • Improved image pre-processing.
  • Updated Vanilla Tesseract build to support debugging features and image upscaling.
  • Other minor changes

Full Changelog: v0.2.7...v0.2.8

v0.2.7

25 Sep 05:21
  • Fixed bug preventing existing text in some PDFs from being detected (025456a)
  • Increased resolution at which PDFs are rendered (0dd8801)
  • Added calcSuppFontInfo option that calculates font metrics for the fonts in text-native PDFs (4b2b43e)
    • This is useful for niche applications that require highly-accurate visual coordinates from text-native PDFs.
  • Various other minor updates

Full Changelog: v0.2.6...v0.2.7

v0.2.6

06 Sep 08:00