Releases: scribeocr/scribe.js

v0.7.0

07 Jan 08:38

Balearica

Latest

What's Changed

Major rework of PDF export implementation.
- Writing to PDF is faster and uses less memory.
  - Documents that used to crash due to memory errors now run almost instantly.
- For many inputs, output PDF file sizes are now much smaller.
Fixed memory leaks within OCR module.
Miscellaneous bug fixes.

Full Changelog: v0.6.1...v0.7.0

v0.6.1

17 Dec 05:25

Balearica

What's Changed

Fixed Node.js support on Windows (#9)
Fixed platform-related installation issues (#27, #29)
Increased use of workers in Node.js version, enabling much better performance using a single process.

Full Changelog: v0.5.1...v0.6.1

v0.5.1

10 Dec 09:30

Balearica

What's Changed

Fixed bug causing crashes when recognizing certain PDFs using Node.js (#26)
Minor updates

Full Changelog: v0.5.0...v0.5.1

v0.5.0

25 Nov 09:08

Balearica

What's Changed

Added config argument to recognize, which allows for passing arguments to Tesseract.js (#22)
Added support for parsing PDF text at various orientations (90/180/270 degrees).
Minor improvements to OCR quality.
Various improvements to imports of HOCR and native PDF text.
Added saveAs utility function for saving files.
Added opt.kerning option that can be used to enable or disable kerning.
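
The two configuration changes in this release (the config argument to recognize and the opt.kerning option) could be used roughly as sketched below. This is an illustration under assumptions, not the library's documented API: the notes do not say what shape recognize expects, so the `{ config: ... }` options object is a guess, as is treating opt.kerning as a boolean. The helper names `buildRecognizeOptions` and `setKerning` are hypothetical, and the `scribe` object is passed in as a parameter so that no import path is assumed. `tessedit_char_whitelist` is a standard Tesseract parameter used here only as an example.

```javascript
// Sketch only. ASSUMPTIONS (not confirmed by the release notes):
//   - `recognize` accepts an options object with a `config` property
//   - `opt.kerning` takes a boolean

// Hypothetical helper: wrap Tesseract parameters in a recognize options object.
function buildRecognizeOptions(tesseractConfig) {
  return { config: { ...tesseractConfig } };
}

// Hypothetical helper: enable or disable kerning on a scribe-like object.
function setKerning(scribe, enabled) {
  scribe.opt.kerning = Boolean(enabled);
  return scribe;
}

// Restrict recognition to digits via a standard Tesseract parameter.
const recognizeOptions = buildRecognizeOptions({
  tessedit_char_whitelist: '0123456789',
});

// Stand-in object for demonstration; pass the real scribe object instead.
const stubScribe = { opt: {} };
setKerning(stubScribe, false);

console.log(recognizeOptions.config.tessedit_char_whitelist); // 0123456789
console.log(stubScribe.opt.kerning); // false
```

With the real library, `recognizeOptions` would presumably be handed to recognize; check the project documentation for the exact signature before relying on this shape.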

Full Changelog: v0.4.1...v0.5.0

v0.4.1

10 Nov 19:24

Balearica

What's Changed

Implemented parallel processing by default for Node.js version
- To restore the previous behavior (1 worker), set scribe.opt.workerN = 1 before calling any functions.
Non-default behavior for extracting text from PDF files is now handled by setting the properties of scribe.opt.usePDFText.
Added Nimbus Mono font (similar to Courier)
Improvements to text extraction from PDF files.
Improvements to text positioning.
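
The single-worker setting mentioned above can be sketched as follows. The option name scribe.opt.workerN comes from the note itself; the helper `useSingleWorker` is hypothetical, and the `scribe` object is passed in as a parameter so the snippet does not assume a particular import path. The stand-in object exists only so the example runs on its own.

```javascript
// Sketch only: `scribe` is whatever object your import of scribe.js provides.

// Hypothetical helper: force single-worker mode (the previous Node.js behavior).
// Call it before calling any other scribe function.
function useSingleWorker(scribe) {
  scribe.opt.workerN = 1; // option name taken from the release note
  return scribe;
}

// Stand-in object for demonstration; the initial value here is arbitrary.
const demoScribe = { opt: { workerN: null } };
useSingleWorker(demoScribe);

console.log(demoScribe.opt.workerN); // 1
```

The ordering matters because the note says the option must be set before calling any functions.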

Full Changelog: v0.3.1...v0.4.1

Note: This post combines changes for 0.4.0 and 0.4.1 since the former was only the most recent version for a few hours.

v0.3.1

31 Oct 08:38

Balearica

What's Changed

Fixed memory leaks

Full Changelog: v0.3.0...v0.3.1

v0.3.0

31 Oct 03:59

Balearica

What's Changed

Improvements to parsing existing text from PDF files
Various improvements to OCR text and bounding box quality
Fixed memory leak
Various minor changes

Full Changelog: v0.2.8...v0.3.0

v0.2.8

30 Sep 07:30

Balearica

Improved performance of "Quality" recognition mode.
- Many documents should run up to 10-15% faster in quality mode.
Updated Scribe Tesseract build to improve recognition accuracy.
- Accuracy for data tables and other complex layouts has been noticeably improved.
  - See benchmark repo for examples and accuracy metrics.
Improved image pre-processing.
Updated Vanilla Tesseract build to support debugging features and image upscaling.
Other minor changes

Full Changelog: v0.2.7...v0.2.8

v0.2.7

25 Sep 05:21

Balearica

Fixed bug preventing existing text in some PDFs from being detected (025456a)
Increased resolution at which PDFs are rendered (0dd8801)
Added calcSuppFontInfo option that calculates font metrics for the fonts in text-native PDFs (4b2b43e)
- This is useful for niche applications that require highly-accurate visual coordinates from text-native PDFs.
Various other minor updates

Full Changelog: v0.2.6...v0.2.7

v0.2.6

06 Sep 08:00

Balearica

Restored compatibility with Webpack

Full Changelog: v0.2.5...v0.2.6
