[Feature request] Add option to disable image pre-processing entirely when using the OCR feature #9202

Open
timminator opened this issue Jan 11, 2025 · 2 comments

Comments

@timminator

When I ran the Tesseract engine directly on the original image, without any image pre-processing, I often got better results than with the pre-processed image in SubtitleEdit.
So I would appreciate a checkbox in the image pre-processing tab that makes OCR use the original image with no pre-processing applied.

@niksedk
Member

niksedk commented Jan 11, 2025

Please give examples and much more detail! GitHub supports attaching .zip files.
Also, what OCR engine are you using?

@timminator
Author

I'm currently using Tesseract 5.5, but I noticed the same with version 5.3.
Here is a zip file with an example:
OCR test.zip
It contains two pictures and three screenshots of the results: one from running Tesseract on the command line and two from running Tesseract through Subtitle Edit. On the full picture, Tesseract falls apart completely with the pre-processed image, whereas it OCRs the original image from the command line pretty well.
I am aware that I could maybe get a better result with a different binary image threshold or by inverting the colors, but for these two pictures I cannot find values that get even remotely close to the accuracy of just using the original image. And with the original image you don't even need to fine-tune the binary image threshold.
Therefore I would really appreciate this added functionality.
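
To illustrate why a fixed binary threshold can hurt on images like these, here is a minimal sketch in plain Python. It is not Subtitle Edit's actual pre-processing code; the pixel values and the helper names (`binarize`, `stroke_width`, `distinct_levels`) are made up for illustration. It shows that thresholding throws away the soft gray edge of a glyph, and that the resulting stroke width depends entirely on the threshold chosen.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def binarize(image: Image, threshold: int) -> Image:
    """Map every pixel to 0 or 255 using one fixed threshold."""
    return [[255 if px >= threshold else 0 for px in row] for row in image]


def stroke_width(image: Image) -> int:
    """Count the white pixels in the first row (width of the bright stroke)."""
    return sum(1 for px in image[0] if px == 255)


def distinct_levels(image: Image) -> int:
    """Number of different gray values present in the image."""
    return len({px for row in image for px in row})


# Hypothetical patch: a bright vertical stroke (230) with a soft
# anti-aliased edge (140) on a dark background (40).
patch: Image = [
    [40, 140, 230, 140, 40],
    [40, 140, 230, 140, 40],
    [40, 140, 230, 140, 40],
]

low = binarize(patch, 100)   # edge pixels are turned into text
high = binarize(patch, 200)  # edge pixels are turned into background

print("gray levels in original:", distinct_levels(patch))
print("gray levels after thresholding:", distinct_levels(low))
print("stroke width at threshold 100:", stroke_width(low))
print("stroke width at threshold 200:", stroke_width(high))
```

With the original image, the OCR engine still sees all three gray levels and can decide for itself how to treat the edge; after fixed thresholding, that information is gone and the stroke is either thickened or thinned depending on the value picked.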
