ocrjs

Optical Character Recognition in JavaScript, based on Tesseract.js

This project evaluates how to pre-process images before running them through Tesseract.js.
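As an illustration of what a pre-processing step can look like, here is a minimal sketch of fixed-threshold binarization on raw RGBA pixel data (the layout used by a canvas `ImageData.data` array). The function name `binarize` and the default threshold of 128 are assumptions made for this example; the steps this project actually evaluates may differ.

```javascript
// Hypothetical helper, not taken from this repo: turns RGBA pixels into
// pure black or white using a fixed brightness threshold.
// `rgba` is laid out as [r, g, b, a, r, g, b, a, ...].
function binarize(rgba, threshold = 128) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    // Rec. 601 luma weights approximate perceived brightness.
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    const value = luma >= threshold ? 255 : 0;
    out[i] = value;
    out[i + 1] = value;
    out[i + 2] = value;
    out[i + 3] = rgba[i + 3]; // keep the alpha channel unchanged
  }
  return out;
}
```

In a browser, the result could be written back to a canvas before the image is handed to the OCR step.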

Data preparation

To keep this repo slim, larger data sets are not included and must be downloaded separately.

Labeled images for OCR are available here (downloading requires an account):

https://rrc.cvc.uab.es/?ch=15&com=downloads

Download the data from there and place it in a folder structure like this:

serve  # this folder will be made available on localhost:8080
├───data
│   ├───ground_truth
│   │   ├───tr_img_00001.txt
│   │   ├───tr_img_00002.txt
│   │   └─── ...
│   └───images
│       ├───tr_img_00001.jpg
│       ├───tr_img_00002.jpg
│       └─── ...
└───dist  # this is the application, which will be created by "npm run build"
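After copying the files, it can help to confirm that every image has a matching ground-truth file. The following Node sketch assumes the layout shown above (`images/*.jpg` and `ground_truth/*.txt` with identical base names); `findUnpairedSamples` is a hypothetical helper and not part of this repo.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: lists base names that exist in only one of the
// two folders below `dataDir` (for example "serve/data").
function findUnpairedSamples(dataDir) {
  // Collect file names without extension for one sub-folder and extension.
  const baseNames = (subDir, ext) =>
    new Set(
      fs
        .readdirSync(path.join(dataDir, subDir))
        .filter((name) => path.extname(name).toLowerCase() === ext)
        .map((name) => path.basename(name, path.extname(name)))
    );

  const images = baseNames('images', '.jpg');
  const truths = baseNames('ground_truth', '.txt');

  return {
    missingGroundTruth: [...images].filter((n) => !truths.has(n)).sort(),
    missingImage: [...truths].filter((n) => !images.has(n)).sort(),
  };
}
```

Calling `findUnpairedSamples(path.join('serve', 'data'))` from the repo root should return two empty arrays when the data set is complete.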

This project uses webpack (for an overview of how to use it, see https://webpack.js.org/guides/getting-started/).

Install Node (>=14.x.x).
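To check the installed version against this requirement, a small sketch like the following can be run with `node`. The helper name `meetsMinimumNode` is made up for this example; the minimum major version of 14 comes from the requirement above.

```javascript
// Hypothetical helper: returns true if a version string such as
// "14.17.0" or "v18.12.1" has a major version of at least `minimumMajor`.
function meetsMinimumNode(version, minimumMajor = 14) {
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minimumMajor;
}

// process.versions.node holds the running Node version without a leading "v".
console.log(
  meetsMinimumNode(process.versions.node)
    ? 'Node version OK'
    : 'Node 14 or newer is required'
);
```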

To install all dependencies, run:

npm install

Then, to build the packaged app with webpack, run:

npm run build

This creates a dist folder with all files needed for the app. To try the app locally, run:

npm start

This starts a local server on http://127.0.0.1:8080.

The actual app is served at http://localhost:8080/dist/.
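Because the whole `serve` folder is exposed by the local server, the downloaded data should be reachable next to the app. The sketch below builds the URLs for one sample, assuming that `serve/data` maps to `/data` on the server and that files follow the `tr_img_00001` naming pattern; `sampleUrls` and `BASE_URL` are illustrative names, not identifiers from this repo.

```javascript
// Assumed base URL of the local server started by "npm start".
const BASE_URL = 'http://localhost:8080';

// Hypothetical helper: builds image and ground-truth URLs for one sample.
// The index is zero-padded to five digits, e.g. 1 -> "tr_img_00001".
function sampleUrls(index, baseUrl = BASE_URL) {
  const name = `tr_img_${String(index).padStart(5, '0')}`;
  return {
    image: `${baseUrl}/data/images/${name}.jpg`,
    groundTruth: `${baseUrl}/data/ground_truth/${name}.txt`,
  };
}
```

Opening the `image` URL for a sample in the browser is a quick way to confirm that the data folder is in the right place.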
