Lab time with Tesseract and Node.js.
Install Tesseract-OCR
brew install tesseract --all-languages
npm install
You can delete everything under /data_extracted/ and /documents/. They are only there as an example.
Add your new documents/receipts to /documents/ and run:
node extract.js
You should then see your extracted data under /data_extracted/.
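For orientation, here is a minimal sketch of how a script like this could drive Tesseract from Node. It is not the actual contents of extract.js: the helper names `outputBaseFor` and `extractAll` are hypothetical, and it assumes the `tesseract` binary is on your PATH and that every non-hidden file in the input folder is an image Tesseract can read.

```javascript
const fs = require("fs");
const path = require("path");
const { execFileSync } = require("child_process");

// Hypothetical helper: "receipt.png" -> "<outDir>/receipt".
// Tesseract appends ".txt" to the output base on its own.
function outputBaseFor(file, outDir) {
  return path.join(outDir, path.parse(file).name);
}

// Hypothetical helper: run OCR on every non-hidden file in inDir.
function extractAll(inDir, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  const files = fs.readdirSync(inDir).filter((f) => !f.startsWith("."));
  for (const file of files) {
    // CLI form: tesseract <image> <outputbase>
    execFileSync("tesseract", [
      path.join(inDir, file),
      outputBaseFor(file, outDir),
    ]);
  }
  return files.length;
}

// Usage (assumes the folder names from this README):
// extractAll("documents", "data_extracted");
```

Calling the CLI through `execFileSync` keeps the sketch free of extra npm dependencies. A real script would likely add error handling for files Tesseract cannot parse.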
Missing: tests. You can use this code as you like. No guarantees.