Update README.md for OCR
minamotorin authored Sep 28, 2021
1 parent 94bd40a commit 66d7b87
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions README.md
@@ -34,8 +34,9 @@ The configuration can be done in the settings.json and the description is as fol
"proxy_links":0, // 0 for disabling proxy when fetching page links upon reaching the limit.
"proxy_images":0, // 0 for disabling proxy when fetching page images upon reaching the limit.
"max_retry_links":1, // Max retries for fetching a link using proxies.
-"max_retry_images":1 // Max retries for a fetching a image using proxies.
-"global_retry_time": // 0 for not running GoBooDo indefinitely, the number of seconds of delay between each global retry otherwise.
+"max_retry_images":1, // Max retries for a fetching a image using proxies.
+"global_retry_time":30, // 0 for not running GoBooDo indefinitely, the number of seconds of delay between each global retry otherwise.
+"lang": "" // "" for create PDF without OCR, languages which OCR reads in. E.g. "eng+ita".
}
~~~

@@ -63,8 +64,11 @@ fpdf
html5lib
tqdm
pytesseract
+pypdf2
~~~

+If you want to use OCR with languages other than English, you should download aditional languages data from [tesseract-ocr](https://github.com/tesseract-ocr).

# Features
1. Stateful : GoBooDo keeps a track of the books which are downloaded. In each subsequent iterations of operation only those those links and images are fetched which were not downloaded earlier.
2. Proxy support : Since Google limits the amount of pages accessible to each individual majorly on the basis of IP address, GoBooDo uses proxies for circumventing that limit and maximizing the number of pages that can be accessed in the preview.
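
As a rough illustration of the two settings this commit touches, the sketch below shows one way a script might interpret `lang` and `global_retry_time` after loading `settings.json`. The key names come from the README text in the diff; the helper names `ocr_languages` and `retry_delay`, and the sample values, are hypothetical and are not taken from GoBooDo's source.

```python
import json

# Hypothetical settings fragment: key names follow the README diff,
# values are illustrative only. Real JSON cannot carry the // comments
# shown in the README, so they are omitted here.
SETTINGS_JSON = '{"max_retry_images": 1, "global_retry_time": 30, "lang": "eng+ita"}'


def ocr_languages(settings):
    """Return the language codes listed in "lang", or [] when OCR is disabled ("")."""
    lang = settings.get("lang", "")
    # Languages are joined with "+", e.g. "eng+ita"; drop empty pieces.
    return [code for code in lang.split("+") if code]


def retry_delay(settings):
    """Return the delay in seconds between global retries, or None for a single run (0)."""
    seconds = settings.get("global_retry_time", 0)
    return seconds if seconds > 0 else None


settings = json.loads(SETTINGS_JSON)
print(ocr_languages(settings))
print(retry_delay(settings))
```

Treating an empty `lang` as "no OCR" and a zero `global_retry_time` as "run once" mirrors the comments in the README; how GoBooDo itself handles missing keys is not shown in this diff, so the defaults above are assumptions.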
