Crash on non-ASCII input. #1032
Comments
What version of pdfminer.six are you using? I can't reproduce this with either Python 3.11 or 3.12 and pdfminer.six v20240706.
Looks old.
Unsure why it would be old, I used |
Closing since @dhdaines can't reproduce. You can probably fix this by removing all versions of pdfminer and pdfminer.six and then installing the latest version from pip.
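Before reinstalling, it may help to see which distributions are actually present in the active environment. The following is only a sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration and is not part of pdfminer.six.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Check both distribution names mentioned in this thread.
for name in ("pdfminer", "pdfminer.six"):
    print(name, installed_version(name))
```

If both names report a version, the two packages are installed side by side in that environment, which is the situation the suggestion above is meant to clean up.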
I can confirm the error is gone with a new download in a new python For reference, here is the output of
However, I now get no output at all rather than the expected text. Could you tell me whether you can get any text output from the file specified? I also tried various command line options like
Description
Crash on non-ASCII input:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)
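For context, this is the message Python raises whenever a byte string starting with 0x85 is decoded with the ASCII codec. The sketch below reproduces only the exception text; it does not exercise pdfminer.six, and where in the library the decode happens is not shown in this report.

```python
# Illustration only: reproduces the exception text, not the pdfminer.six code path.
raw = b"\x85"  # a single non-ASCII byte, matching the byte named in the error

try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    # str(exc) is the part of the traceback after "UnicodeDecodeError: "
    message = str(exc)

print(message)
```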
Steps to reproduce the bug
To make it easier, this will download mc3362.pdf.
wget https://github.com/user-attachments/files/16489263/mc3362.pdf && pdf2txt.py mc3362.pdf
Error produced