Type Error during extracting pages in some pdfs #720
Comments
Can replicate:
Probably the solution is to call |
I mean, to fix this issue we have to make a change to pdfminer.six, using |
Okay, I understand now :) Do you know the approximate release date for this fix? |
Nobody is working on it as far as I know. Do you have time to work on this? |
Unfortunately I don't :/ I have to work on other projects, but if something changes I will post an update and may be able to look at this bug. |
resolve1 when getting the default width.
* Issue #720 resolve1 when getting the default width.
* Add CHANGELOG.md
Co-authored-by: Pieter Marsman
The PDF below triggers this bug |
self.attrs['MediaBox'] contains entries of type PDFObjRef instead of int. I called resolve1 on every entry in self.attrs['MediaBox'] to eliminate the problem.
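A minimal sketch of the workaround described above, written so it runs without pdfminer.six installed. `FakeObjRef`, `resolve1_sketch`, and `resolve_box` are hypothetical stand-ins introduced only for illustration. In real code you would presumably import `resolve1` from `pdfminer.pdftypes` and apply it to the actual `MediaBox` entries. The actual patch in pdfminer.six may differ from this.

```python
from typing import Any, List


class FakeObjRef:
    """Hypothetical stand-in for pdfminer's PDFObjRef (an indirect object)."""

    def __init__(self, value: Any) -> None:
        self._value = value

    def resolve(self) -> Any:
        # Return the object this reference points to.
        return self._value


def resolve1_sketch(x: Any) -> Any:
    """Follow indirect references until a direct value is reached."""
    while isinstance(x, FakeObjRef):
        x = x.resolve()
    return x


def resolve_box(box: Any) -> List[Any]:
    """Resolve the array itself, then every coordinate inside it."""
    return [resolve1_sketch(v) for v in resolve1_sketch(box)]


# A MediaBox whose last two entries are indirect references (assumed data).
media_box = [0, 0, FakeObjRef(612), FakeObjRef(FakeObjRef(792))]

resolved = resolve_box(media_box)
# Arithmetic on the raw entries would raise TypeError; it is safe after resolving.
width = resolved[2] - resolved[0]
height = resolved[3] - resolved[1]
```

Resolving both the array and its elements covers the case where the whole box is an indirect object as well as the case reported here, where only individual numbers are.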
@pietermarsman @psrubing This was an issue for me in a past project, and they ended up using an OCR solution. I was going to offer to debug this, but I noticed that @gosiafilipek created a PR with a fix? If that is done, I have time in the next 2 months to contribute. I didn't see a 'good first issue' label, or whatever it's called, so I looked back and found these I could start with. Does anyone have requests or recommendations on where I should start? |
Hi @datatalking, Thanks for reaching out! And for wanting to help! You can get in touch on gitter.im. In the private or group chat. We can have a sync about what to work on. I'll try and see if I can create a good-first-issue label. |
Fixed by #772 |
Hello,
I've encountered a bug while extracting pages with the extract_pages() function from the pdfminer.high_level module. It only happens with some PDFs.
The image below shows the error:
The PDF below triggers this bug:
pdf_bug.pdf
Environment:
Python - 3.7.11
pdfminer.six - 20201018