I have the same issue for some PDFs (which unfortunately contain sensitive information).
The version of pdfminer.six I'm using is 20221105.
My traceback is below; it is the same one reported in issue #617:
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/high_level.py", line 200, in extract_pages
interpreter.process_page(page)
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfinterp.py", line 991, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfinterp.py", line 1010, in render_contents
self.execute(list_value(streams))
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfinterp.py", line 1036, in execute
func(*args)
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfinterp.py", line 896, in do_TJ
self.device.render_string(
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfdevice.py", line 133, in render_string
textstate.linematrix = self.render_string_horizontal(
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfdevice.py", line 170, in render_string_horizontal
for cid in font.decode(obj):
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdffont.py", line 1174, in decode
return self.cmap.decode(bytes)
File "~/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/cmapdb.py", line 134, in decode
n = len(code) // 2
TypeError: object of type 'PSKeyword' has no len()
_Originally posted by @cchristiansen in https://github.com/pdfminer/pdfminer.six/issues/617#issuecomment-1135787678_
A quick hack which fixes the issue for me, is to insert the following two lines into `cmapdb.py` at line 134:
if isinstance(code, PSKeyword):
    code = code.name
Could you please fix this? Thanks in advance.
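For anyone who wants to try the workaround without patching the installed package, here is a minimal, self-contained sketch of the idea. The `PSKeyword` class and the `decode_identity` function are hypothetical stand-ins, loosely modeled on the two lines visible in the traceback; they are not pdfminer.six's actual code. The sketch assumes that a keyword's `name` attribute holds the raw bytes of the token.

```python
import struct
from typing import Tuple, Union


class PSKeyword:
    """Hypothetical stand-in for pdfminer's PSKeyword (assumed to wrap raw bytes)."""

    def __init__(self, name: bytes) -> None:
        self.name = name


def decode_identity(code: Union[bytes, PSKeyword]) -> Tuple[int, ...]:
    """Decode big-endian 2-byte character codes, tolerating a PSKeyword argument."""
    # The workaround: unwrap a keyword to its underlying bytes before using len().
    if isinstance(code, PSKeyword):
        code = code.name
    n = len(code) // 2
    if n:
        # A trailing odd byte is ignored here; that is a choice made in this sketch.
        return struct.unpack(">%dH" % n, code[: n * 2])
    return ()


print(decode_identity(b"\x00\x41\x00\x42"))
print(decode_identity(PSKeyword(b"\x00\x41")))
```

Without the `isinstance` check, the second call would fail with the same `TypeError` as in the traceback, because the stand-in `PSKeyword` defines no `__len__`.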
This is a consequence of #1042. The type annotations (which are not checked at runtime) declare that `code` is a `bytes`, but in reality some code further up the chain has decided that, because the argument to a text-showing operator (except TJ) can only be a `bytes` in a compliant PDF, it obviously is a `bytes` and doesn't need to be checked at runtime.
This is obviously not true :)
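To illustrate what a runtime check further up the chain could look like, here is a rough sketch that validates each operand before it reaches the font decoder. The function `checked_text_items` and the `"adjust"`/`"text"` tags are made up for illustration; this is not the change made in pdfminer.six, just one possible shape for such a guard.

```python
from typing import Iterable, Iterator, Tuple, Union

Number = Union[int, float]


def checked_text_items(
    seq: Iterable[object],
) -> Iterator[Tuple[str, Union[Number, bytes]]]:
    """Yield only the operands a text-showing operator can actually use."""
    for obj in seq:
        if isinstance(obj, (int, float)):
            # Numbers in a TJ array are positioning adjustments.
            yield ("adjust", obj)
        elif isinstance(obj, bytes):
            # Only real byte strings are safe to hand to a decoder.
            yield ("text", obj)
        # Anything else (for example a stray keyword object from a
        # malformed PDF) is dropped here; real code might log it instead.


class StrayKeyword:
    """Hypothetical object standing in for an unexpected non-bytes operand."""


print(list(checked_text_items([b"Hi", -120, StrayKeyword(), b"!"])))
```

The point is only that the `bytes` annotation is enforced by an explicit `isinstance` test rather than assumed, so a malformed operand is skipped instead of raising deep inside the decoder.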
dhdaines added a commit to dhdaines/pdfminer.six that referenced this issue (Nov 27, 2024)