Object of type 'PSKeyword' has no len() via cmapdb.py #1059

Open
Irina-Pavlova opened this issue Nov 2, 2024 · 1 comment · May be fixed by #1069

Comments

@Irina-Pavlova

I have the same issue for some PDFs (which unfortunately contain sensitive information).

The version of pdfminer.six I'm using is 20221105.
My traceback matches the one below, which was reported in issue #617:

File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/high_level.py", line 200, in extract_pages
  interpreter.process_page(page)
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfinterp.py", line 991, in process_page
  self.render_contents(page.resources, page.contents, ctm=ctm)
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfinterp.py", line 1010, in render_contents
  self.execute(list_value(streams))
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfinterp.py", line 1036, in execute
  func(*args)
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfinterp.py", line 896, in do_TJ
  self.device.render_string(
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfdevice.py", line 133, in render_string
  textstate.linematrix = self.render_string_horizontal(
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdfdevice.py", line 170, in render_string_horizontal
  for cid in font.decode(obj):
File "/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/pdffont.py", line 1174, in decode
  return self.cmap.decode(bytes)
File "~/.virtualenvs/foo/lib/python3.10/site-packages/pdfminer/cmapdb.py", line 134, in decode
  n = len(code) // 2
TypeError: object of type 'PSKeyword' has no len()

_Originally posted by @cchristiansen in https://github.com/pdfminer/pdfminer.six/issues/617#issuecomment-1135787678_
            

A quick hack that fixes the issue for me is to insert the following two lines into `cmapdb.py` at line 134:
            if isinstance(code, PSKeyword):
                code = code.name
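For anyone who wants to try the idea outside the library first, here is a minimal, self-contained sketch of the same guard. It is not pdfminer.six code: the `PSKeyword` class below is a stand-in that I assume carries its raw bytes in `.name`, and `decode_identity` is a hypothetical helper that mimics the `len(code) // 2` step from the traceback. Unlike the two-line hack, it also returns an empty tuple for anything that is not bytes-like and ignores a trailing odd byte.

```python
import struct
from typing import Tuple


class PSKeyword:
    """Stand-in for pdfminer's PSKeyword (assumption: bytes live in `.name`)."""

    def __init__(self, name: bytes) -> None:
        self.name = name


def decode_identity(code: object) -> Tuple[int, ...]:
    """Hypothetical helper: decode 2-byte big-endian CIDs defensively."""
    # Same idea as the quick hack: unwrap a keyword into its raw bytes.
    if isinstance(code, PSKeyword):
        code = code.name
    # Anything that is still not bytes-like cannot be decoded; yield nothing.
    if not isinstance(code, (bytes, bytearray)):
        return ()
    n = len(code) // 2
    if n == 0:
        return ()
    # Slice to an even length so struct.unpack never sees a stray odd byte.
    return struct.unpack(">%dH" % n, bytes(code[: 2 * n]))
```

With this sketch, `decode_identity(PSKeyword(b"\x00A"))` decodes the keyword's bytes instead of raising, and `decode_identity(42)` returns `()`.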

Could you please fix this? Thanks in advance.
@dhdaines
Contributor

dhdaines commented Nov 27, 2024

This is a consequence of #1042. We declare (in the type annotations, which are not checked at runtime) that `code` is a `bytes`, but in reality some code further up the chain has decided that, because in a compliant PDF the argument to a text-showing operator (except TJ) can only be a `bytes`, it is obviously a `bytes` and doesn't need to be checked at runtime.

This is obviously not true :)
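To make the "not checked at runtime" point concrete, here is a small sketch that does not use pdfminer.six at all. The names `Keyword`, `half_length`, and `try_half_length` are made up for illustration. The annotation claims `bytes`, Python accepts any object anyway, and the error only surfaces when `len()` is finally called.

```python
class Keyword:
    """Hypothetical stand-in for a parsed keyword object (not pdfminer's class)."""

    def __init__(self, name: bytes) -> None:
        self.name = name


def half_length(code: bytes) -> int:
    # The annotation says `bytes`, but nothing enforces it at call time.
    return len(code) // 2


def try_half_length(value: object) -> str:
    """Call half_length with whatever we were given and report the outcome."""
    try:
        return str(half_length(value))  # type: ignore[arg-type]
    except TypeError as exc:
        # A non-bytes object only fails here, deep inside the callee.
        return f"TypeError: {exc}"
```

Calling `try_half_length(b"abcd")` gives `"2"`, while `try_half_length(Keyword(b"Tj"))` reports a `TypeError` of the same shape as the one in the traceback.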

dhdaines added a commit to dhdaines/pdfminer.six that referenced this issue Nov 27, 2024