PDFNotImplementedError: Unsupported filter: [/'FlateDecode'] #1062

Open
Pique7 opened this issue Nov 19, 2024 · 1 comment · Fixed by dhdaines/playa#22 · May be fixed by #1068


Pique7 commented Nov 19, 2024

I think this is the same error as euske/pdfminer#174.

I solved it by changing the `decode` method in pdftypes.py as follows (for pdfminer.six v20240706):

    def decode(self) -> None:
        assert self.data is None and self.rawdata is not None, str(
            (self.data, self.rawdata)
        )
        data = self.rawdata
        if self.decipher:
            # Handle encryption
            assert self.objid is not None
            assert self.genno is not None
            data = self.decipher(self.objid, self.genno, data, self.attrs)
        filters = self.get_filters()
        if not filters:
            self.data = data
            self.rawdata = None
            return
        for (f, params) in filters:

            # ----difference with original `decode` method starts here--------
            if isinstance(f, list):
                try:
                    f = resolve1(f[0])
                except AttributeError:
                    pass
            # ----and ends here-----------------------------------------------

            if f in LITERALS_FLATE_DECODE:
                # will get errors if the document is encrypted.
                try:
                    data = zlib.decompress(data)
           ...

I haven't done any extensive tests yet, so this might not work in all cases ...

dhdaines (Contributor) commented Nov 27, 2024

The actual problem is that the filter is an indirect object reference, which pdfminer.six does not resolve before checking whether it is a list (the decoding parameters aren't resolved either). The solution is not the change above but rather:

     def get_filters(self) -> List[Tuple[Any, Any]]:
-        filters = self.get_any(("F", "Filter"))
-        params = self.get_any(("DP", "DecodeParms", "FDecodeParms"), {})
+        filters = resolve1(self.get_any(("F", "Filter"), []))
+        params = resolve1(self.get_any(("DP", "DecodeParms", "FDecodeParms"), {}))
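
To illustrate why the order matters, here is a minimal, self-contained sketch that does not use pdfminer.six at all. `Ref`, `resolve1`, `get_filters` and `decode` below are toy stand-ins written for this example (assumptions, not the library's actual classes or code). The idea is only that a reference pointing at a list has to be resolved before the "is it a list?" check; otherwise the whole list ends up wrapped as a single filter, which would match the `[/'FlateDecode']` in the error message.

```python
import zlib
from typing import Any, Dict, List, Tuple


class Ref:
    """Toy stand-in for an indirect object reference (hypothetical, not pdfminer's class)."""

    def __init__(self, target: Any) -> None:
        self.target = target


def resolve1(obj: Any) -> Any:
    """Toy resolver: follow references until a direct object is reached."""
    while isinstance(obj, Ref):
        obj = obj.target
    return obj


def get_filters(attrs: Dict[str, Any]) -> List[Tuple[Any, Any]]:
    # Resolve first, so that a reference to a list is seen as a list.
    filters = resolve1(attrs.get("Filter", []))
    params = resolve1(attrs.get("DecodeParms", {}))
    if not isinstance(filters, list):
        filters = [filters]
    if not isinstance(params, list):
        # One parameter entry per filter.
        params = [params] * len(filters)
    # Individual entries may themselves be references.
    return [(resolve1(f), resolve1(p)) for f, p in zip(filters, params)]


def decode(attrs: Dict[str, Any], raw: bytes) -> bytes:
    data = raw
    for f, _params in get_filters(attrs):
        if f == "FlateDecode":
            data = zlib.decompress(data)
        else:
            raise NotImplementedError(f"Unsupported filter: {f!r}")
    return data


# The filter entry is a reference to a list, as in the problematic PDF.
payload = zlib.compress(b"hello")
attrs = {"Filter": Ref(["FlateDecode"])}
assert decode(attrs, payload) == b"hello"
```

If the two `resolve1` calls at the top of the toy `get_filters` are dropped, `filters` becomes a one-element list holding the reference, and the resolved "filter" is then the inner list rather than a name, so `decode` raises the unsupported-filter error.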
