Rectangles are misrecognized as curves when they contain a redundant final h operator #1065

Open
dhdaines opened this issue Nov 25, 2024 · 0 comments · May be fixed by #1066

dhdaines commented Nov 25, 2024

Conveniently, there is already a test PDF for this: the one for #1008. It contains a number of rectangles that are closed both with an explicit line segment and a final h operator. The problem is that pdfminer's layout analysis won't recognize them as rectangles. To replicate:

pdf2txt.py --output_type xml samples/contrib/issue-1008-inline-ascii85.pdf  | grep rect

(there should be 6 of them, but there are currently 0)

This is almost certainly widespread in real-world PDFs (and not just ones created by ArcGIS) since h is defined as:

Close the current subpath by appending a straight line
segment from the current point to the starting point of the
subpath. If the current subpath is already closed, h shall do
nothing.

So there isn't anything wrong or non-conforming about including it at the end of an already-closed path. The fix is pretty simple; see dhdaines@28463f2 (PR forthcoming).
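
To make the failure mode concrete, here is a minimal sketch. It is not the code from the commit above. It assumes a path is a list of tuples such as `("m", x, y)`, `("l", x, y)` and `("h",)`, and that a rectangle candidate is identified by the operator string `"mlllh"`. Both are assumptions for illustration, and `shape_of` and `drop_redundant_close` are hypothetical helper names.

```python
from typing import List, Tuple

# Assumed path representation: ("m", x, y), ("l", x, y) or ("h",).
PathOp = Tuple


def shape_of(path: List[PathOp]) -> str:
    """Concatenate the operator letters of a path, e.g. 'mlllh'."""
    return "".join(op[0] for op in path)


def drop_redundant_close(path: List[PathOp]) -> List[PathOp]:
    """Remove an explicit closing line segment that is followed by h.

    If the last 'l' already returns to the starting point of the subpath,
    the trailing h does nothing, so the pair is equivalent to h alone.
    Coordinates are compared exactly here; a real implementation might
    want a tolerance.
    """
    if (
        len(path) >= 3
        and path[0][0] == "m"
        and path[-1][0] == "h"
        and path[-2][0] == "l"
        and tuple(path[-2][1:]) == tuple(path[0][1:])
    ):
        return path[:-2] + [path[-1]]
    return path


# A rectangle closed with an explicit line segment and then h.
doubly_closed = [
    ("m", 0, 0),
    ("l", 10, 0),
    ("l", 10, 5),
    ("l", 0, 5),
    ("l", 0, 0),  # explicit closing segment back to the start
    ("h",),       # redundant: the subpath is already closed
]

normalized = drop_redundant_close(doubly_closed)
```

With this input, the operator string goes from `mllllh`, which does not match the assumed rectangle pattern, to `mlllh`, which does. A path that is closed only by h is returned unchanged.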

@dhdaines dhdaines linked a pull request Nov 25, 2024 that will close this issue