Allow suppression of (cid:N) in pdf2txt #1070

Open: wants to merge 3 commits into master

dhdaines (Contributor) commented Nov 27, 2024

Closes: #1056

This is probably a feature people have wanted for a while. It might be better to make it a configuration option, but it is not clear where such an option would live among the various devices and converters.

I would advise against putting it in LAParams, because merely having an LAParams that is not None has wide-ranging side effects on layout analysis.

It has a test :)
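
For context, pdfminer.six writes a literal `(cid:N)` placeholder wherever it cannot map a glyph to Unicode. The following is only a minimal sketch of removing those placeholders as a post-processing step on already-extracted text; it is not the implementation in this PR, and the helper name `strip_cid_markers` is hypothetical.

```python
import re

# Matches the literal placeholder written for unmapped glyphs, e.g. "(cid:42)".
CID_MARKER = re.compile(r"\(cid:\d+\)")


def strip_cid_markers(text: str) -> str:
    """Remove every "(cid:N)" placeholder from already-extracted text.

    Hypothetical helper for illustration; not part of pdfminer.six.
    """
    return CID_MARKER.sub("", text)


print(strip_cid_markers("Hello(cid:3) world(cid:127)!"))  # prints "Hello world!"
```

A regex pass like this cannot tell a placeholder from text that genuinely contains the string `(cid:5)`, which is one reason suppressing the markers at extraction time may be preferable.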

Checklist

  • I have read CONTRIBUTING.md.
  • I have added a concise human-readable description of the change to CHANGELOG.md.
  • I have tested that this fix is effective or that this feature works.
  • I have added docstrings to newly created methods and classes.
  • I have updated the README.md and the readthedocs documentation. Or verified that this is not necessary.

Successfully merging this pull request may close these issues.

Text extraction issue with extract_text_to_fp - Uncleaned CID characters