Is there a way to partially extract text while maintaining the layout in HTMLConverter?
The text extraction in HTMLConverter yields very good results, including text grouping.
However, I want to extract specific parts of the PDF (such as the upper half).
In the output from HTMLConverter, the positional information of the text is lost.
When I extract elements with extract_pages, I get detailed positional information, but the text is not grouped: every element comes back as an individual LTChar.
Is there a solution for such cases?
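To make the goal concrete, here is a minimal sketch of the kind of filtering I have in mind. It assumes that `extract_pages` in pdfminer.six runs layout analysis when it is given an `LAParams` object, so that grouped `LTTextContainer` elements (each with a `bbox`) are returned, and that PDF coordinates have their origin at the bottom-left of the page. The names `in_upper_half` and `upper_half_text` are hypothetical helpers made up for this illustration, not part of pdfminer.

```python
from typing import Iterator, Tuple

# (x0, y0, x1, y1); assumed origin at the bottom-left, y increasing upward.
BBox = Tuple[float, float, float, float]


def in_upper_half(bbox: BBox, page_bbox: BBox) -> bool:
    """Return True if the box lies entirely above the page's horizontal midline."""
    _, page_y0, _, page_y1 = page_bbox
    midline = (page_y0 + page_y1) / 2
    # bbox[1] is the bottom edge of the element.
    return bbox[1] >= midline


def upper_half_text(pdf_path: str) -> Iterator[Tuple[int, BBox, str]]:
    """Yield (page id, bbox, text) for grouped text elements in the upper half."""
    # Imported lazily so in_upper_half can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LAParams, LTTextContainer

    for page_layout in extract_pages(pdf_path, laparams=LAParams()):
        for element in page_layout:
            if isinstance(element, LTTextContainer) and in_upper_half(
                element.bbox, page_layout.bbox
            ):
                yield page_layout.pageid, element.bbox, element.get_text()
```

Something like `for page_id, bbox, text in upper_half_text("sample.pdf"): print(page_id, bbox, text)` would then print only the grouped text from the upper half, where `sample.pdf` is a placeholder path. Elements that straddle the midline are excluded in this sketch; whether that is the right rule depends on the document.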