It sounds like you're trying to get an HTML representation that focuses on visually looking like the source PDF, is that correct? If so, pdftotree most likely isn't for you. The focus here is more on structural accuracy (e.g., tables end up in HTML tables), not faithfully representing a PDF document visually. Many PDF to HTML tools have a similar focus.
If my assumption is correct, then I'd suggest trying some other tools. I think pdftohtml.org is one that emphasizes visual accuracy, but I'm sure there are others as well.
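To make the distinction concrete, here is a rough sketch of what "visually faithful" HTML usually amounts to: every piece of text is placed at its original page coordinates with absolute positioning, rather than being grouped into structural elements such as tables. This is not how pdftotree or any particular tool works; the `spans_to_html` helper, the span tuple format, and the sample data are all made up for illustration, and a real converter would first need to read the coordinates out of the PDF.

```python
from html import escape


def spans_to_html(spans, page_width, page_height):
    """Render text spans as absolutely positioned HTML.

    Hypothetical input format: each span is (x, y_top, font_size, text),
    with coordinates in points measured from the page's top-left corner.
    """
    parts = [
        f'<div style="position:relative;width:{page_width}pt;'
        f'height:{page_height}pt">'
    ]
    for x, y, size, text in spans:
        # Escape the text so characters like & and < stay valid HTML.
        parts.append(
            f'<span style="position:absolute;left:{x}pt;top:{y}pt;'
            f'font-size:{size}pt">{escape(text)}</span>'
        )
    parts.append("</div>")
    return "\n".join(parts)


# Made-up sample data standing in for text extracted from one PDF page.
spans = [
    (72, 72, 18, "Quarterly Report"),
    (72, 110, 11, "Revenue & costs"),
]
page_html = spans_to_html(spans, page_width=612, page_height=792)
print(page_html)
```

Output built this way keeps the layout but loses structure (a table is just loose spans), which is the opposite trade-off from the one pdftotree makes.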
I have a requirement to convert PDF to HTML5.
I tried the code below, which extracted the text from the PDF and created HTML, but the output is not structured as in the PDF:
- Images are missing
- Text positioning is lost
import pdftotree
pdftotree.parse(pdf_file, html_path=htmlPath, favor_figures=True, model_type=None, model_path=None, visualize=False)
Please help me understand what I am missing.
Thanks
Mohan