Im newbie. #113

Mohanrajkarnan · 2021-02-22T17:36:43Z

I have a requirement to convert PDF to HTML5.

I tried the code below. It extracted the text from the PDF and created an HTML file, but the output is not structured like the PDF:
- Images are missing.
- Text positioning is lost.

pdftotree.parse(pdf_file, html_path=htmlPath, favor_figures=True, model_type=None, model_path=None, visualize=False)

Please let me know what I am missing.

Thanks
Mohan

lukehsiao · 2021-02-23T00:58:48Z

Hi Mohan,

It sounds like you're trying to get an HTML representation that focuses on visually looking like the source PDF, is that correct? If so, pdftotree most likely isn't for you. The focus here is more on structural accuracy (e.g., tables end up in HTML tables), not faithfully representing a PDF document visually. Many PDF to HTML tools have a similar focus.

If my assumption is correct, then I'd suggest trying some other tools. I think pdftohtml.org is one that emphasizes visual accuracy, but I'm sure there are others as well.
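
To check what a given converter's HTML output actually contains (tables versus images, for example), a rough sketch like the one below may help. It uses only Python's standard library. The names `TagCounter` and `count_tags` and the sample HTML string are hypothetical and made up for illustration; the sample is not real pdftotree output.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each start tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, including self-closing ones like <img/>.
        self.counts[tag] += 1


def count_tags(html_text):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Hypothetical sample input: one table, no images.
sample_html = "<html><body><table><tr><td>1</td></tr></table><p>text</p></body></html>"
counts = count_tags(sample_html)
print("tables:", counts["table"], "images:", counts["img"])
```

Running this on the generated HTML file's contents would show whether the output has `table` elements but no `img` elements, which would match the structural focus described above.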
