Is it possible to show occurrences of empty elements? #224

Open
jzuidweg opened this issue Jul 14, 2022 · 4 comments

Comments

@jzuidweg

Test document 15 fails with the following three errors concerning empty elements:

  • 4.1.2 test 16 (Paragraph structure element is empty)
  • 4.1.2 test 17 (Span structure element is empty)
  • 4.1.2 test 20 (Numbered heading structure element is empty)

The GUI does not pinpoint the occurrences of these errors in the document, however.

Is it possible to show (at least approximately) where errors about empty elements occur in the document?

@bdoubrov
Collaborator

The problem here is that this error occurs when the logical tag is empty, that is, not associated with any content on the page at all. The only reasonable hint would be to show the location of the previous and next tags in the structure tree.

One other option would be to provide the path to this empty element in the structure tree in the additional info of the error message, something like:
/Document/P[15]
which would mean the 15th <P> tag in the document.
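Both hints can be sketched on a toy tree. The following is only an illustration, not the checker's code: `StructElem` and `find_empty` are hypothetical names, an element is assumed to be "empty" when it has neither content nor children, and indices are assumed to be 1-based per tag among siblings, as in XPath.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class StructElem:
    """Hypothetical stand-in for a structure element (not a real checker class)."""
    tag: str
    has_content: bool = False  # True if page content is attached to this element
    kids: List["StructElem"] = field(default_factory=list)


def find_empty(elem: StructElem, path: str) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Yield (path, previous sibling path, next sibling path) for each empty element."""
    counts = {}
    kid_paths = []
    for kid in elem.kids:
        # Assumed convention: 1-based index per tag among siblings.
        counts[kid.tag] = counts.get(kid.tag, 0) + 1
        kid_paths.append(f"{path}/{kid.tag}[{counts[kid.tag]}]")
    for i, kid in enumerate(elem.kids):
        if not kid.has_content and not kid.kids:
            # Assumed definition of "empty": no content and no children.
            prev_path = kid_paths[i - 1] if i > 0 else None
            next_path = kid_paths[i + 1] if i + 1 < len(kid_paths) else None
            yield kid_paths[i], prev_path, next_path
        else:
            yield from find_empty(kid, kid_paths[i])


doc = StructElem("Document", kids=[
    StructElem("H1", has_content=True),
    StructElem("P"),                    # empty paragraph
    StructElem("P", has_content=True),
])
for hit in find_empty(doc, "/Document"):
    print(hit)
```

The neighbouring paths are what a viewer could use to highlight the surrounding content on the page, since the empty element itself has nothing to highlight.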

@jzuidweg
Author

Both options you mention would be useful:

  • The first option (approximating the location in the document) would work for non-technical users of the tool
  • Providing the path would be good for technical users of the tool who understand PDF structure

I notice that the PDF checker is used by both types of user, so perhaps we should implement both...

@bdoubrov
Collaborator

The path is already available in the context of the error message. For example, the context for the second empty span is:

root/doc[0]/StructTreeRoot[0]/children[0](392 0 obj Document Document)/children[168](561 0 obj H1 H1)/children[0](717 0 obj Span Span)

This translates to the following path in the structure tree:
Document/H1[168]/Span[0]

or, in plain words, one would need to navigate to

  • the root Document structure element
  • then its child structure element with index 168 (0-based), which will be of type H1
  • then its child structure element with index 0, which will be Span

And this particular Span has no content on the page.
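The translation above can be automated with a small script. This is a minimal sketch, assuming every step has the form `children[i](N G obj Name Name)` as in the example. The function name `context_to_struct_path` is made up for illustration, and the meaning of the two trailing names is an assumption (the sketch simply uses the first one).

```python
import re

# Matches one step such as "children[168](561 0 obj H1 H1)".
# Assumption: the two trailing names carry no spaces; only the first is used.
_STEP = re.compile(r"children\[(\d+)\]\((\d+) (\d+) obj (\S+) (\S+)\)")


def context_to_struct_path(context: str) -> str:
    """Turn an error context string into a short structure-tree path."""
    parts = []
    for depth, (index, _obj, _gen, name, _second) in enumerate(_STEP.findall(context)):
        # The root element is written without an index, as in the example above.
        parts.append(name if depth == 0 else f"{name}[{index}]")
    return "/".join(parts)


context = (
    "root/doc[0]/StructTreeRoot[0]/children[0](392 0 obj Document Document)"
    "/children[168](561 0 obj H1 H1)/children[0](717 0 obj Span Span)"
)
print(context_to_struct_path(context))  # Document/H1[168]/Span[0]
```

The indices stay 0-based, exactly as they appear in the context string.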

@bdoubrov
Collaborator

The latest dev version of the error preview has a new right pane showing the structure tree of the document. In particular, it highlights empty structure elements and lets you navigate through their sibling elements, which are also highlighted on the page, giving a hint as to where exactly the empty structure element is located in the document.
