Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malformed XML goes to stdout/stderr not to report file #1081

Open
rgdeen opened this issue Dec 5, 2024 · 3 comments
Open

Malformed XML goes to stdout/stderr not to report file #1081

rgdeen opened this issue Dec 5, 2024 · 3 comments
Assignees
Labels
B15.1 bug Something isn't working s.low

Comments

@rgdeen
Copy link

rgdeen commented Dec 5, 2024

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

We had some malformed XML in a collection label. The validate -r report file kept saying the collection wasn't found, but there was no indication what the problem was.

When rerunning it we saw this:

[alanis@mslpds1 bin]$ ./validate /export/mslwork/msl_hazcam_raw -R pds4.bundle -D -v 2 -r /home/alanis/msl_bundle_labels/msl_hazcam_raw_test4.txt &
[1] 3172282
[alanis@mslpds1 bin]$ Error on line 76 column 11 of collection_data.xml:
 SXXP0003  Error reported by XML parser: The element type "Internal_Reference" must be
 terminated by the matching end-tag "</Internal_Reference>".
Error on line 76 column 11 of collection_browse.xml:
 SXXP0003  Error reported by XML parser: The element type "Internal_Reference" must be
 terminated by the matching end-tag "</Internal_Reference>".

So the malformed XML messages were coming out to stdout or stderr, NOT to the report file!!! Since stdout/err wasn't captured on the first run, this information was lost.

🕵️ Expected behavior

I expected malformed XML errors to show up as errors in the report file.

📜 To Reproduce

See above. Take a small bundle, mangle the collection label (it had /Internal Reference missing an underscore), and do a validate with -r . The report file will not contain the malformed XML errors.

🖥 Environment Info

Validate version 3.5.2
Linux

📚 Version of Software Used

validate version 3.5.2

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

@al-niessner
Copy link
Contributor

Sorry, missed read PDF for XML. We do not put exceptions in the report file just "validation" errors.

We do not know which files are XML. Think of it this way, you tell validate that foo.txt is XML and foo.txt is a simple paragraph of text describing the sunrise at the beach. Which of the million XML errors should be part of the report or just the simple exception that the text is not really XML. What error: Did you mean foo.xml or did someone put the wrong content in foo.txt? From validate's perspective, the file is just not XML - an exception - which cannot be processed any further. When you are scanning a directory for products, validate reads a lot of those kinds of files and would give you a plethora of the errors you are requesting.

@rgdeen
Copy link
Author

rgdeen commented Dec 9, 2024

You should report the fact that the file is malformed XML as a standard validate error. Not necessarily every error from the file (or even ANY of them), but the simple fact that it HAD an error is critical. (and how do you not know it's xml, you have either the extension or the fact that you are processing it AS a label to easily tell this!).

Most people in my experience look at the validate report, to see the summary of # of pass/fail/warn at the end. If there's malformed xml, you'd never know it looking at that summary. This can easily lead to false passes of bad data. Presumably some other error would get triggered, but not necessarily.

Bottom line, -r is billed as the report file name, default is stdout. But it is NOT if it omits this very critical information. At the very least, the help needs updating to say something like "oh by the way you also have to look at stderr and can't trust that 100% pass in the -r report actually means 100% pass".

It is bad form to throw an exception for a non-exceptional situation especially when the whole point of the program is to report errors. Forgetting (or in this case, misspelling) a </> tag is the sort of thing that validate should be designed to find for you. It absolutely should trap the exception and report it.

@al-niessner
Copy link
Contributor

To start, I am not telling anyone what validate should or should not do. I am simply telling you what it does and partially some of the why (the bits that I remember). If that is enough to satisfy you on how to do a workaround -- scan something for magic words, always capture the output, run xmllint before validate because xmllint is way better at syntax than validate, whatever inventive thing you want to do -- then all the better. If not, then press onward with getting the changes you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B15.1 bug Something isn't working s.low
Projects
Status: ToDo
Status: ToDo
Development

No branches or pull requests

3 participants