-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate run with --skip-product-validation reports product success in log #1090
Comments
Ah, yes. Our imprecise, at best, use of "product validation" is coming to roost. Basically you have three levels of validation that can take place:
All of these are referred to as "product validation" without distinction. We could change the summary to say (respective to list above):
To differentiate. Could use any words you wanted really, but it is not terribly hard once you pick the words you want. I know product is also overly abused and imprecise at best so not sure how best to word it. |
Is #1 already called content validation? Or is that an additional level to check the actual data? All three actually run schema and schematron? I thought --skip-product-validation would not do that. What else is it doing, I thought sch/tron was the bulk of the work? Anyway, here's the use case at least for us. First, content validation (actually checking the data) is so slow we often don't bother, or let it run concurrently to the release process just in case. So that leaves two things: label validation and bundle validation (aka referential integrity checks). The distinction is that the label validation can be run in parallel, because each product is validated separately. So whether it's multiple threads or Kubernetes or Nucleus or whatever, we can beat the time down to some extent by divide-and-conquer. That's not true for bundle validation, which has to be run monolithically. So what I really want is a mode that does just bundle validation but nothing else (or the minimum else), on the assumption that we've already checked the individual labels. Then mode for the standard parallelizeable label-but-no-data. I don't think I have a use case for splitting the label part into those three pieces. It's that bundle-level one that was the issue here... making it look like label checking was done when really it was turned off (and it's easy to forget what was run when and where... I know the command echo says no product val in this case but that's easy to miss when looking at the logs). I guess the bigger picture item is, to do the absolute minimum to each label if you're doing bundle checking only (with product val turned off). |
Well, content check is a level of product check depending on the context of product. If just the label was product, then we would not need the word product. If a label was just the XML and not all the references and the semantics the XML offers, then we would not need the word label either. It is why the table shows full product validation is schema/tron + label + content. Whether full modifies product or validation or both depends on who is in the room and larger discussion context. Yes, the schema/schematron is always done because it is part of loading the file. It is not the bulk of the what validate does. In fact, it is a sliver. In fact, validate is not allowed to check anything already being done by the schema/schematron. Checking a label does stuff like non overlapping bounds etc without actually reading the data file to see if it complies with the description. Basically, stuff beyond schematron but frequently caused problems because we have some product (label + data) that is wrong and not detected by schema/schematron. Really, if schema/schematron were it, then using xmllint would be a billion times better and faster. As to rest, I agree that abusing "product validation" is a problem. The devil's advocate would say if you read the whole log it includes the switches used and seeing the skip flag(s) tells you exactly what "product validation" means. Thus, your whole argument is reduced to you do not want to read the whole log file or two tiny bits of it anyway - the abstract where the switches are and the summary where what passed and failed is. |
Yes, and I said as much. The reality is people will miss the switch and make a bad assumption, that's just human nature. Anyway, I guess the request under the request is to separate out that which is parallelizable (per product or dir of products) and that which isn't, and don't do any more of the former than you have to when doing the latter with the appropriate switches on. |
@al-niessner @rgdeen do we want to change the flag or just change the log output? I say we go with the latter and see if people are confused. If I do agree we probably should have 3 separate sections to our output logs label validation, content validation, and bundle integrity/collection integrity validation, but I don't think it is worth introducing a backwards incompatible change like that at this time. |
Sounds good to me, thanks. |
Here is the annoying bit. Total products is tied to the number of pass/fail. If you count the products as skipped, then the total goes to 0 and you get this:
The error is a false positive and a result of the total being passed + failed not a real total of products seen. Question now is, should skipped be included in total? |
Checked for duplicates
No - I haven't checked
🐛 Describe the bug
Ran validate like this:
validate /path/msl_navcam_processed -R pds4.bundle -D --skip-context-validation --skip-product-validation -v 2 -r /path/valrpt.txt
The valrpt output looked like this:
Notice that it says Product Validations completed. But, they weren't - the timing was such that it could not have done the product vals - and if it did, then the --skip-product-validation flag is broken).
This can cause an issue in that later we might look at the log, see that everything passed product validation, and release the data even though something might fail validation.
It should not report Product Validation passed unless it actually did.
🕵️ Expected behavior
see above
📜 To Reproduce
see above, should happen with any bundle
🖥 Environment Info
linux, RH 7.6
📚 Version of Software Used
validate 3.6.3
🩺 Test Data / Additional context
No response
🦄 Related requirements
🦄 #xyz
Acceptance Criteria
Given
When I perform
Then I expect
⚙️ Engineering Details
No response
🎉 Integration & Test
No response
The text was updated successfully, but these errors were encountered: