As a user, I want to be able to skip files/dirs on file.not_referenced_in_label check #1079

Open
rgdeen opened this issue Dec 3, 2024 · 3 comments

@rgdeen
rgdeen commented Dec 3, 2024

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Anyone running validate

💪 Motivation

The referential integrity check's feature of reporting files that are not referenced in the label is a useful and welcome addition. However, there are times when we intentionally have files that are not officially part of the archive. It would be really useful to be able to specify a list of files and/or directories that are skipped for this check (actually, skipped for ALL checks - so validate pretends they do not exist).

The particular use case motivating this is the MSL hybrid pds3/4 bundle. It has a number of files that are part of the pds3 archive but are not part of pds4. For example, various pds3 boilerplate files, as well as two of the three types of browse products. There are also .XML files in the EXTRAS dir that are not labels. Being able to specify dirs to ignore would prevent these from throwing warnings (or in the XML file case, really serious fatal errors since they're not even PDS4 labels).

Having thousands of such warnings is a problem because it effectively hides any unexpected warnings. As it is, the file-not-referenced warning is useless for the MSL bundle.

A bonus would be wildcard support so we could also skip pds3 .LBL files wherever they occur. (In the MSL case the LBLs are referenced from the pds4 labels, but that's not always the case.)

With a number of hybrid bundles due to come out in the very near future (few months), this will become increasingly important.

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

🎉 I&T

No response

@jordanpadams
Member

@rgdeen as an interim solution, would a flag to turn off this check suffice? Or would you prefer to provide some explicit exclusions to the run?

@rgdeen
Author

rgdeen commented Dec 3, 2024

Well, we can always turn off referential integrity checking altogether, but then we lose a lot of useful functionality. A flag to turn off that check could help on an interim basis, but I think we really want an exclusion list... that way we keep all functionality. There's also the second (admittedly corner) case where there were non-pds4 XML files, which it totally barfed on... the exclusion would cover that too, whereas turning off the not-referenced flag would not. And the case where we want to exclude all *.LBL will come up soon... we for certain have hybrid bundles in development that do not point to the pds3 label, which makes the case for a wildcard too (although this did not come up in the MSL case).

Summary:
Directory exclusion - ignore things like EXTRAS dirs (I don't want to ignore all of EXTRAS, just some of the subdirs) or CATALOG
Wildcard-based file exclusion - ignore things like *.LBL or pds3 voldesc.cat (a single file is the degenerate case of a wildcard match)
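
To make the two requests above concrete, here is a rough sketch of how such a filter might behave. It is only an illustration in Python, not validate's implementation or an existing option: the function name `is_excluded`, the parameters `excluded_dirs` and `file_patterns`, and the sample paths are all hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(rel_path, excluded_dirs=(), file_patterns=()):
    """Return True if rel_path (relative to the bundle root) should be skipped.

    excluded_dirs: directories relative to the bundle root (hypothetical input).
    file_patterns: shell-style wildcards matched against the file name only.
    """
    path = PurePosixPath(rel_path)
    # Directory exclusion: skip anything located under an excluded directory,
    # so excluding EXTRAS/PDS3_DOCS leaves the rest of EXTRAS alone.
    for d in excluded_dirs:
        if PurePosixPath(d) in path.parents:
            return True
    # Wildcard exclusion (case-sensitive): a plain file name such as
    # voldesc.cat is just a pattern without wildcard characters.
    return any(fnmatchcase(path.name, pat) for pat in file_patterns)


# Made-up file listing for illustration only.
files = [
    "DATA/a.xml",
    "DATA/A.LBL",
    "EXTRAS/PDS3_DOCS/notes.xml",
    "EXTRAS/KEEP/b.xml",
    "voldesc.cat",
]
kept = [
    f for f in files
    if not is_excluded(f, ["EXTRAS/PDS3_DOCS"], ["*.LBL", "voldesc.cat"])
]
```

In this sketch an excluded file is dropped before any check runs, which matches the "validate pretends they do not exist" behavior asked for in the issue description.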

@matthewtiscareno

There are a number of reasons why a node might want to place non-PDS4 files into a directory alongside a PDS4 archive. Since PDS4 is agnostic as to file systems, this is totally allowed. I also agree that it's great for Validate to check for non-PDS4 files in the directory, but something like a .gitignore file would seem to be a good practice.
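
As a sketch of what a .gitignore-style file could look like in practice, the Python snippet below parses a small ignore list into directory entries and file-name patterns. Everything here is an assumption for illustration: the function name `parse_ignore_text` and the sample entries are hypothetical, and the rules are deliberately much simpler than real .gitignore semantics (no negation, no anchoring).

```python
def parse_ignore_text(text):
    """Split ignore-file text into (directory entries, file-name patterns).

    Simplified, hypothetical rules: blank lines and lines starting with '#'
    are ignored, a trailing '/' marks a directory, anything else is treated
    as a file-name pattern.
    """
    dirs, patterns = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("/"):
            dirs.append(line.rstrip("/"))  # trailing slash marks a directory
        else:
            patterns.append(line)
    return dirs, patterns


# Made-up ignore-file content for illustration only.
sample = """\
# pds3-only material that is not part of the pds4 archive
EXTRAS/PDS3_DOCS/

*.LBL
voldesc.cat
"""
dirs, patterns = parse_ignore_text(sample)
```

Keeping the list in a file next to the archive, rather than on the command line, would let each bundle carry its own exclusions.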

Status: ToDo