-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow two modules with same NSVCA but different snippet content. #3242
Conversation
modules_to_update = [] | ||
|
||
for mmd in Modulemd.objects.all().only("snippet").iterator(): | ||
digest = hashlib.sha256(mmd.snippet.encode()).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggainey @dralley previous 2 migrations where written to account for the case when a module would have an empty snippet. Is this still the case we need to account for? In that case I'd need to re-write this part a bit, we will however still have a very slim chance for a migration to fail in case there would be two modules with the same NSVCA and both wold be having empty snippets because I will be using digest of an empty blob e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends. It is hopefully less needed than it was at the time, but it might still be needed. This was part of why we were resaving module snippets in the first place. Ideally if that has been in place for a little while it will have been self-repairing.
On the other hand if there's one module with an empty snippet there could just as easily be more than one.
Maybe it's better to fail out early in this case w/ a report about what is wrong, and defer to data repair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was saying nonsense, given our current constraints we cannot have in DB 2 same NSVCA + empty snippet. When this migration will be running it will always be only one ( unique NSVCA + empty snippet).
I'd probably do nothing at this point since empty snippet can be fixed by re-running the sync( non-optimized), so then the old module will be removed and new module with -non-empty-snippet will be saved and added to the new RV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably do nothing at this point since empty snippet can be fixed by re-running the sync, so then the old module will be removed and new module with -non-empty-snippet will be saved and added to the new RV
In theory, as long as the module hasn't disappeared, which IIRC Fedora was doing for a while. That is, older versions of streams would be purged. Or maybe I'm thinking of EPEL, but I remember the modularity group wasn't thrilled about it.
I'm not sure if this is worth worrying about yet though. Maybe it'll be fine?
b92efb8
to
20f77e4
Compare
migrations.AlterUniqueTogether( | ||
name="modulemd", | ||
unique_together={ | ||
("_pulp_domain", "name", "stream", "version", "context", "arch", "digest") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set this to just ("_pulp_domain", "digest")
like modulemd-obsoletes and defaults?
Seems like the other fields wouldn't help much. However we probably should keep indexes for name and stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could but that sort of breaks my plan how to mitigate modules with empty snippets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose to keep those modules in the repo anyway? how the published modules.yaml would look with such a module, it will be skipped right? so maybe we should a) take an action to identify such modules and repair by syncing and parsing the metadata ( costy, is it worth it?) b) identify such modules and remove them from the repo ( might be unexpected/unwanted action since it will produce new RV c) do nothing but that will require us keeping the unique_together as proposed in this PR NSVCA+digest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume by A you mean telling the user which repos to resync? Or do you mean doing it in the migration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant doing it in the migration which is not my preference. In order to run this migration successfully having ("_pulp_domain", "digest")
the digest must be calculated. If there will be empty snippets the digest for those modules will be e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
and migration will fail with:
django.db.utils.IntegrityError: could not create unique index "rpm_modulemd__pulp_domain_id_digest_eb1e206a_uniq"
DETAIL: Key (_pulp_domain_id, digest)=(018a6a12-6b49-797c-9f7b-32c3da09e5c0, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) is duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and B will not solve the problem as long as these kind of modules will stay in DB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do it in the migration. It may not even be reliable - what happens if the repo is gone, or the module isn't in the metadata anymore? It introduces a lot of variables that shouldn't be anywhere near the migration process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my personal preference is to let the migration fail, but spit out the details of which repositories and modules are impacted and how it can be fixed [0]. That will make it easy for users to work around the issue while not limiting us unnecessarily (since we don't know how big of an issue this will ultimately end up being).
I don't know that it's the perfect solution, but it feels like a decent compromise between overengineering and being inconsiderate to users. If it turns out to be more of an issue as Katello starts migrating then we can re-adjust our approach later.
[0] For instance, if it is present in the upstream repo then a resync could fix it. If it's not, then a mirror-sync combined with orphan cleanup could fix it. And the nuclear option is to just delete the repo, orphan cleanup, and resync, but probably one of the former two would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do it in the migration.
I agree with you on that, but listed this as on option anyway.
For instance, if it is present in the upstream repo then a resync could fix it. If it's not, then a mirror-sync combined with orphan cleanup could fix it. And the nuclear option is to just delete the repo, orphan cleanup, and resync, but probably one of the former two would work.
Just the repo removal, orphan clean up, resync would work.
Given the in place uniqueness constant of NSVCA on re-sync it would identify the module as an existing content unit and skip it.
I think if we don't do normalization we're going to want to make sure that modulemd metadata is stable between republishes for Fedora, CentOS, RHEL etc. |
closes pulp#3241 It no longer holds true to rely on NSVCA module's uniqueness, add the hash of the document snippet to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the current state is the desired end state that we discussed, is it missing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concur - this matches my understanding of the outcome of the discussion we had in the rpm-team-mtg. Great work @ipanova !
Thanks Ina! |
closes #3241
It no longer holds true to rely on NSVCA module's uniqueness, add the hash of the document snippet to it.