
Allow two modules with same NSVCA but different snippet content. #3242

Merged
merged 1 commit into pulp:main on Sep 8, 2023

Conversation

ipanova
Member

@ipanova ipanova commented Sep 4, 2023

closes #3241

It no longer holds true that we can rely on a module's NSVCA for uniqueness, so add the hash of the document snippet to it.
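The core problem can be illustrated with a few lines of Python (the module data here is made up for illustration): two modulemd documents can share the same NSVCA while their YAML snippets differ, so only a key that also includes a digest of the snippet tells them apart:

```python
import hashlib

# Two hypothetical modulemd documents with identical NSVCA
# (name, stream, version, context, arch) but different snippet content.
nsvca = ("perl", "5.30", 820200923213406, "466ea64f", "x86_64")
snippet_a = "document: modulemd\ndata:\n  description: build A\n"
snippet_b = "document: modulemd\ndata:\n  description: build B\n"

# Extend the key with a SHA-256 digest of each snippet.
key_a = nsvca + (hashlib.sha256(snippet_a.encode()).hexdigest(),)
key_b = nsvca + (hashlib.sha256(snippet_b.encode()).hexdigest(),)

print(key_a == key_b)  # False: NSVCA + digest distinguishes the two documents
```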

@ipanova ipanova marked this pull request as draft September 4, 2023 14:47
modules_to_update = []

for mmd in Modulemd.objects.all().only("snippet").iterator():
    digest = hashlib.sha256(mmd.snippet.encode()).hexdigest()
Member Author

@ggainey @dralley the previous 2 migrations were written to account for the case when a module would have an empty snippet. Is this still a case we need to account for? If so, I'd need to rewrite this part a bit. We will, however, still have a very slim chance for the migration to fail in case there are two modules with the same NSVCA that both have empty snippets, because I will be using the digest of an empty blob, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.
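The digest quoted above is indeed the SHA-256 of an empty byte string, which can be verified with a couple of lines of Python:

```python
import hashlib

# SHA-256 of an empty byte string: the digest every module with an
# empty snippet would receive under this migration's scheme.
empty_digest = hashlib.sha256(b"").hexdigest()
print(empty_digest)
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```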

Contributor

It depends. It is hopefully less needed than it was at the time, but it might still be needed. This was part of why we were resaving module snippets in the first place. Ideally if that has been in place for a little while it will have been self-repairing.

On the other hand if there's one module with an empty snippet there could just as easily be more than one.

Maybe it's better to fail out early in this case w/ a report about what is wrong, and defer to data repair?

Member Author
@ipanova ipanova Sep 5, 2023

I was talking nonsense: given our current constraints we cannot have two rows in the DB with the same NSVCA plus an empty snippet. When this migration runs there will always be at most one (unique NSVCA + empty snippet).

I'd probably do nothing at this point, since an empty snippet can be fixed by re-running the sync (non-optimized); the old module will then be removed and a new module with a non-empty snippet will be saved and added to the new repository version.

Contributor

> I'd probably do nothing at this point since empty snippet can be fixed by re-running the sync, so then the old module will be removed and new module with -non-empty-snippet will be saved and added to the new RV

In theory, as long as the module hasn't disappeared, which IIRC Fedora was doing for a while. That is, older versions of streams would be purged. Or maybe I'm thinking of EPEL, but I remember the modularity group wasn't thrilled about it.

I'm not sure if this is worth worrying about yet though. Maybe it'll be fine?

@ipanova ipanova force-pushed the i3241 branch 2 times, most recently from b92efb8 to 20f77e4 on September 4, 2023 15:06
migrations.AlterUniqueTogether(
    name="modulemd",
    unique_together={
        ("_pulp_domain", "name", "stream", "version", "context", "arch", "digest")
    },
)
Contributor
@dralley dralley Sep 4, 2023

Should we set this to just ("_pulp_domain", "digest") like modulemd-obsoletes and defaults?

Seems like the other fields wouldn't help much. However we probably should keep indexes for name and stream.

Member Author

We could, but that sort of breaks my plan for how to mitigate modules with empty snippets.

Member Author
@ipanova ipanova Sep 5, 2023

What is the purpose of keeping those modules in the repo anyway? How would the published modules.yaml look with such a module; it would be skipped, right? So maybe we should:

a) take an action to identify such modules and repair them by syncing and parsing the metadata (costly, is it worth it?)
b) identify such modules and remove them from the repo (might be an unexpected/unwanted action since it will produce a new repository version)
c) do nothing, which will require keeping the unique_together as proposed in this PR (NSVCA + digest)

Contributor

I assume by A you mean telling the user which repos to resync? Or do you mean doing it in the migration?

Member Author
@ipanova ipanova Sep 6, 2023

I meant doing it in the migration, which is not my preference. In order to run this migration successfully with ("_pulp_domain", "digest"), the digest must be calculated. If there are empty snippets, the digest for those modules will be e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 and the migration will fail with:

django.db.utils.IntegrityError: could not create unique index "rpm_modulemd__pulp_domain_id_digest_eb1e206a_uniq"
DETAIL:  Key (_pulp_domain_id, digest)=(018a6a12-6b49-797c-9f7b-32c3da09e5c0, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) is duplicated.
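A pre-flight check along these lines would surface the duplicates before the unique index is created (a standalone sketch, not the actual migration code; the sample rows are hypothetical stand-ins for Modulemd records):

```python
import hashlib
from collections import defaultdict

# Hypothetical stand-ins for Modulemd rows: (domain, snippet) pairs.
modules = [
    ("domain-a", "document: modulemd\nversion: 2\n"),
    ("domain-a", ""),  # empty snippet
    ("domain-a", ""),  # second empty snippet: same digest, same domain
]

seen = defaultdict(list)
for domain, snippet in modules:
    digest = hashlib.sha256(snippet.encode()).hexdigest()
    seen[(domain, digest)].append(snippet)

# Any key with more than one row would violate a
# ("_pulp_domain", "digest") unique index.
duplicates = {key for key, rows in seen.items() if len(rows) > 1}
print(duplicates)
```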

Member Author

And b) will not solve the problem as long as these kinds of modules stay in the DB.

Contributor

I don't think we can do it in the migration. It may not even be reliable - what happens if the repo is gone, or the module isn't in the metadata anymore? It introduces a lot of variables that shouldn't be anywhere near the migration process.

Contributor
@dralley dralley Sep 7, 2023

I think my personal preference is to let the migration fail, but spit out the details of which repositories and modules are impacted and how it can be fixed [0]. That will make it easy for users to work around the issue while not limiting us unnecessarily (since we don't know how big of an issue this will ultimately end up being).

I don't know that it's the perfect solution, but it feels like a decent compromise between overengineering and being inconsiderate to users. If it turns out to be more of an issue as Katello starts migrating then we can re-adjust our approach later.

[0] For instance, if it is present in the upstream repo then a resync could fix it. If it's not, then a mirror-sync combined with orphan cleanup could fix it. And the nuclear option is to just delete the repo, orphan cleanup, and resync, but probably one of the former two would work.

Member Author

> I don't think we can do it in the migration.

I agree with you on that, but listed this as an option anyway.

> For instance, if it is present in the upstream repo then a resync could fix it. If it's not, then a mirror-sync combined with orphan cleanup could fix it. And the nuclear option is to just delete the repo, orphan cleanup, and resync, but probably one of the former two would work.

Only the repo removal, orphan cleanup, and resync would work.
Given the in-place uniqueness constraint on NSVCA, on re-sync it would identify the module as an existing content unit and skip it.

@dralley
Contributor

dralley commented Sep 4, 2023

I think if we don't do normalization we're going to want to make sure that modulemd metadata is stable between republishes for Fedora, CentOS, RHEL etc.

closes pulp#3241

It no longer holds true to rely on NSVCA module's uniqueness, add the
hash of the document snippet to it.
@ipanova ipanova marked this pull request as ready for review September 6, 2023 14:34
Contributor
@dralley dralley left a comment

I believe the current state is the desired end state that we discussed, is it missing anything?

@dralley dralley requested a review from ggainey September 7, 2023 23:32
Contributor
@ggainey ggainey left a comment

Concur - this matches my understanding of the outcome of the discussion we had in the rpm-team-mtg. Great work @ipanova !

@dralley
Contributor

dralley commented Sep 8, 2023

Thanks Ina!

@dralley dralley merged commit 7a66a3e into pulp:main Sep 8, 2023
14 checks passed
Successfully merging this pull request may close these issues.

Module's NSVCA cannot be used as uniqueness constraint
3 participants