
Allow two modules with same NSVCA but different snippet content. #3242

Merged
merged 1 commit into pulp:main on Sep 8, 2023

Conversation

ipanova
Member

@ipanova ipanova commented Sep 4, 2023

closes #3241

It no longer holds true that we can rely on a module's NSVCA for uniqueness, so add the hash of the document snippet to it.
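The core problem can be illustrated with a few lines of Python (the module data here is made up for illustration): two modulemd documents can share the same NSVCA while their YAML snippets differ, so only a key that also includes a digest of the snippet tells them apart:

```python
import hashlib

# Two hypothetical modulemd documents with identical NSVCA
# (name, stream, version, context, arch) but different snippet content.
nsvca = ("perl", "5.30", 820200923213406, "466ea64f", "x86_64")
snippet_a = "document: modulemd\ndata:\n  description: build A\n"
snippet_b = "document: modulemd\ndata:\n  description: build B\n"

# Extend the key with a SHA-256 digest of each snippet.
key_a = nsvca + (hashlib.sha256(snippet_a.encode()).hexdigest(),)
key_b = nsvca + (hashlib.sha256(snippet_b.encode()).hexdigest(),)

print(key_a == key_b)  # False: NSVCA + digest distinguishes the two documents
```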

@ipanova ipanova marked this pull request as draft September 4, 2023 14:47
modules_to_update = []

for mmd in Modulemd.objects.all().only("snippet").iterator():
    digest = hashlib.sha256(mmd.snippet.encode()).hexdigest()
Member Author

@ggainey @dralley the previous 2 migrations were written to account for the case when a module would have an empty snippet. Is this still a case we need to account for? If so, I'd need to rewrite this part a bit. We will, however, still have a very slim chance for the migration to fail in case there are two modules with the same NSVCA that both have empty snippets, because I will be using the digest of an empty blob, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.
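The digest quoted above is indeed the SHA-256 of an empty byte string, which can be verified with a couple of lines of Python:

```python
import hashlib

# SHA-256 of an empty byte string: the digest every module with an
# empty snippet would receive under this migration's scheme.
empty_digest = hashlib.sha256(b"").hexdigest()
print(empty_digest)
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```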

Contributor

It depends. It is hopefully less needed than it was at the time, but it might still be needed. This was part of why we were resaving module snippets in the first place. Ideally if that has been in place for a little while it will have been self-repairing.

On the other hand if there's one module with an empty snippet there could just as easily be more than one.

Maybe it's better to fail out early in this case w/ a report about what is wrong, and defer to data repair?

Member Author
@ipanova ipanova Sep 5, 2023

I was talking nonsense: given our current constraints we cannot have two rows in the DB with the same NSVCA plus an empty snippet. When this migration runs there will always be at most one (unique NSVCA + empty snippet).

I'd probably do nothing at this point, since an empty snippet can be fixed by re-running the sync (non-optimized); the old module will then be removed and a new module with a non-empty snippet will be saved and added to the new repository version.

Contributor

> I'd probably do nothing at this point since empty snippet can be fixed by re-running the sync, so then the old module will be removed and new module with -non-empty-snippet will be saved and added to the new RV

In theory, as long as the module hasn't disappeared, which IIRC Fedora was doing for a while. That is, older versions of streams would be purged. Or maybe I'm thinking of EPEL, but I remember the modularity group wasn't thrilled about it.

I'm not sure if this is worth worrying about yet though. Maybe it'll be fine?

@ipanova ipanova force-pushed the i3241 branch 2 times, most recently from b92efb8 to 20f77e4 on September 4, 2023 15:06
migrations.AlterUniqueTogether(
    name="modulemd",
    unique_together={
        ("_pulp_domain", "name", "stream", "version", "context", "arch", "digest")
    },
)
Contributor
@dralley dralley Sep 4, 2023

Should we set this to just ("_pulp_domain", "digest") like modulemd-obsoletes and defaults?

Seems like the other fields wouldn't help much. However we probably should keep indexes for name and stream.

Member Author

We could, but that sort of breaks my plan for how to mitigate modules with empty snippets.

Member Author
@ipanova ipanova Sep 5, 2023

What is the purpose of keeping those modules in the repo anyway? How would the published modules.yaml look with such a module; it would be skipped, right? So maybe we should:

a) take an action to identify such modules and repair them by syncing and parsing the metadata (costly, is it worth it?)
b) identify such modules and remove them from the repo (might be an unexpected/unwanted action since it will produce a new repository version)
c) do nothing, which will require keeping the unique_together as proposed in this PR (NSVCA + digest)

Contributor

I assume by A you mean telling the user which repos to resync? Or do you mean doing it in the migration?

Member Author
@ipanova ipanova Sep 6, 2023

I meant doing it in the migration, which is not my preference. In order to run this migration successfully with ("_pulp_domain", "digest"), the digest must be calculated. If there are empty snippets, the digest for those modules will be e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 and the migration will fail with:

django.db.utils.IntegrityError: could not create unique index "rpm_modulemd__pulp_domain_id_digest_eb1e206a_uniq"
DETAIL:  Key (_pulp_domain_id, digest)=(018a6a12-6b49-797c-9f7b-32c3da09e5c0, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) is duplicated.
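A pre-flight check along these lines would surface the duplicates before the unique index is created (a standalone sketch, not the actual migration code; the sample rows are hypothetical stand-ins for Modulemd records):

```python
import hashlib
from collections import defaultdict

# Hypothetical stand-ins for Modulemd rows: (domain, snippet) pairs.
modules = [
    ("domain-a", "document: modulemd\nversion: 2\n"),
    ("domain-a", ""),  # empty snippet
    ("domain-a", ""),  # second empty snippet: same digest, same domain
]

seen = defaultdict(list)
for domain, snippet in modules:
    digest = hashlib.sha256(snippet.encode()).hexdigest()
    seen[(domain, digest)].append(snippet)

# Any key with more than one row would violate a
# ("_pulp_domain", "digest") unique index.
duplicates = {key for key, rows in seen.items() if len(rows) > 1}
print(duplicates)
```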

Member Author

And b) will not solve the problem as long as these kinds of modules stay in the DB.

Contributor

I don't think we can do it in the migration. It may not even be reliable - what happens if the repo is gone, or the module isn't in the metadata anymore? It introduces a lot of variables that shouldn't be anywhere near the migration process.

Contributor
@dralley dralley Sep 7, 2023

I think my personal preference is to let the migration fail, but spit out the details of which repositories and modules are impacted and how it can be fixed [0]. That will make it easy for users to work around the issue while not limiting us unnecessarily (since we don't know how big of an issue this will ultimately end up being).

I don't know that it's the perfect solution, but it feels like a decent compromise between overengineering and being inconsiderate to users. If it turns out to be more of an issue as Katello starts migrating then we can re-adjust our approach later.

[0] For instance, if it is present in the upstream repo then a resync could fix it. If it's not, then a mirror-sync combined with orphan cleanup could fix it. And the nuclear option is to just delete the repo, orphan cleanup, and resync, but probably one of the former two would work.

Member Author

> I don't think we can do it in the migration.

I agree with you on that, but listed this as an option anyway.

> For instance, if it is present in the upstream repo then a resync could fix it. If it's not, then a mirror-sync combined with orphan cleanup could fix it. And the nuclear option is to just delete the repo, orphan cleanup, and resync, but probably one of the former two would work.

Only the repo removal, orphan cleanup, and resync would work.
Given the in-place uniqueness constraint on NSVCA, on re-sync it would identify the module as an existing content unit and skip it.

@dralley
Contributor

dralley commented Sep 4, 2023

I think if we don't do normalization we're going to want to make sure that modulemd metadata is stable between republishes for Fedora, CentOS, RHEL etc.

closes pulp#3241

It no longer holds true to rely on NSVCA module's uniqueness, add the
hash of the document snippet to it.
@ipanova ipanova marked this pull request as ready for review September 6, 2023 14:34
Contributor
@dralley dralley left a comment

I believe the current state is the desired end state that we discussed, is it missing anything?

@dralley dralley requested a review from ggainey September 7, 2023 23:32
Contributor
@ggainey ggainey left a comment

Concur - this matches my understanding of the outcome of the discussion we had in the rpm-team-mtg. Great work @ipanova !

@dralley
Contributor

dralley commented Sep 8, 2023

Thanks Ina!

@dralley dralley merged commit 7a66a3e into pulp:main Sep 8, 2023
14 checks passed
Successfully merging this pull request may close these issues.

Module's NSVCA cannot be used as uniqueness constraint
3 participants