Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(grouping): Fix unmerged seer grouphash integrity error #83081

Merged
merged 2 commits into from
Jan 9, 2025

Conversation

lobsterkatie
Copy link
Member

@lobsterkatie lobsterkatie commented Jan 8, 2025

We've been getting occasional integrity errors when trying to delete grouphash records, because of the seer_matched_grouphash foreign key in the GroupHashMetadata model. Normally, the process works like this:

  1. An event creates grouphash_A and group_1.
  2. A new event creates grouphash_B, which Seer then matches to grouphash_A. The new event goes in group_1, and grouphash_B.metadata.seer_matched_grouphash is set to `grouphash_A.
  3. group_1 ages out, and we delete it. In this process deletions happen from the bottom up, so first the GroupHashMetadata records for both grouphash_A and grouphash_B are deleted, then ``grouphash_Aandgrouphash_B` are themselves deleted, and finally `group_1` is deleted. Because of that ordering, the `grouphash_B.metadata.seer_matched_grouphash` link to `grouphash_A` disappears before we try to delete `grouphash_A`, and no integrity errors are thrown.

However, occasionally it goes like this:

  1. Same as above
  2. Same as above
  3. The user decides they disagree with Seer, and unmerges grouphash_B from group_1, such that it now points to group_2, which is newer than group_1 and therefore doesn't expire when group_1 does.
  4. group_1 ages out, and deletion happens as before, except this time it's only the GroupHashMetadata record for grouphash_A which gets deleted ahead of grouphash_A's deletion. So even after grouphash_A is deleted, grouphash_B.metadata.seer_matched_grouphash still points to it, and boom: you've got yourself an integrity error.

This fixes the problem by updating the on_delete setting for the seer_matched_grouphash field to be SET_NULL, so that Django will automatically break the foreign key link whenever a seer_matched_grouphash is deleted. I chose to have that happen during deletion rather than unmerging because when unmerges happen it's nice for debugging to be able to compare Seer's result to the human-mediated one.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 8, 2025
@lobsterkatie lobsterkatie force-pushed the kmclb-fix-seer-matched-grouphash-deletion-bug branch from f9c3bb1 to 4643f58 Compare January 8, 2025 02:44
@lobsterkatie lobsterkatie marked this pull request as ready for review January 8, 2025 03:32
@JoshFerge
Copy link
Member

would it be better to set on_delete=models.SET_NULL here instead?

"sentry.GroupHash", related_name="seer_matchees", on_delete=models.DO_NOTHING, null=True

@lobsterkatie
Copy link
Member Author

lobsterkatie commented Jan 8, 2025

would it be better to set on_delete=models.SET_NULL here instead?

Good question. There's some disconnect between the on_delete value and what actually happens during deletion, though - setting on_delete=models.CASCADE for example doesn't actually work (which is why fixes like this one are necessary). But lemme try that and see if it works.

UPDATE: That worked! Have updated the PR.

@lobsterkatie lobsterkatie force-pushed the kmclb-fix-seer-matched-grouphash-deletion-bug branch from 4643f58 to a9a9823 Compare January 8, 2025 23:03
@lobsterkatie lobsterkatie requested a review from a team as a code owner January 8, 2025 23:03
Copy link
Contributor

github-actions bot commented Jan 8, 2025

This PR has a migration; here is the generated SQL for src/sentry/migrations/0808_change_grouphash_metadata_seer_matched_grouphash_deletion_config.py ()

--
-- Alter field seer_matched_grouphash on grouphashmetadata
--
-- (no-op)

@@ -78,7 +78,7 @@ class GroupHashMetadata(Model):
seer_model = models.CharField(null=True)
# The `GroupHash` record representing the match Seer sent back as a match (if any)
seer_matched_grouphash = FlexibleForeignKey(
"sentry.GroupHash", related_name="seer_matchees", on_delete=models.DO_NOTHING, null=True
"sentry.GroupHash", related_name="seer_matchees", on_delete=models.SET_NULL, null=True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, there shouldn't be too many GroupHashMetadata per GroupHash, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically possible for it to be a large number, but most often it's not.

Also, I assume (?) that the null setting happens only when delete is called on the linked object, yes? If so, then in cases where there's been no manual unmerge (the 99.9% case), all of the GroupHashMetadata records pointing at the GroupHash instance will already have been deleted by the time we get to the null setting, as they'll belong to sibling GroupHash records of the GroupHash record in question. (See point 3 in the first list in the PR description.) So only in the 0.1% case will there even be any remaining GroupHashMetadata records on which to update the seer_matched_grouphash value.

Copy link
Member

@JoshFerge JoshFerge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

@lobsterkatie lobsterkatie merged commit 6242314 into master Jan 9, 2025
50 checks passed
@lobsterkatie lobsterkatie deleted the kmclb-fix-seer-matched-grouphash-deletion-bug branch January 9, 2025 18:24
andrewshie-sentry pushed a commit that referenced this pull request Jan 22, 2025
We've been getting occasional integrity errors when trying to delete `grouphash` records, because of the `seer_matched_grouphash` foreign key in the `GroupHashMetadata` model. Normally, the process works like this:

1. An event creates `grouphash_A` and `group_1`.
2. A new event creates `grouphash_B`, which Seer then matches to `grouphash_A`. The new event goes in `group_1`, and `grouphash_B.metadata.seer_matched_grouphash` is set to `grouphash_A.
3. `group_1` ages out, and we delete it. In this process deletions happen from the bottom up, so first the `GroupHashMetadata` records for both `grouphash_A` and `grouphash_B` are deleted, then ``grouphash_A` and `grouphash_B` are themselves deleted, and finally `group_1` is deleted. Because of that ordering, the `grouphash_B.metadata.seer_matched_grouphash` link to `grouphash_A` disappears before we try to delete `grouphash_A`, and no integrity errors are thrown.

However, occasionally it goes like this:

1. Same as above
2. Same as above
3. The user decides they disagree with Seer, and unmerges `grouphash_B` from `group_1`, such that it now points to `group_2`, which is newer than `group_1` and therefore doesn't expire when `group_1` does.
4. `group_1` ages out, and deletion happens as before, except this time it's only the `GroupHashMetadata` record for `grouphash_A` which gets deleted ahead of `grouphash_A`'s deletion. So even after `grouphash_A` is deleted, `grouphash_B.metadata.seer_matched_grouphash` still points to it, and boom: you've got yourself an integrity error.

This fixes the problem by updating the `on_delete` setting for the `seer_matched_grouphash` field to be `SET_NULL`, so that Django will automatically break the foreign key link whenever a `seer_matched_grouphash` is deleted.  I chose to have that happen during deletion rather than unmerging because when unmerges happen it's nice for debugging to be able to compare Seer's result to the human-mediated one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants