Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(seer grouping): Add debug logging for old excess frames check #83082

Merged

Conversation

lobsterkatie
Copy link
Member

As of #82414, we should no longer be using TooManyOnlySystemFramesException, but the fact that we're still getting did_call_seer metrics tagged with "over-threshold-frames" says that we still are, somehow. This is currently a blocker on merging #82434, which removes TooManyOnlySystemFramesException and the code associated with it, so it'd be nice to figure out why it's happening so we can fix it.

Try as I might, I can't replicate it, nor can I find the flaw in the logic, so this adds a temporary log to try to help figure out how in the heck it's happening. It also adds a record_metrics parameter to the has_too_many_contributing_frames helper, so that I can call it for purposes of the logging without messing up the metrics it collects. Once I figure it out what's happening, I'll back out all of these changes.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 8, 2025
@lobsterkatie lobsterkatie marked this pull request as ready for review January 8, 2025 11:00
@lobsterkatie lobsterkatie requested a review from a team as a code owner January 8, 2025 11:00
"event_id": event.event_id,
"project_id": event.project_id,
"platform": event.platform,
"has_too_many_frames_result": has_too_many_contributing_frames(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking that this isn't expensive to run here? I assume this log happens only on group creation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It runs on every event, but no, it's not expensive - all it does is check a few different pre-computed values (and collect metrics, if record_metrics is true, but in this case it's false, so that's a moot point).

def has_too_many_contributing_frames(
event: Event | GroupEvent,
variants: dict[str, BaseVariant],
referrer: ReferrerOptions,
record_metrics: bool = True,
) -> bool:
platform = event.platform
shared_tags = {"referrer": referrer.value, "platform": platform}
contributing_variant, contributing_component = get_contributing_variant_and_component(variants)
# Ideally we're calling this function after we already know the event both has a stacktrace and
# is using it for grouping (in which case none of the below conditions should apply), but still
# worth checking that we have enough information to answer the question just in case
if (
# Fingerprint, checksum, fallback variants
not isinstance(contributing_variant, ComponentVariant)
# Security violations, log-message-based grouping
or contributing_variant.variant_name == "default"
# Any ComponentVariant will have this, but this reassures mypy
or not contributing_component
# Exception-message-based grouping
or not hasattr(contributing_component, "frame_counts")
):
# We don't bother to collect a metric on this outcome, because we shouldn't have called the
# function in the first place
return False
# Certain platforms were backfilled before we added this filter, so to keep new events matching
# with the existing data, we turn off the filter for them (instead their stacktraces will be
# truncated)
if platform in EVENT_PLATFORMS_BYPASSING_FRAME_COUNT_CHECK:
if record_metrics:
metrics.incr(
"grouping.similarity.frame_count_filter",
sample_rate=options.get("seer.similarity.metrics_sample_rate"),
tags={**shared_tags, "outcome": "bypass"},
)
return False
stacktrace_type = "in_app" if contributing_variant.variant_name == "app" else "system"
key = f"{stacktrace_type}_contributing_frames"
shared_tags["stacktrace_type"] = stacktrace_type
if contributing_component.frame_counts[key] > MAX_FRAME_COUNT:
if record_metrics:
metrics.incr(
"grouping.similarity.frame_count_filter",
sample_rate=options.get("seer.similarity.metrics_sample_rate"),
tags={**shared_tags, "outcome": "block"},
)
return True
if record_metrics:
metrics.incr(
"grouping.similarity.frame_count_filter",
sample_rate=options.get("seer.similarity.metrics_sample_rate"),
tags={**shared_tags, "outcome": "pass"},
)
return False

If you still think it's an issue, I can move the call to where the log is, which would drastically cut down the number of times that it runs. I originally had it that way, but in the end, especially since it's hopefully only going to be in place for a day or two while I debug this, it just seemed easier to pass one metadata blob rather than all the stuff needed to compute said blob's values. LMK if you want me to change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that as long as it won't cause a performance issue in prod I'm fine with it as is

@lobsterkatie lobsterkatie merged commit 080c483 into master Jan 8, 2025
50 checks passed
@lobsterkatie lobsterkatie deleted the kmclb-add-debug-log-for-old-excess-frames-behavior branch January 8, 2025 19:04
lobsterkatie added a commit that referenced this pull request Jan 9, 2025
…ehavior log (#83150)

This is a follow-up to #83082, which added a temporary log to help debug how we're still landing in the old excess-frame-check logic. Some logs have come through and a) the stacktrace is what you'd expect, so we can stop sending it, and b) the new check is giving a false negative, so now we need to see the data on which it's basing its decision (the frame counts). 

Interestingly, so far all of the events in question come from the same project. I spot-checked one event and it looked normal... We'll see what this new log data turns up.
andrewshie-sentry pushed a commit that referenced this pull request Jan 22, 2025
…83082)

As of #82414, we should no longer be using `TooManyOnlySystemFramesException`, but the fact that we're still getting `did_call_seer` metrics tagged with `"over-threshold-frames"` says that we still are, somehow. This is currently a blocker on merging #82434, which removes `TooManyOnlySystemFramesException` and the code associated with it, so it'd be nice to figure out why it's happening so we can fix it.

Try as I might, I can't replicate it, nor can I find the flaw in the logic, so this adds a temporary log to try to help figure out how in the heck it's happening. It also adds a `record_metrics` parameter to the `has_too_many_contributing_frames` helper, so that I can call it for purposes of the logging without messing up the metrics it collects. Once I figure it out what's happening, I'll back out all of these changes.
andrewshie-sentry pushed a commit that referenced this pull request Jan 22, 2025
…ehavior log (#83150)

This is a follow-up to #83082, which added a temporary log to help debug how we're still landing in the old excess-frame-check logic. Some logs have come through and a) the stacktrace is what you'd expect, so we can stop sending it, and b) the new check is giving a false negative, so now we need to see the data on which it's basing its decision (the frame counts). 

Interestingly, so far all of the events in question come from the same project. I spot-checked one event and it looked normal... We'll see what this new log data turns up.
@github-actions github-actions bot locked and limited conversation to collaborators Jan 24, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants