Ratelimit metrics are not working as expected in v1.3.0 #3444
Comments
Hey, so this is the commit that did the rename: 1848253 ("exporter: Rename ratelimit drops counter"), and here's the full PR with some other changes around that rename: #2890. This commit also moves the place of incrementation, but only very slightly: 5231093 ("Move ratelimitmetrics inside pkg/exporter"). Here's an extract from the diff: it doesn't fundamentally change the code, the counter is just incremented from the exporter's `Send()` instead of from the rate limiter's `Drop()`:

```diff
diff --git a/pkg/exporter/exporter.go b/pkg/exporter/exporter.go
index bc8bf0729..2d21a0e70 100644
--- a/pkg/exporter/exporter.go
+++ b/pkg/exporter/exporter.go
@@ -56,6 +56,7 @@ func (e *Exporter) Start() error {
 func (e *Exporter) Send(event *tetragon.GetEventsResponse) error {
 	if e.rateLimiter != nil && !e.rateLimiter.Allow() {
 		e.rateLimiter.Drop()
+		rateLimitDropped.Inc()
 		return nil
 	}
diff --git a/pkg/ratelimit/ratelimit.go b/pkg/ratelimit/ratelimit.go
index 404395225..52c5512a4 100644
--- a/pkg/ratelimit/ratelimit.go
+++ b/pkg/ratelimit/ratelimit.go
@@ -78,5 +77,4 @@ func (r *RateLimiter) reportRateLimitInfo(encoder encoder.EventEncoder) {
 func (r *RateLimiter) Drop() {
 	atomic.AddUint64(&r.dropped, 1)
-	ratelimitmetrics.RateLimitDropped.Inc()
 }
```

My questions would be: did this ever work for you (seems like yes, since you mention an upgrade)? If yes, for which version/commit? And how do you test this, so we can try to reproduce on our side? Thanks!
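For completeness, the counter kept working the same way after the rename, only under its new name. Below is a minimal sketch of how such a counter is typically defined with the plain Prometheus Go client; the package layout, `CounterOpts`, and registration shown here are assumptions for illustration, not the actual Tetragon source.

```go
// Minimal sketch, assuming the plain Prometheus Go client; Tetragon's own
// metrics helpers and exact CounterOpts may differ.
package exporter

import "github.com/prometheus/client_golang/prometheus"

// rateLimitDropped backs tetragon_export_ratelimit_events_dropped_total
// (formerly tetragon_ratelimit_dropped_total). The Namespace/Name split
// below is an assumption.
var rateLimitDropped = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: "tetragon",
	Name:      "export_ratelimit_events_dropped_total",
	Help:      "Number of events dropped by the export rate limiter.",
})

func init() {
	// Register with the default registry for this sketch; Tetragon wires its
	// collectors into its own registry in practice.
	prometheus.MustRegister(rateLimitDropped)
}
```

As the diff above shows, `Send()` increments this counter whenever the exporter's rate limiter drops an event.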
We were on 1.1.2 earlier this week, and it turns out this metric was also 0 at that point. Seems as if the metric wasn't working then either.
IIRC, I don't think we were using ratelimit filters while on 1.1.2, so it's hard to say whether the metric worked or not.
Hey @kevsecurity, did you ever observe this metric working?
Not looked at it! Maybe @lambdanis knows about it?
@pratiklotia @sp3nx0r Did you observe rate limiting itself working as expected? I'm not sure if this metric worked correctly before; I'll check.
@lambdanis When we add ratelimit actions to our TracingPolicies, we see the number of exported events go down. It goes further down when the ratelimit window is wider (say
I see, the gotcha here is that Tetragon provides two different rate limiting mechanisms. You're using rate limits embedded in a TracingPolicy, but that is not what `tetragon_export_ratelimit_events_dropped_total` counts: that counter only covers events dropped by the exporter's own rate limiter. There is no metric reporting events rate limited by a policy at the moment, I'm afraid. It definitely would be useful, so feel free to open an issue to add it, or a PR.
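If someone wants to pick this up, here is a rough sketch of what a policy-level counter could look like. Everything in it (the package name, `PolicyRateLimitDropped`, the `policy` label, `ReportPolicyDrop`) is hypothetical and does not exist in Tetragon today, and a real implementation would need to hook into the TracingPolicy rate limiting path rather than the exporter.

```go
// Hypothetical sketch: a counter for events dropped by rate limits embedded
// in TracingPolicies, labelled by policy name. None of these identifiers
// exist in Tetragon today.
package policymetrics

import "github.com/prometheus/client_golang/prometheus"

var PolicyRateLimitDropped = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "tetragon",
		Name:      "policy_ratelimit_events_dropped_total",
		Help:      "Number of events dropped by rate limits defined in TracingPolicies.",
	},
	[]string{"policy"},
)

func init() {
	prometheus.MustRegister(PolicyRateLimitDropped)
}

// ReportPolicyDrop would be called by the in-policy rate limiter each time
// it suppresses an event for the given policy.
func ReportPolicyDrop(policyName string) {
	PolicyRateLimitDropped.WithLabelValues(policyName).Inc()
}
```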
What happened?
We upgraded tetragon to v1.3.0.
The `tetragon_export_ratelimit_events_dropped_total` metric (which got renamed from `tetragon_ratelimit_dropped_total` in this release) for tracking the number of rate-limited events does not seem to work as expected. The metric exists but always reports a value of 0. Log output is included below.
Fix: I haven't looked into the codebase yet, but I suspect some part of the code was missed while renaming the metric.
Impact: We don't have visibility into how well our rate-limit filters are working, which makes it hard to tune the policy correctly.
Tetragon Version
v1.3.0
Kernel Version
6.8.0-1016-aws
Kubernetes Version
Server version 1.30
Bugtool
No response
Relevant log output
Anything else?
No response