Memory leak on Interceptor #1078
Comments
@JorTurFer / @wozniakjan do you have bandwidth to check this one? |
thank you for reporting this, from the graphs this indeed looks like a memory leak, I can investigate later today. |
after a quick investigation, this might actually be related to HTTP connections overhead, GC and request caching rather than an actual memory leak. I am able to reproduce quick memory growth with concurrent requests. When idle on a tiny cluster with a single
with requests sent sequentially, the memory usage barely increases and then goes down to idle levels. With a simple benchmark using
the memory usage does increase, peaking at
but after Attached is also trace output from |
I may have noticed something: with a higher error rate in the benchmark, the memory consumption doesn't go down. I just bumped the memory limit for the interceptor to 1Gi and executed a benchmark that may have been too much, 5000 connections in 200 threads for 10 minutes
out of ~350000 requests, 91% timeouts and 4.7% errors. The memory usage is still at |
@wozniakjan my current number is around 15k requests every minute. I currently have 3 interceptor pods, at 640 MB each... in about an hour and a half that number is expected to peak at 25k... during the late night, between 2am-5am, the number drops to 5k... |
@Leonardo-Ferreira I played around with benchmarks a bit more and afaict, with a low error rate the memory usage appears to go down to idle levels eventually, but when the error rate is high, even when later the Go's pprof heap analysis doesn't point to any code that would accumulate memory and not release it. Also, the heap as counted by pprof appears to be just a fraction of the memory used according to cgroup accounting: pprof recorded 44MB while the cgroup shows 318MB used. |
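For anyone who wants to check the same gap between the Go heap and cgroup accounting, here is a minimal sketch in Go. It is not code from the interceptor. It assumes the process runs in a Linux container where the cgroup usage file is at one of the two usual paths (cgroup v2 first, then v1), and the helper names `parseCgroupBytes`, `readCgroupUsage` and `toMiB` are made up for illustration. `runtime.MemStats` is not identical to what a pprof heap profile reports, but it gives a comparable view of the heap from inside the process.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// Assumed locations of the cgroup memory usage file inside the container:
// cgroup v2 first, then cgroup v1. Adjust if your node mounts them elsewhere.
var cgroupUsageFiles = []string{
	"/sys/fs/cgroup/memory.current",
	"/sys/fs/cgroup/memory/memory.usage_in_bytes",
}

// parseCgroupBytes parses the single integer byte count found in the usage file.
func parseCgroupBytes(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// readCgroupUsage returns the first usage value it can read and parse.
// The boolean is false when no candidate file is available (e.g. not on Linux).
func readCgroupUsage(paths []string) (uint64, bool) {
	for _, p := range paths {
		b, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		if v, err := parseCgroupBytes(string(b)); err == nil {
			return v, true
		}
	}
	return 0, false
}

// toMiB converts bytes to mebibytes for printing.
func toMiB(b uint64) float64 { return float64(b) / (1 << 20) }

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// What the Go runtime itself knows about.
	fmt.Printf("HeapAlloc:    %8.1f MiB (allocated heap objects)\n", toMiB(m.HeapAlloc))
	fmt.Printf("HeapIdle:     %8.1f MiB (heap spans with no objects)\n", toMiB(m.HeapIdle))
	fmt.Printf("HeapReleased: %8.1f MiB (returned to the OS)\n", toMiB(m.HeapReleased))
	fmt.Printf("Sys:          %8.1f MiB (total obtained from the OS)\n", toMiB(m.Sys))

	// What the kernel charges to the container. This can also include
	// memory the Go runtime does not track, such as page cache.
	if usage, ok := readCgroupUsage(cgroupUsageFiles); ok {
		fmt.Printf("cgroup usage: %8.1f MiB\n", toMiB(usage))
	} else {
		fmt.Println("cgroup usage: not available on this system")
	}
}
```

If `Sys` is close to the cgroup number while `HeapAlloc` is small, the runtime is probably holding memory it has not returned yet. If the cgroup number is far above `Sys`, the difference is likely memory charged outside the Go runtime.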
@wozniakjan after looking deeply at my data, I could not correlate the memory usage/increase-rate to the number of errors. Luckily (or not) we had an issue this morning where a dependency was flaky so there was a significant spike in errors related to timeouts. on the memory consumption graph there was no significant change during the 1 hour long event... |
good to hear that, perhaps my artificial benchmark setup just ends up being CPU-intensive. my current hypothesis I am trying to either prove or disprove is that maybe when the |
would that output a specific log message that I could query for? |
I can see plenty of these access logs in the
despite the fact the scaled app returns always Also, I can observe these errors in the
|
should we log some extra info and release a 0.8.1 version? unfortunately my security department does not allow me to use "unofficial" images in production |
I'm still in the process of trying to figure out the root cause but as soon as I have some valid hypothesis to test, I can distribute a build for your testing |
any news @wozniakjan, can I help somehow? |
I didn't get very far and now I had to put it on hold temporarily, but I will get back to this soon |
hey @wozniakjan, I'd like to contribute here. would you be willing to connect for like 30-45min so you can "boost" me? this way I could make significant contributions faster |
sure, you can ping me on KEDA slack https://keda.sh/community/ |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed due to inactivity. |
@zroubalik reopen? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
Report
I think we have a memory leak.
Here is the memory usage over 2 days:
The drops in memory are the pod getting OOM killed. At first we thought 64MB was too little, so we experimented with higher values over the last 24 hours, but the symptom persisted:
Expected Behavior
The pods should release memory as traffic fluctuates down and not get OOM killed
Actual Behavior
Memory usage never decreases
Steps to Reproduce the Problem
Logs from KEDA HTTP operator
HTTP Add-on Version
0.8.0
Kubernetes Version
< 1.28
Platform
Microsoft Azure
Anything else?
AKS v 1.27.9