Pull consumer with max bytes setting causes high CPU usage [v2.10.18] #1718
Comments
Also reproduced with NATS 2.10.19 and nats.go 1.37.0.
To add: it might be a client issue, but I opened it here because it's unclear which side is at fault.
To clarify: is the high CPU on the app or on the nats-server?
Both. What appears to be happening is that the pull is retried repeatedly, which increases CPU usage on both the client and the server.
With what frequency? Do you have any delay between retries? Maybe a backoff?
This is a single call to `consumer.Consume()`, so any retrying is happening inside the client library, not in our code.
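For reference, a minimal sketch of what such a single call looks like with the nats.go jetstream API (stream/consumer names and the 1 MiB limit are placeholders, not taken from the report):

```go
// Minimal sketch of a single Consume() call with a max-bytes limit.
// Stream/consumer names and the limit are illustrative only.
package main

import (
	"context"
	"log"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := jetstream.New(nc)
	if err != nil {
		log.Fatal(err)
	}

	cons, err := js.Consumer(context.Background(), "EVENTS", "processor")
	if err != nil {
		log.Fatal(err)
	}

	// One Consume() call; any retrying of pull requests happens inside
	// the library, not in application code.
	cc, err := cons.Consume(func(msg jetstream.Msg) {
		msg.Ack()
	}, jetstream.PullMaxBytes(1024*1024))
	if err != nil {
		log.Fatal(err)
	}
	defer cc.Stop()

	select {} // keep consuming
}
```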
Thanks for sharing the repro @atombender, we'll take a look.
Thanks for the repro, I could hear from my laptop's fan speed that it's working 😅 Seems the server is spammed with pull requests that are immediately rejected with the 409 error.
The client should at least log the error being hit, and should probably wait between retries. But possibly this condition will never clear, because the message just stays there, making the consumer stall. What should we do in this case / what is the intended behaviour?
Possibly the server could also protect itself more by not immediately sending the error back, which would also slow down the client.
The documentation is not clear about whether the max bytes limit is a hard or soft limit. From the observed behavior it is apparently a hard limit that prevents the consumer from consuming the next message forever. In other words, it's possible to write a consumer that simply stops being able to consume (until the offending message is deleted or expires). That condition should at least be detectable at the consumer level. It's debatable whether the client should retry, given that in a JetStream context it can't expect the next call to work: it's going to be stuck retrying until the blocking message is gone.
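As an illustration of making the stall detectable at the application level, here is a hedged sketch (not from the thread's repro) that periodically polls the consumer info and flags the case where messages are pending but the delivered stream sequence stops advancing; the interval and heuristic are arbitrary choices:

```go
// Sketch: detect a stalled pull consumer by periodically polling its info.
package monitor

import (
	"context"
	"log"
	"time"

	"github.com/nats-io/nats.go/jetstream"
)

// watchForStall logs a warning when messages remain pending but the
// consumer's delivered stream sequence has not advanced since the last check.
func watchForStall(ctx context.Context, cons jetstream.Consumer) {
	var lastDelivered uint64
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			info, err := cons.Info(ctx)
			if err != nil {
				log.Printf("consumer info: %v", err)
				continue
			}
			if info.NumPending > 0 && info.Delivered.Stream == lastDelivered {
				log.Printf("consumer appears stalled: %d pending, delivered stream seq stuck at %d",
					info.NumPending, lastDelivered)
			}
			lastDelivered = info.Delivered.Stream
		}
	}
}
```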
IMHO this should be moved to a nats.go issue rather than nats-server, the problem being with the client's `Consume()` behaviour. How it should behave instead is open for discussion, but I would say it should not retry and should instead signal the client app (which for `Consume()` would mean surfacing the error through the error handler callback).
Agreed.
We're having some ongoing discussions about how to generally improve the `Consume()` behaviour in cases like this.
Observed behavior
In production, we noticed that NATS would periodically spike in CPU usage despite no sign of increased message volume or any other metric that seemed relevant.
We were able to narrow it down to setting `jetstream.PullMaxBytes()` with the pull consumer. If a message entered the stream that exceeded this size, the client would get a `409 Message Size Exceeds MaxBytes` error from the server and apparently retry. Removing the max bytes limit fixed our issue.

Also note that the error does not bubble up to the consumer's error handler callback. We were using `jetstream.ConsumeErrHandler()` to log errors. However, after a while this callback is called with the error `nats: no heartbeat received`.
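For reference, a hedged sketch of that error-handler wiring (the function name and the limit are illustrative, not from the report); nats.go exposes the missed-heartbeat condition as `jetstream.ErrNoHeartbeat`:

```go
// Sketch of the error-handler wiring described above. In the reported
// behaviour the 409 "Message Size Exceeds MaxBytes" response never reaches
// this callback; after a while it fires with the missed-heartbeat error.
package consumer

import (
	"errors"
	"log"

	"github.com/nats-io/nats.go/jetstream"
)

func consumeWithLogging(cons jetstream.Consumer) (jetstream.ConsumeContext, error) {
	return cons.Consume(
		func(msg jetstream.Msg) { msg.Ack() },
		jetstream.PullMaxBytes(1<<20), // illustrative limit
		jetstream.ConsumeErrHandler(func(_ jetstream.ConsumeContext, err error) {
			if errors.Is(err, jetstream.ErrNoHeartbeat) {
				log.Printf("consume: %v", err) // "nats: no heartbeat received"
				return
			}
			log.Printf("consume error: %v", err)
		}),
	)
}
```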
Expected behavior
NATS should not use this much CPU.
Server and client version
Host environment
Linux, Kubernetes.
Steps to reproduce
Full reproduction here.
`jetstream.PullMaxBytes()` passed to `consumer.Consume()` using a low maximum size.
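Since the linked reproduction isn't included here, a hedged sketch of the general shape of such a repro (stream/subject names, the durable name, and all sizes are made up): start `Consume()` with a deliberately small `PullMaxBytes`, then publish one message larger than that limit.

```go
// Sketch of a reproduction: small PullMaxBytes, then one oversized message.
// Stream/subject names and sizes are illustrative.
package main

import (
	"bytes"
	"context"
	"log"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func main() {
	ctx := context.Background()

	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := jetstream.New(nc)
	if err != nil {
		log.Fatal(err)
	}

	// Stream and durable consumer for the test subject.
	stream, err := js.CreateOrUpdateStream(ctx, jetstream.StreamConfig{
		Name:     "REPRO",
		Subjects: []string{"repro.>"},
	})
	if err != nil {
		log.Fatal(err)
	}
	cons, err := stream.CreateOrUpdateConsumer(ctx, jetstream.ConsumerConfig{
		Durable:   "repro",
		AckPolicy: jetstream.AckExplicitPolicy,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Consume with a deliberately small max-bytes limit (4 KiB here).
	cc, err := cons.Consume(func(msg jetstream.Msg) { msg.Ack() },
		jetstream.PullMaxBytes(4*1024))
	if err != nil {
		log.Fatal(err)
	}
	defer cc.Stop()

	// Publish a single message larger than the limit; the pull consumer
	// can never fit it under PullMaxBytes, and CPU usage climbs.
	if _, err := js.Publish(ctx, "repro.big", bytes.Repeat([]byte("x"), 64*1024)); err != nil {
		log.Fatal(err)
	}

	select {}
}
```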