Provide ability to configure "max-idle-timeout" for an elastic agent pod #54
Comments
This defaults to the time specified under |
This is causing us issues: we have to over-provision nodes on our Kubernetes cluster so that new jobs can be scheduled while old agents sit idle waiting to time out, holding valuable CPU and memory allocations. We use an autoscaling Kubernetes cluster, and our agent pod definition requests 1 CPU and 4g of memory. Because the existing CPU and memory are allocated to agents waiting out the 10 min timeout, Kubernetes scales up more nodes to provide the resources requested by new agent pods. |
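To make the cost described above concrete, here is a rough back-of-the-envelope sketch. It assumes every job gets its own agent pod and that each finished agent stays idle for the full timeout before being removed; the job rate of 30 jobs per hour is a made-up figure, the function and class names are hypothetical, and reading "4g" as 4 GB is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class AgentRequest:
    cpu: float        # CPU cores requested per agent pod
    memory_gb: float  # memory requested per agent pod (assumes "4g" means 4 GB)


def idle_reservation(request: AgentRequest, jobs_per_hour: float, idle_timeout_min: float):
    """Estimate the average CPU and memory held by agents that are only waiting to time out.

    Uses Little's law: average number of idle agents = arrival rate * time spent idle.
    Assumes one agent per job and that each agent idles for the full timeout.
    """
    idle_agents = jobs_per_hour * idle_timeout_min / 60.0
    return idle_agents * request.cpu, idle_agents * request.memory_gb


# Pod request and timeout from the report above; the job rate is hypothetical.
cpu_held, memory_held = idle_reservation(AgentRequest(cpu=1, memory_gb=4), 30, 10)
print(f"idle agents hold about {cpu_held:.1f} CPU and {memory_held:.1f} GB")
```

Under these assumptions the reserved-but-idle capacity scales linearly with the timeout, so halving the timeout roughly halves the extra capacity the autoscaler has to provide.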
Hi @matthewrj, We are looking into adding a notification at the end of a job run, based on which the plugin can terminate the Kubernetes pod. While we scope that change out, you could adjust the Agent auto-register timeout to a value that works better for your setup. I would look at the average run time of your jobs and reduce this timeout to be close to that value. If your average job run time is around 3-4 minutes, I would set this value to something like 5 minutes. That way the plugin doesn't keep agents around for much longer after jobs finish. |
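The suggestion above can be sketched as a small calculation. This is only an illustration: the helper name `suggest_timeout_minutes` and the 25% headroom factor are assumptions, not part of the plugin, and the job durations are invented sample data.

```python
import math
from statistics import mean


def suggest_timeout_minutes(job_durations_min, headroom=1.25):
    """Suggest a timeout slightly above the average job run time.

    `headroom` is a hypothetical safety factor; the result is rounded up
    to a whole number of minutes.
    """
    if not job_durations_min:
        raise ValueError("need at least one job duration")
    return math.ceil(mean(job_durations_min) * headroom)


# Invented sample: jobs that take 3-4 minutes on average.
print(suggest_timeout_minutes([3.0, 4.0, 3.5, 3.5]))  # prints 5
```

The resulting value would then be entered by hand as the Agent auto-register timeout in the plugin settings.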
Closing this, as it seems the implementation changed subsequently in EAv3 and elastic agent pods are single shot. There is a proposed implementation to re-enable agent re-use with idle semantics at #355, at which point this will be relevant to consider. |
There should be a configuration option in the elastic agent profile to control "max-idle-timeout" for an agent pod. Currently, this defaults to 10 mins. This would help to:
Avoid having multiple agents show up and clutter the UI when running multiple jobs.
Avoid confusing users: idle agents look as if they are waiting around to pick up new jobs, while the current behavior is one agent per job.