Provide ability to configure "max-idle-timeout" for an elastic agent pod #54

adityasood · 2018-05-24T10:04:30Z

An elastic agent profile should offer a configuration option to control "max-idle-timeout" for an agent pod. Currently, this defaults to 10 minutes. Making it configurable would help to:

- Avoid multiple agents showing up and cluttering the UI when running multiple jobs.
- Avoid confusing users: idle agents look as though they are waiting to pick up new jobs, whereas the current behavior is one agent per job.

varshavaradarajan · 2018-06-13T03:47:40Z

This defaults to the time specified under Agent auto-register timeout. We could either provide the max idle timeout property or, since agents aren't reused as of EA v3, always terminate the agent after, say, 1 minute. That is, it need not be a user-configurable property, because the agents are of no use after the job.

matthewrj · 2018-08-06T20:46:04Z

This is causing us issues: we have to over-provision nodes on our Kubernetes cluster so that new jobs can be scheduled while old agents sit around waiting to time out, taking up valuable CPU and memory allocations.

We use an autoscaling Kubernetes cluster, and our agent pod definition requests 1 CPU and 4g of memory. To provide the resources requested for a new agent pod, Kubernetes scales up more nodes, because the existing CPU and memory are allocated to agents waiting out the 10-minute timeout period.
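To put a rough number on this, the sketch below estimates how much CPU and memory idle agent pods hold on average. It assumes one pod per job, that every finished pod idles for the full timeout, and that "4g" means 4 GiB. The job rate of 30 per hour is a made-up figure for illustration, not a measurement from our cluster.

```python
def idle_reservation(jobs_per_hour, idle_timeout_min, cpu_per_pod, mem_gib_per_pod):
    """Estimate the average CPU and memory reserved by pods that have
    finished their job but have not yet timed out.

    Assumption: one pod per job, and each pod idles for the full timeout,
    so the average number of idle pods is arrival rate * idle time.
    """
    avg_idle_pods = jobs_per_hour * idle_timeout_min / 60.0
    return avg_idle_pods * cpu_per_pod, avg_idle_pods * mem_gib_per_pod


# Hypothetical load: 30 jobs/hour with the 10-minute timeout and the
# 1 CPU / 4 GiB pod requests described above.
cpu, mem_gib = idle_reservation(30, 10, 1, 4)
print(f"idle pods reserve about {cpu} CPU and {mem_gib} GiB")
```

Under those assumptions, the reservation scales linearly with the timeout, so halving the timeout would halve the capacity held by idle pods.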

sheroy · 2018-08-07T04:09:48Z

Hi @matthewrj,

We are looking into adding a notification at the end of a job run that will terminate the Kubernetes pod. While we scope that change out, you could look at adjusting the Agent auto-register timeout to a value that works better for your setup. I would look at the average run time of jobs and reduce this timeout to be close to that value.

If your average run time of jobs is around 3-4 minutes, I would set this value to something like 5 minutes. That way the plugin doesn't keep agents around for too much longer after jobs finish.
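As a rough illustration of that rule of thumb, the sketch below averages some job run times and rounds up with a small margin. The function name, the one-minute margin, and the sample durations are assumptions made for this example and are not part of the plugin.

```python
import math
from statistics import mean


def suggest_timeout_minutes(job_minutes, margin_minutes=1.0):
    """Suggest an Agent auto-register timeout, in whole minutes.

    Takes the average job run time, adds a margin (an assumed 1 minute
    by default), and rounds up to the next whole minute.
    """
    return math.ceil(mean(job_minutes) + margin_minutes)


# Hypothetical run times of around 3-4 minutes, as in the example above.
print(suggest_timeout_minutes([3, 4, 3.5, 4]))
```

With run times in the 3-4 minute range, this lands on a value close to the 5 minutes suggested above.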

The fix for this would be to introduce a notification at the end of a job run, based on which the plugin can terminate the Kubernetes pod.
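A minimal sketch of that idea follows, with the caveat that it is not the plugin's actual code. `FakeCluster`, `on_job_completed`, and the agent-to-pod mapping are hypothetical names, and the fake cluster stands in for a real Kubernetes client.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Stand-in for a Kubernetes client (hypothetical); a real plugin
    would ask the cluster API to delete the pod instead."""
    pods: set = field(default_factory=set)

    def delete_pod(self, name):
        self.pods.discard(name)


def on_job_completed(cluster, agent_to_pod, agent_id):
    """Handle an end-of-job notification by terminating the agent's pod.

    Returns the name of the deleted pod, or None if the agent is unknown
    (for example, if the notification is delivered twice).
    """
    pod = agent_to_pod.pop(agent_id, None)
    if pod is not None:
        cluster.delete_pod(pod)
    return pod


cluster = FakeCluster(pods={"pod-a", "pod-b"})
agents = {"agent-1": "pod-a"}
on_job_completed(cluster, agents, "agent-1")  # pod-a is removed right away
```

The point of the design is that the pod goes away as soon as the job ends, so its CPU and memory requests are released without waiting for any idle timeout.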

chadlwilson · 2023-11-30T06:46:55Z

Closing this, as it seems the implementation changed subsequent to this in EAv3, and elastic agent pods are now single-shot. There is a proposed implementation to re-enable agent re-use with idle semantics at #355, at which point this will be relevant to consider.

chadlwilson closed this as not planned on Nov 30, 2023
