We noticed that completed jobs are not being cleaned up. We currently have the `job-ttl` argument set to `5m` in our configuration. I believe this setting translates into the `.spec.ttlSecondsAfterFinished` value on the generated Job, and that field only became generally available in Kubernetes 1.23. Unfortunately, we are not able to update K8s that quickly. As a result, pods continue to pile up in our cluster, requiring us to either create a cron job to clean them up or delete them manually.
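For reference, my understanding is that `job-ttl: 5m` is meant to end up as something like the manifest below on each generated Job (the 300-second value is just 5m converted to seconds; the Job name and image are placeholders for illustration, not what the controller actually generates):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: agent-job-example        # placeholder name, for illustration only
spec:
  ttlSecondsAfterFinished: 300   # what job-ttl: 5m presumably maps to
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: agent
          image: example/agent:latest   # placeholder image
```

On clusters older than 1.23 the `ttlSecondsAfterFinished` field is ignored (or rejected), so finished Jobs and their pods just stay around.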
An approach I've seen in the GitHub Actions Kubernetes runners is to have the controller watch for completed Jobs and clean them up itself.
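Something along these lines is what I have in mind. It's a minimal sketch using client-go, assuming the controller already has a Kubernetes client and that its Jobs can be found via a label selector (the `app=agent` label here is hypothetical): list Jobs that have reached a `Complete` or `Failed` condition and delete any that finished more than `ttl` ago.

```go
package cleanup

import (
	"context"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// CleanupFinishedJobs deletes Jobs that reached a Complete or Failed
// condition more than ttl ago. Background propagation also removes the
// pods the Job created.
func CleanupFinishedJobs(ctx context.Context, client kubernetes.Interface, namespace string, ttl time.Duration) error {
	jobs, err := client.BatchV1().Jobs(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "app=agent", // hypothetical selector for the controller's Jobs
	})
	if err != nil {
		return err
	}
	propagation := metav1.DeletePropagationBackground
	for _, job := range jobs.Items {
		finished, at := jobFinishedAt(&job)
		if !finished || time.Since(at) < ttl {
			continue
		}
		err := client.BatchV1().Jobs(namespace).Delete(ctx, job.Name, metav1.DeleteOptions{
			PropagationPolicy: &propagation,
		})
		if err != nil {
			return err
		}
	}
	return nil
}

// jobFinishedAt reports whether the Job has a Complete or Failed condition
// set to True and, if so, when that condition was recorded.
func jobFinishedAt(job *batchv1.Job) (bool, time.Time) {
	for _, c := range job.Status.Conditions {
		if (c.Type == batchv1.JobComplete || c.Type == batchv1.JobFailed) && c.Status == corev1.ConditionTrue {
			return true, c.LastTransitionTime.Time
		}
	}
	return false, time.Time{}
}
```

Running this on a timer (or driving it from an informer on Job events) would give the same effect as `ttlSecondsAfterFinished` on clusters that don't support the field.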
I've attached an image from one of the namespaces running the agents, showing the pods continuing to exist past 5 minutes.
I think we will need to build a job cleanup function. Aside from older k8s versions, there are other ways Jobs can accumulate (e.g. a Job is created successfully but fails to start a pod for some reason and sits around retrying forever).
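For that second case, one possible heuristic (a sketch, not a committed design): treat a Job as stuck if it has existed longer than some deadline but has never produced an active, succeeded, or failed pod. The `stuckAfter` deadline and the `app=agent` selector below are assumptions for illustration.

```go
package cleanup

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// CleanupStuckJobs deletes Jobs that are older than stuckAfter but have
// never produced an active, succeeded, or failed pod, i.e. Jobs that were
// created but never managed to start anything.
func CleanupStuckJobs(ctx context.Context, client kubernetes.Interface, namespace string, stuckAfter time.Duration) error {
	jobs, err := client.BatchV1().Jobs(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "app=agent", // hypothetical selector for the controller's Jobs
	})
	if err != nil {
		return err
	}
	propagation := metav1.DeletePropagationBackground // also clean up any pods owned by the Job
	for _, job := range jobs.Items {
		neverStarted := job.Status.Active == 0 && job.Status.Succeeded == 0 && job.Status.Failed == 0
		if !neverStarted || time.Since(job.CreationTimestamp.Time) < stuckAfter {
			continue
		}
		err := client.BatchV1().Jobs(namespace).Delete(ctx, job.Name, metav1.DeleteOptions{
			PropagationPolicy: &propagation,
		})
		if err != nil {
			return err
		}
	}
	return nil
}
```

Whatever we end up with, it should probably cover both finished Jobs (for pre-1.23 clusters) and these never-started Jobs in one pass.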