Terminated container(s) doesn't kill the pod #505

Open
Otterian opened this issue Nov 4, 2022 · 1 comment

Comments


Otterian commented Nov 4, 2022

In certain circumstances the connection to GitHub might fail. This can be due to SSL/TLS issues, GitHub being down, etc.

Example from the runner container log:

√ Runner successfully added
The SSL connection could not be established, see inner exception.
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.

The runner container is then terminated, but the pod itself keeps running (albeit in an ERROR state), which blocks new pods from spawning for the affected pool. After deleting the pod, the pool scales normally again.

This could be solved by using a livenessProbe on the runner container to check whether it is still running. However, if any of the containers in the pod terminates, the pod should also be terminated, and that should be handled in the Operator.
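To make the second suggestion concrete, here is a minimal sketch in Go of the check the Operator could run against each pod. It is not the Operator's actual code: `ContainerState` is a hypothetical, simplified stand-in for the container status reported by the Kubernetes API, and `shouldDeletePod` and the container names are made up for illustration.

```go
package main

import "fmt"

// ContainerState is a hypothetical, simplified stand-in for the per-container
// status an operator would read from the Kubernetes API.
type ContainerState struct {
	Name       string
	Terminated bool
}

// shouldDeletePod reports whether any container in the pod has terminated.
// It also returns the name of the first terminated container it finds.
func shouldDeletePod(containers []ContainerState) (bool, string) {
	for _, c := range containers {
		if c.Terminated {
			return true, c.Name
		}
	}
	return false, ""
}

func main() {
	// Assumed scenario: the runner container exited after the failed
	// connection, while a second container in the pod is still running.
	pod := []ContainerState{
		{Name: "runner", Terminated: true},
		{Name: "sidecar", Terminated: false},
	}
	if del, name := shouldDeletePod(pod); del {
		fmt.Printf("delete pod: container %q has terminated\n", name)
	}
}
```

In a real reconcile loop the same decision would presumably be followed by deleting the pod, so that the pool can spawn a replacement.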

@kasey-weirich

The only way I've been able to get around this when GH has hiccups and our runner pool crashes is to either scale the operator to 0 then back to 1, or force delete the pool then let the operator scale them back up.

Would love to have a better solution to this, as GH outages seem to be 3-4 times a year if not more.

livenessProbe sounds interesting.
