Add job watcher #442

Merged
merged 6 commits into from
Dec 5, 2024

DrJosh9000
Contributor

What

Add a component that looks for k8s jobs in two troublesome states:

  • Jobs that finish without ever creating a pod
  • Jobs that stall for too long without creating a pod

When it detects such a job, the watcher gathers the k8s events for that job and puts them in a failure message, which it uses to fail the corresponding BK job.
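
A minimal sketch of the detection logic described above, not the code from this PR. The names `jobSnapshot`, `classify`, `failureMessage`, and the stall limit parameter are all hypothetical, and `jobSnapshot` is a simplified stand-in for the status a real watcher would read from the k8s API. The event text in `main` is made-up example data.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// jobState is a hypothetical classification, not a type from the PR.
type jobState int

const (
	jobOK              jobState = iota // has a pod, or is still within the stall limit
	finishedWithoutPod                 // finished without ever creating a pod
	stalledWithoutPod                  // no pod after waiting longer than the stall limit
)

// jobSnapshot is a simplified, hypothetical view of a k8s job's status.
type jobSnapshot struct {
	Name        string
	Created     time.Time
	Finished    bool
	PodsCreated int
}

// classify reports whether a job is in one of the two troublesome states.
func classify(j jobSnapshot, now time.Time, stallLimit time.Duration) jobState {
	switch {
	case j.PodsCreated > 0:
		return jobOK
	case j.Finished:
		return finishedWithoutPod
	case now.Sub(j.Created) > stallLimit:
		return stalledWithoutPod
	default:
		return jobOK
	}
}

// failureMessage builds the text used to fail the corresponding BK job,
// listing the events that were gathered for the k8s job.
func failureMessage(j jobSnapshot, state jobState, events []string) string {
	reason := "finished without creating a pod"
	if state == stalledWithoutPod {
		reason = "stalled without creating a pod"
	}
	var b strings.Builder
	fmt.Fprintf(&b, "k8s job %q %s\n", j.Name, reason)
	for _, e := range events {
		fmt.Fprintf(&b, "  event: %s\n", e)
	}
	return b.String()
}

func main() {
	now := time.Now()
	j := jobSnapshot{Name: "example-job", Created: now.Add(-10 * time.Minute)}
	if s := classify(j, now, 5*time.Minute); s != jobOK {
		fmt.Print(failureMessage(j, s, []string{"FailedCreate: example event text"}))
	}
}
```

Checking for a pod first means a job that did create a pod is never flagged, even if it finished or ran for a long time. The two problem states are only considered when no pod exists.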

Why

May fix #302, as well as configuration issues that don't block job creation but prevent the pod from running (e.g. missing user, invalid labels, ...)

@DrJosh9000 force-pushed the add-job-watcher branch 6 times, most recently from 1caa1a2 to 5a2e46b (December 3, 2024 22:50)
@DrJosh9000 marked this pull request as draft (December 4, 2024 00:02)
@DrJosh9000 force-pushed the add-job-watcher branch 8 times, most recently from c6211c3 to 04cdd88 (December 5, 2024 04:12)
@DrJosh9000 marked this pull request as ready for review (December 5, 2024 04:21)
@DrJosh9000 force-pushed the add-job-watcher branch 4 times, most recently from 2ba4df1 to 002c360 (December 5, 2024 05:06)
@DrJosh9000 force-pushed the add-job-watcher branch 3 times, most recently from 9062a82 to d3e7cbc (December 5, 2024 05:59)
Contributor

@CerealBoy CerealBoy left a comment

🚀

@DrJosh9000 force-pushed the add-job-watcher branch 5 times, most recently from a303de0 to 944749a (December 5, 2024 23:03)
@DrJosh9000 merged commit c85d911 into main (Dec 5, 2024)
1 check passed
@DrJosh9000 deleted the add-job-watcher branch (December 5, 2024 23:20)
Successfully merging this pull request may close these issues.

Controller stops accepting jobs from the cluster queue
2 participants