Nomad does not report properly about not started system jobs #25058
Comments
@EugenKon if certain clients don't match the constraints of a system job, then there's no way for Nomad to know you meant to have a given system job on that client (otherwise it would have placed it!). If the nodes match the constraints but there's not enough room to make a placement, then the evaluations list for that job will show a blocked eval.
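As a rough illustration of checking the evaluations list for unplaced allocations, here is a minimal Python sketch. It assumes the Nomad HTTP API endpoint `/v1/job/<job_id>/evaluations` and the `FailedTGAllocs`, `NodesExhausted`, `DimensionExhausted`, `ConstraintFiltered`, and `BlockedEval` fields as I understand them; verify the field names against your Nomad version. The helper names and the sample data are hypothetical and not taken from this issue.

```python
import json
import os
import urllib.request
from typing import Any


def fetch_json(path: str) -> Any:
    """GET a Nomad API path. Assumes NOMAD_ADDR (and optionally NOMAD_TOKEN) are set."""
    addr = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")
    req = urllib.request.Request(addr + path)
    token = os.environ.get("NOMAD_TOKEN")
    if token:
        req.add_header("X-Nomad-Token", token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_unplaced(evals: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one summary per (evaluation, task group) that had failed placements."""
    summaries = []
    for ev in evals:
        # FailedTGAllocs is absent or null when every placement succeeded.
        failed = ev.get("FailedTGAllocs") or {}
        for group, metric in failed.items():
            summaries.append({
                "eval_id": ev.get("ID"),
                "status": ev.get("Status"),
                "blocked_eval": ev.get("BlockedEval") or None,
                "task_group": group,
                "nodes_exhausted": metric.get("NodesExhausted", 0),
                "dimension_exhausted": metric.get("DimensionExhausted") or {},
                "constraint_filtered": metric.get("ConstraintFiltered") or {},
            })
    return summaries


# Invented sample shaped like an evaluations list response (not real output).
SAMPLE_EVALS = [
    {"ID": "e1", "Status": "complete", "BlockedEval": "e2",
     "FailedTGAllocs": {"autoscaler": {"NodesEvaluated": 1, "NodesExhausted": 1,
                                       "DimensionExhausted": {"memory": 1},
                                       "ConstraintFiltered": None}}},
    {"ID": "e0", "Status": "complete", "FailedTGAllocs": None},
]
```

Against a live cluster this would be called as `summarize_unplaced(fetch_json("/v1/job/autoscaler/evaluations"))`, where `autoscaler` stands in for the job ID. A non-empty `dimension_exhausted` points at a full node, while a non-empty `constraint_filtered` points at an explicit or implicit constraint.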
@tgross Hi. But this job has no constraints. And if a job is not placed because of some constraint, Nomad usually reports that some clients were filtered.
@EugenKon the very next half of that sentence says "that meets the constraints". I'm not sure why you're posting giant screenshots of text when it just says what I just told you.
Jobs almost always have implicit constraints. For example, maybe the 12th node doesn't have a healthy docker driver. Maybe it's in the wrong node pool. Maybe it's missing required CNI plugins. Or maybe it's just full. I don't know because you haven't provided any information about the node or the evals. Post the text (please not a screenshot, they're extremely hard for me to read) of
But also, the last screenshot you posted shows 12 allocations, not 11. Isn't 12 what you want?
@tgross
In this particular case a service job took all the memory, so the system job had no room to run. This job does not have any explicit constraints, so I assume it should run on all available clients. If it does not, I expect to see an error message. Imagine this situation: we have 10 Nomad clients and 5 jobs. We run them and all of them fail because of an explicit or implicit constraint (like the 'wrong pool' or CNI plugin cases you mention). And then the Nomad UI shows nothing, as in my example. This feels weird: we have a lot of clients, we have jobs, but we see no error message. Here I expect to see an error saying that 1 system job failed and one node exhausted its memory.
So in other words, there was nothing wrong with the decisions the scheduler made, but it was hitting the behavior I've described to you already in #25038 and #25061. You've already been told how to solve this problem, and you're not responding to the specific pieces of information I asked for even if you hadn't been, so there's nothing left to do here.
Thanks a lot for the new commands. I'll try to use them instead of screenshots where possible. Unfortunately I cannot provide the output. We are in active development and have already moved to a different configuration, so their output would be useless here =(.
Nomad version
1.8.2
Operating system and Environment details
Ubuntu 24.04
Issue
But we have 12 clients:
If I ssh to that EC2 instance and run docker ps, I can see that autoscaler is not among the jobs.
Reproduction steps
Deploy a cluster.
Create a new EC2 instance in the cluster.
Expected Result
Nomad should report that some EC2 instances are without system jobs.
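Until Nomad reports this itself, the check can be approximated from outside. The sketch below assumes the node list (`/v1/nodes`) and the job's allocation list (`/v1/job/<job_id>/allocations`) have already been fetched and parsed as JSON, and that they carry the `ID`, `Name`, `Status`, `SchedulingEligibility`, `NodeID`, and `ClientStatus` fields as I understand the API. The function name and the sample data are hypothetical.

```python
from typing import Any


def nodes_missing_job(nodes: list[dict[str, Any]],
                      allocs: list[dict[str, Any]]) -> list[str]:
    """Names of ready, eligible nodes that have no running allocation of the job."""
    # Nodes that currently run an allocation of the system job.
    covered = {a["NodeID"] for a in allocs if a.get("ClientStatus") == "running"}
    return sorted(
        n["Name"]
        for n in nodes
        if n.get("Status") == "ready"
        and n.get("SchedulingEligibility") == "eligible"
        and n["ID"] not in covered
    )


# Invented sample data shaped like the two API responses (not real output).
SAMPLE_NODES = [
    {"ID": "n1", "Name": "client-1", "Status": "ready", "SchedulingEligibility": "eligible"},
    {"ID": "n2", "Name": "client-2", "Status": "ready", "SchedulingEligibility": "eligible"},
    {"ID": "n3", "Name": "client-3", "Status": "down", "SchedulingEligibility": "eligible"},
]
SAMPLE_ALLOCS = [
    {"ID": "a1", "NodeID": "n1", "ClientStatus": "running"},
    {"ID": "a2", "NodeID": "n2", "ClientStatus": "failed"},
]
```

With the sample data, `nodes_missing_job(SAMPLE_NODES, SAMPLE_ALLOCS)` flags only `client-2`: `client-1` runs the job and `client-3` is down. The check only says that a node lacks the job; it cannot tell whether a constraint made the node ineligible on purpose.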
Actual Result
Nomad does not report any issues.
Job file (if appropriate)
*It would be nice to have a \<details\> tag in the issue template.