add limits for container resources like CPU, memory, GPU in addition to requests #3168
Comments
@lresende what do you think? Do you know why, historically, only requests but not limits were added?
Hi @shalberd. First, I apologize for the delay in responding (new employer and different application). I happen to be looking at this very topic for one of my other projects and have reached the conclusion that CPU limits should either not be set (assuming the container can be trusted not to go crazy with resources) or be set to some value that prevents a rogue container from consuming all CPU resources, but absolutely not to the same value as the request. Memory requests/limits, on the other hand, are different, and it's probably fine to equate the two (so I would argue the memory limit could be set w/o UI/metadata changes). This all has to do with the fact that CPU is a "compressible resource" while memory is not. Here are a couple of articles I found really helpful: I think GPU limits are different because, IIRC, there isn't a way to "request" GPU. To introduce CPU limits, we'd either have to introduce a new value in the metadata (and forms) or come up with some kind of factor value (e.g., limit = 2x request) that can be set via an ENV or also exposed (although we should just expose the limit if this is appearing in the UI).
Hi @kevin-bates. Two new values / fields in the forms / GUI would be the approach I suggest, for CPU and memory limits. What the overridable, pre-filled defaults should be (if that is even possible with the GUI technology here) can be discussed further; you've got good input. But for now, I'd even be happy just being able to set limits myself, at user discretion, non-required. @lresende does Elyra use Django for its GUI framework? Or Bottle or ... I see jinja templates, but I cannot make heads or tails of the web framework technology used. Maybe indirectly through some Jupyter extension mechanism.
Hello @kevin-bates and @shalberd, I am also affected by this issue. My Kubernetes cluster has Argo workflow resource defaults, which means that it will populate both requests and limits if they are not specified. In the case of my Elyra pipeline, I am only able to specify the compute requests, which means that the Argo controller populates the default limits and the pipeline automatically fails because the requests exceed those limits. Having the option to specify the limits (or at least have them equal the requests) would allow us to trigger the pipelines as needed. Thanks a lot for any help!
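To make the failure mode above concrete, here is a minimal sketch of the check that trips: Kubernetes rejects a container whose request is larger than its limit, so a request set in Elyra can collide with a limit injected by cluster-side defaults. The function name `find_conflicts` and the plain numeric units (CPU cores, memory in GB) are assumptions for illustration only, not Elyra or Argo code.

```python
from typing import Dict, List


def find_conflicts(requests: Dict[str, float], default_limits: Dict[str, float]) -> List[str]:
    """Return the resource names whose request exceeds the defaulted limit.

    Hypothetical helper: both dicts use plain numbers in the same unit per
    resource (e.g. CPU cores, memory in GB), not Kubernetes quantity strings.
    """
    return sorted(
        name
        for name, requested in requests.items()
        if name in default_limits and requested > default_limits[name]
    )


# A node requests 4 CPUs and 8 GB, but the cluster defaults the limits to 2 CPUs and 16 GB.
conflicts = find_conflicts({"cpu": 4, "memory": 8}, {"cpu": 2, "memory": 16})
print(conflicts)
```

Any name in the returned list is a resource where the pod spec would be invalid unless the user can also set (or raise) the limit.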
Hi @paloma-rebuelta. It is very disappointing to hear that Argo's resource limit handling is implemented that way! They should (at a minimum) provide an option to use the request value as the limit, rather than a blanket fixed limit. It is commonly recommended to NOT set limits, particularly for CPU, so that a node's resources can be fully utilized. (Memory is a bit of a different story.) Given it appears this PR is hung up on the UI side of things, here's a suggestion that I believe could be easily implemented AND be both backward (and forward once the UI is in place) compatible.
Once the UI portion of the changes has been done, we should fall back on the logic above to determine the limit values when the UI fields are empty. As a result, the logic above essentially becomes the default value for the limits, which could include NOT setting a limit when both the UI field and the envs are not set (or set to values < 1.0). Would you be willing to contribute such a PR?
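One possible reading of this fallback, as a rough sketch only: an explicit UI value wins, otherwise an environment variable supplies a factor applied to the request, and a missing factor or one below 1.0 means no limit is set. The function name `resolve_limit` and the environment variable name `ELYRA_CPU_LIMIT_FACTOR` are hypothetical and do not come from Elyra.

```python
import os
from typing import Mapping, Optional


def resolve_limit(
    request: Optional[float],
    ui_limit: Optional[float],
    factor_env: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[float]:
    """Return the limit to apply, or None to leave the limit unset."""
    # An explicit value from the (future) UI field always wins.
    if ui_limit is not None:
        return ui_limit
    # Without a request there is nothing to scale.
    if request is None:
        return None
    raw = environ.get(factor_env)
    if raw is None:
        return None
    try:
        factor = float(raw)
    except ValueError:
        # Unparseable factor: treat it like an unset variable.
        return None
    # A factor below 1.0 would put the limit under the request, so set no limit.
    if factor < 1.0:
        return None
    return request * factor


# Hypothetical variable name; the environment is passed explicitly to keep the example deterministic.
env = {"ELYRA_CPU_LIMIT_FACTOR": "2"}
print(resolve_limit(request=2.0, ui_limit=None, factor_env="ELYRA_CPU_LIMIT_FACTOR", environ=env))
```

With this shape, existing pipelines without the variable keep their current behavior (no limit), and the same function keeps working unchanged once the UI fields exist.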
@kevin-bates I agree with the stepwise solution you proposed here. If you want, I can take a crack at it next year, let's say starting in January. @romeokienzler fyi. @paloma-rebuelta if you make a PR before me, that would be ok, but I will, if that's ok, start with the env var approach and open a PR in January. Also, since I have recently added GUI value fields myself and now know how those are handled and added, I might even combine the new GUI fields, the env vars, and the overall new logic in one PR, as @kevin-bates described.
Thanks a lot @shalberd, looking forward to it.
I'll only get to it in January; I am traveling now and on holiday. Maybe the others can have a closer look. Thank you very much for the work and effort.
Is your feature request related to a problem? Please describe.
@kevin-bates Currently, in the Elyra GUI, i.e. for generic pipelines and a component therein, you can set values for CPU, Memory, and GPU and GPU Vendor.
https://elyra.readthedocs.io/en/stable/user_guide/pipelines.html#resources-cpu-gpu-and-ram
Those values are set as container request sizes, except GPU, which, as far as I can see, is set and used as a GPU limit. That goes against the doc, which speaks of all of them as "resources required".
Resources: CPU, GPU, and RAM
The problem is that not setting any limits, and not giving the user any way to set limits, can lead to cluster instability on Kubernetes and OpenShift.
Describe the solution you'd like
Add fields and properties for CPU, GPU, Memory limits in GUI, to be used later in processors and templates.
Possibly also add a default factor of x1 (overridable, of course) for the limit field values, based on the request fields.
Would have to add those new fields / properties to the generic properties template as well.
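As a rough illustration of what the processors and templates would eventually need to emit, the sketch below builds a Kubernetes-style `resources` mapping with optional limits next to the requests. The function name `build_resources`, its parameters, and the `G` memory suffix are assumptions for this example, not Elyra's actual code; GPUs are placed under `limits` because Kubernetes expects extended resources such as `nvidia.com/gpu` there.

```python
from typing import Dict, Optional


def build_resources(
    cpu: Optional[float] = None,
    memory_gb: Optional[int] = None,
    gpu: Optional[int] = None,
    gpu_vendor: str = "nvidia.com/gpu",
    cpu_limit: Optional[float] = None,
    memory_limit_gb: Optional[int] = None,
) -> Dict[str, Dict[str, str]]:
    """Build a container `resources` mapping; unset values are simply omitted."""
    requests: Dict[str, str] = {}
    limits: Dict[str, str] = {}
    if cpu is not None:
        requests["cpu"] = str(cpu)
    if memory_gb is not None:
        requests["memory"] = f"{memory_gb}G"
    if cpu_limit is not None:
        limits["cpu"] = str(cpu_limit)
    if memory_limit_gb is not None:
        limits["memory"] = f"{memory_limit_gb}G"
    if gpu is not None:
        # Extended resources such as GPUs are specified under limits.
        limits[gpu_vendor] = str(gpu)
    resources: Dict[str, Dict[str, str]] = {}
    if requests:
        resources["requests"] = requests
    if limits:
        resources["limits"] = limits
    return resources


# Memory limit equal to the request (factor x1), no CPU limit, one GPU.
print(build_resources(cpu=2, memory_gb=4, gpu=1, memory_limit_gb=4))
```

Leaving `cpu_limit` and `memory_limit_gb` unset reproduces today's behavior, so the new fields can stay optional.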
Describe alternatives you've considered
No real way around it when using the graphical pipeline editors with settings for runtime containers.
Additional context
Came across this when discussing Kubernetes / Openshift Resource Limits setting in an Airflow2 compatible way. Discussion quickly came to, wait a sec, no limits set ...
#3167 (comment)