Problem/Opportunity Statement
If we ever get into a scenario where gantry is going completely haywire or is not resolving an OOM, it would be nice to have a mechanism to disable dynamic allocations for specific packages or for the entire system. This shouldn't require going into the container or having any knowledge of how the program works.
What would success / a fix look like?
A simple web page behind GitHub auth, accessible to a small group, with a form that disables dynamic allocations at the package level. For subsequent predictions of a disabled package, gantry should return no CPU limit and a 64 GB memory limit.
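As a rough illustration of the behavior described above, here is a minimal sketch (not gantry's actual code) of a per-package kill switch that a prediction path could consult. Every name here (`DISABLED_PACKAGES`, `FALLBACK_LIMITS`, `predict_resources`) is hypothetical; in practice the disabled set would be backed by the authenticated web form rather than an in-memory constant.

```python
# Hypothetical sketch: pause dynamic allocation for specific packages and
# fall back to fixed limits (no CPU limit, 64 GB memory) for those packages.

FALLBACK_LIMITS = {
    "cpu_limit": None,      # no CPU limit
    "memory_limit": "64G",  # fixed 64 GB memory limit
}

# Would be populated from the web form / a small database table in practice.
DISABLED_PACKAGES: set[str] = set()


def predict_resources(package_name: str, model_prediction: dict) -> dict:
    """Return the model's prediction unless allocations are paused for this package."""
    if package_name in DISABLED_PACKAGES:
        return FALLBACK_LIMITS
    return model_prediction


if __name__ == "__main__":
    DISABLED_PACKAGES.add("llvm")
    print(predict_resources("llvm", {"cpu_limit": 8, "memory_limit": "12G"}))
    # -> {'cpu_limit': None, 'memory_limit': '64G'}
    print(predict_resources("zlib", {"cpu_limit": 2, "memory_limit": "1G"}))
    # -> {'cpu_limit': 2, 'memory_limit': '1G'}
```

The point of the sketch is only that the override check sits in front of the prediction, so pausing a package requires no knowledge of the model or access to the container.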
@kwryankrattiger wondering if you think this is necessary given the feature you wrote in spack/spack#41622 to disable emergency allocation using environment variables. The motivation here is pausing gantry for specific packages rather than disabling the entire system.
I think this might make sense eventually, but to start I am guessing we will only want to disable it completely using the variable.
Managing and fine-tuning something like this may even make sense as part of the dynamic mapping config. I wouldn't want to add additional services unless we need them for some reason.
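For completeness, a minimal sketch of the environment-variable approach mentioned above: gate all dynamic allocation behind a single variable. The variable name `GANTRY_DISABLE` below is an assumption for illustration, not necessarily the one introduced in spack/spack#41622.

```python
# Hypothetical global kill switch via an environment variable.
import os


def dynamic_allocation_enabled() -> bool:
    """Treat any truthy value of the (hypothetical) GANTRY_DISABLE variable as a global off switch."""
    return os.environ.get("GANTRY_DISABLE", "").lower() not in ("1", "true", "yes")


if __name__ == "__main__":
    os.environ["GANTRY_DISABLE"] = "1"
    print(dynamic_allocation_enabled())  # False -> fall back to static limits
```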