In the pilot phase, we will implement predictions for resource requests only, and ensure that a predicted request is never lower than the current allocation.
If the pilot succeeds, we'll implement functionality that retries jobs with higher memory allocations when they have failed due to OOM kills.
Then, we will "drop the floor" and allow the predictor to allocate less memory than the package currently receives. At this step, requests will be fully implemented.
Limits for CPU and memory will be implemented.
Next, we want to introduce some experimentation in the system and perform a scaling study.
Design a scheduler that decides which instance type to place a job on, based on cost, expected resource usage, and expected runtime.
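To make the steps above concrete, here is a minimal sketch of three of them: the pilot-phase floor on requests, the memory bump on an OOM retry, and a cost-based instance choice. None of this is gantry's actual code. The names `pilot_request`, `retry_request`, `cheapest_instance`, and `InstanceType`, the 1.5x retry factor, and the assumption that a job is billed for a whole instance are all hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical description of an instance type."""
    name: str
    cpus: float
    memory_gb: float
    hourly_cost: float


def pilot_request(predicted: float, current: float) -> float:
    """Pilot phase: a prediction may raise a request but never lower it."""
    return max(predicted, current)


def retry_request(previous: float, factor: float = 1.5) -> float:
    """After an OOM kill, retry with a larger memory request.

    The multiplicative factor is an assumption, not a value from the issue.
    """
    return previous * factor


def cheapest_instance(
    instances: Sequence[InstanceType],
    cpus: float,
    memory_gb: float,
    runtime_hours: float,
) -> Optional[InstanceType]:
    """Pick the cheapest instance type that fits the expected usage.

    Assumes the job is billed for the whole instance for its expected runtime.
    Returns None when no instance type is large enough.
    """
    feasible = [
        i for i in instances if i.cpus >= cpus and i.memory_gb >= memory_gb
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda i: i.hourly_cost * runtime_hours)
```

"Dropping the floor" would then amount to using the predicted value directly instead of passing it through `pilot_request`.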
Evaluation
The success of this framework can be evaluated against a number of factors:
Has the cost per job changed?
Are jobs being killed due to resource contention?
What is the error distribution of our predictions?
How much waste is there per build type?
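Two of these questions, the prediction error distribution and the waste per build type, could be computed from job records roughly as follows. This is a sketch under assumptions: the record layout `(build_type, allocated, used)` and the function names are hypothetical, and waste is defined here as the unused fraction of what was allocated.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Sequence, Tuple


def prediction_errors(
    predicted: Sequence[float], actual: Sequence[float]
) -> List[float]:
    """Signed error per job: positive means over-prediction."""
    return [p - a for p, a in zip(predicted, actual)]


def error_summary(errors: Sequence[float]) -> Dict[str, float]:
    """A few summary statistics of the error distribution."""
    return {
        "mean": mean(errors),
        "median": median(errors),
        "worst_under": min(errors),  # most negative = largest under-prediction
        "worst_over": max(errors),
    }


def waste_by_build_type(
    jobs: Iterable[Tuple[str, float, float]]
) -> Dict[str, float]:
    """Fraction of allocated resources left unused, grouped by build type.

    Each job is a (build_type, allocated, used) tuple; this layout is an
    assumption made for the example.
    """
    allocated: Dict[str, float] = defaultdict(float)
    used: Dict[str, float] = defaultdict(float)
    for build_type, alloc, use in jobs:
        allocated[build_type] += alloc
        used[build_type] += use
    return {
        bt: (allocated[bt] - used[bt]) / allocated[bt]
        for bt in allocated
        if allocated[bt] > 0
    }
```

Under-predictions are the errors most likely to cause OOM kills, so `worst_under` is the statistic to watch once the floor is dropped.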
This is a tracking issue used to document the current set of features we would like to integrate into gantry.
This thread should also be used to discuss new directions for the project.