Resource fuzzing to assess job performance variation #76

Open
Tracked by #71
cmelone opened this issue Jul 31, 2024 · 2 comments
Assignees
Labels
feature New feature or request

Comments

@cmelone
Collaborator

cmelone commented Jul 31, 2024

Given the goal of reducing costs on a per-job basis, we would like to understand the effects of limiting CPU cycles available to a build job. This process would add variance to the resource allocation algorithm.

This would take "gantry in the direction of a full genetic algorithm to optimize the resource requests of jobs to build applications in the least expensive way possible" - Alec.

This is essentially a scaling study that balances the number of CPU cycles allocated to a build against the wall time of the job, ultimately optimizing cost. The plot of interest is the efficiency curve, where efficiency is defined as cores/build time.
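
To make the efficiency curve concrete, here is a minimal sketch that turns a handful of observed builds into plottable points, using efficiency as defined above (cores/build time). `BuildRun`, `efficiency_curve`, and the sample numbers are hypothetical and only for illustration; they are not part of gantry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRun:
    """One observed build (hypothetical record, not gantry's schema)."""
    cores: float          # CPU cores allocated to the build
    build_seconds: float  # wall time of the job


def efficiency(run: BuildRun) -> float:
    """Efficiency as defined in this issue: cores / build time."""
    return run.cores / run.build_seconds


def efficiency_curve(runs: list[BuildRun]) -> list[tuple[float, float]]:
    """Return (cores, efficiency) points sorted by core count, ready to plot."""
    return sorted((r.cores, efficiency(r)) for r in runs)


# Made-up observations for a single spec.
runs = [BuildRun(8, 400.0), BuildRun(2, 1000.0), BuildRun(4, 600.0)]
curve = efficiency_curve(runs)
for cores, eff in curve:
    print(f"{cores:>4} cores -> efficiency {eff:.4f}")
```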

This would be done by selecting 10-15% of all incoming prediction requests to "fuzz", purposefully limiting the CPU resources allocated to them so we can understand the impact on different types of applications and the variety of build options available in Spack.
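
A rough sketch of the selection step, assuming a 12% fuzz rate (inside the 10-15% range) and a random cut to 50-90% of the predicted CPU. `FUZZ_RATE`, `CPU_SCALE_RANGE`, `MIN_CPU`, and the function names are all assumptions for illustration; the real values would need to be decided.

```python
import random

FUZZ_RATE = 0.12              # assumed: fraction of prediction requests to fuzz
CPU_SCALE_RANGE = (0.5, 0.9)  # assumed: fuzzed jobs get 50-90% of predicted CPU
MIN_CPU = 1.0                 # assumed floor so a build is never starved entirely


def should_fuzz(rng: random.Random, rate: float = FUZZ_RATE) -> bool:
    """Pick roughly `rate` of incoming requests for fuzzing."""
    return rng.random() < rate


def fuzz_cpu(predicted_cpu: float, rng: random.Random) -> float:
    """Purposefully limit the CPU allocation by a random factor."""
    low, high = CPU_SCALE_RANGE
    scale = rng.uniform(low, high)
    return max(MIN_CPU, round(predicted_cpu * scale, 2))


def allocate(predicted_cpu: float, rng: random.Random) -> dict:
    """Return the CPU to request plus an indicator of whether it was fuzzed."""
    if should_fuzz(rng):
        return {"cpu": fuzz_cpu(predicted_cpu, rng), "fuzzed": True}
    return {"cpu": predicted_cpu, "fuzzed": False}


rng = random.Random(0)  # seeded only to make the example repeatable
print(allocate(8.0, rng))
```

Passing the `random.Random` instance in explicitly keeps the sampling testable; production code could just as well use the module-level functions.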

This fuzzing would occur a few times for each given spec, until we can determine the optimal efficiency for the job, which would be used to define future CPU limits and the number of make jobs.

@cmelone cmelone added the feature New feature or request label Jul 31, 2024
@cmelone cmelone self-assigned this Jul 31, 2024
@cmelone cmelone mentioned this issue Jul 31, 2024
5 tasks
@cmelone
Collaborator Author

cmelone commented Sep 17, 2024

fuzzing:

  • when fuzzing, include an indicator variable and mark job as having been fuzzed so it doesn't get used as a predictor in the future
  • make sure you don't fuzz a retried job: check whether the last job for the exact same spec in the db failed
    • would some sort of grace period be necessary? "not fuzzable until x time"
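
The checks above could look roughly like this. It is a sketch under assumptions: `JobRecord` is a hypothetical stand-in for a row in the db, and the optional `grace_period` is just one reading of "not fuzzable until x time" (a failure only blocks fuzzing while it is recent).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical view of a stored job; not gantry's actual schema."""
    spec_hash: str
    succeeded: bool
    fuzzed: bool            # indicator variable set when the job was fuzzed
    finished_at: datetime


def is_fuzzable(
    last_job: Optional[JobRecord],
    now: datetime,
    grace_period: Optional[timedelta] = None,
) -> bool:
    """Decide whether a new job for this exact spec may be fuzzed.

    `last_job` is the most recent job for the exact same spec, or None.
    """
    if last_job is None:
        return True  # assumption: no history means this is not a retry
    if last_job.succeeded:
        return True
    if grace_period is None:
        return False  # last exact spec failed, so this is likely a retry
    # With a grace period, the failure only blocks fuzzing while it is recent.
    return now - last_job.finished_at >= grace_period


def usable_as_predictor(job: JobRecord) -> bool:
    """Fuzzed jobs are excluded from future predictions."""
    return not job.fuzzed


now = datetime(2024, 9, 17, 12, 0)
failed = JobRecord("abc123", succeeded=False, fuzzed=False,
                   finished_at=now - timedelta(hours=1))
print(is_fuzzable(failed, now))
```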

storing predictions:

  • store a log of predictions in the db -- would be fuzzy searched just like the current system
  • predictions would have a bool fuzzable indicator
  • can invalidate them so they don't get used again
  • not completely sure this will increase efficiency; most of the computation is spent in the search, not in computing the actual allocations
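
One possible shape for the prediction log, sketched with Python's built-in `sqlite3`. The `predictions` table, its columns, and the helper names are assumptions, and the lookup here is an exact match on the spec string; the fuzzy search mentioned above is left out.

```python
import sqlite3

# Hypothetical schema: one row per stored prediction.
SCHEMA = """
CREATE TABLE predictions (
    id          INTEGER PRIMARY KEY,
    spec        TEXT    NOT NULL,
    cpu_request REAL    NOT NULL,
    fuzzable    INTEGER NOT NULL DEFAULT 1,  -- bool fuzzable indicator
    valid       INTEGER NOT NULL DEFAULT 1   -- set to 0 once invalidated
)
"""


def store_prediction(db: sqlite3.Connection, spec: str,
                     cpu_request: float, fuzzable: bool = True) -> int:
    """Log a prediction and return its row id."""
    cur = db.execute(
        "INSERT INTO predictions (spec, cpu_request, fuzzable) VALUES (?, ?, ?)",
        (spec, cpu_request, int(fuzzable)),
    )
    db.commit()
    return cur.lastrowid


def invalidate_prediction(db: sqlite3.Connection, prediction_id: int) -> None:
    """Mark a prediction so it does not get used again."""
    db.execute("UPDATE predictions SET valid = 0 WHERE id = ?", (prediction_id,))
    db.commit()


def latest_valid_prediction(db: sqlite3.Connection, spec: str):
    """Return (id, cpu_request, fuzzable) for the newest valid row, or None."""
    return db.execute(
        "SELECT id, cpu_request, fuzzable FROM predictions "
        "WHERE spec = ? AND valid = 1 ORDER BY id DESC LIMIT 1",
        (spec,),
    ).fetchone()


db = sqlite3.connect(":memory:")  # in-memory db just for the example
db.execute(SCHEMA)
pid = store_prediction(db, "example-pkg@1.0", 4.0)  # made-up spec string
print(latest_valid_prediction(db, "example-pkg@1.0"))
```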

@cmelone
Collaborator Author

cmelone commented Oct 29, 2024

once we have fuzzed for a while, we need to figure out how to update the prediction algorithm to choose allocations based on the efficiency of resources/duration
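
As a starting point for that discussion, here is one way the choice could be made. Note the swap: instead of the cores/build time ratio, this sketch ranks allocations by core-seconds (cores × median duration) as a stand-in for cost. That proxy, the function name, and the sample data are assumptions, not a settled design.

```python
from collections import defaultdict
from statistics import median


def cheapest_allocation(runs: list[tuple[float, float]]) -> float:
    """Pick the core count with the lowest core-seconds for one spec.

    `runs` holds (cores, build_seconds) observations from fuzzed and
    normal jobs. Core-seconds is an assumed proxy for cost.
    """
    by_cores: dict[float, list[float]] = defaultdict(list)
    for cores, seconds in runs:
        by_cores[cores].append(seconds)
    # The median damps the effect of a single unusually slow or fast build.
    return min(by_cores, key=lambda c: c * median(by_cores[c]))


# Made-up observations for a single spec.
observations = [(2, 1500.0), (2, 1520.0), (4, 600.0), (8, 400.0), (8, 420.0)]
best = cheapest_allocation(observations)
print(best)
```

A real cost model would probably also account for memory and per-node pricing, which could move the optimum away from what core-seconds alone suggests.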
