Better handle parallelism in cache priming #19721
So the current setup, doing things in phases, has the problem that we only parallelize up to the width of the crate graph at the current depth, which tends to be less than the number of threads available.
So the idea of this PR is to interleave the import map and crate symbols computation, such that we already work on those while potentially waiting on more def map computations to "unlock".
Is that correct?
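The interleaving being described could be sketched like this (a toy sketch, not rust-analyzer's actual scheduler; `schedule`, the crate graph, and the log format are all invented for illustration): a crate's def map becomes computable once its dependencies' def maps are done, and each finished def map immediately unlocks both the dependents' def maps and that crate's symbol work, instead of waiting for a whole phase to end.

```rust
use std::collections::VecDeque;

// `deps[c]` lists the crates that crate `c` depends on. Returns the order
// in which work items become runnable.
fn schedule(deps: &[Vec<usize>]) -> Vec<String> {
    let n = deps.len();
    // How many dependencies each crate still waits on.
    let mut pending: Vec<usize> = deps.iter().map(|d| d.len()).collect();
    // Reverse edges: who gets unlocked when crate `d` finishes.
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (c, ds) in deps.iter().enumerate() {
        for &d in ds {
            dependents[d].push(c);
        }
    }
    // Crates with no dependencies are ready immediately.
    let mut ready: VecDeque<usize> = (0..n).filter(|&c| pending[c] == 0).collect();
    let mut log = Vec::new();
    while let Some(c) = ready.pop_front() {
        // Def map first: it is a prerequisite for everything else on this crate.
        log.push(format!("def map {c}"));
        // Symbols / import map for `c` could now run in parallel with the
        // remaining def maps; here they are just logged sequentially.
        log.push(format!("symbols {c}"));
        for &d in &dependents[c] {
            pending[d] -= 1;
            if pending[d] == 0 {
                ready.push_back(d);
            }
        }
    }
    log
}

fn main() {
    // Toy crate graph: crates 0 and 1 are roots; crate 2 depends on both.
    let deps = vec![vec![], vec![], vec![0, 1]];
    for line in schedule(&deps) {
        println!("{line}");
    }
}
```

The point of the sketch is that the symbol work for crate 0 is available as soon as crate 0's def map is done, even though crate 2's def map is still blocked.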
// The idea is that if we have a def map available to compute, we should do that first.
// This is because def map is a dependency of both import map and symbols. So if we have
// e.g. a def map and a symbols, if we compute the def map we can, after it completes,
// compute the def maps of dependencies, the existing symbols and the symbols of the
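As a rough illustration of the priority this comment describes (a minimal sketch, not the actual implementation; `Task` and `WorkQueue` are made up for illustration): when both kinds of work are queued, a worker picks a def map first, because only def maps unlock further work.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Task {
    DefMap(u32),    // crate id; computing this unlocks dependents
    ImportMap(u32), // only needs the crate's own def map
    Symbols(u32),   // likewise
}

#[derive(Default)]
struct WorkQueue {
    def_maps: VecDeque<Task>,
    other: VecDeque<Task>,
}

impl WorkQueue {
    fn push(&mut self, task: Task) {
        match task {
            Task::DefMap(_) => self.def_maps.push_back(task),
            _ => self.other.push_back(task),
        }
    }

    // Always drain def maps before import maps / symbols.
    fn pop(&mut self) -> Option<Task> {
        self.def_maps.pop_front().or_else(|| self.other.pop_front())
    }
}

fn main() {
    let mut queue = WorkQueue::default();
    queue.push(Task::Symbols(0));
    queue.push(Task::DefMap(1));
    queue.push(Task::ImportMap(0));
    queue.push(Task::DefMap(2));

    // Def maps come out first, even though they were pushed later.
    while let Some(task) = queue.pop() {
        println!("{task:?}");
    }
}
```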
Suggested change:
- // compute the def maps of dependencies, the existing symbols and the symbols of the
+ // compute the def maps of dependents, the existing symbols and the symbols of the
What is meant by "existing symbols"? The ones we already have? Why list that, given we already computed it?
I meant the symbols we already could compute when we computed the def map.
LGTM
crates_done: crate_def_maps_done,
crates_total: crate_def_maps_total,
Feels like we should be using the total sum numbers here and not just the def maps? Unsure, it might be a bit weird to see this be n / n in the status message without finishing yet (due to us working on symbol / import maps still).
Alternatively, we could swap out the `work_type` message once we are done with the crate def maps with something else.
Yeah, the problem with this PR is that the status report gets more complicated.
Yes, it is.
To make the best use of available cores and avoid wasting time waiting for other tasks. See the comments in the code for an explanation.
And make it parallel by default (and remove the `--parallel` flag) to mirror the IDE cache priming.
@Veykril I addressed most of your comments, but I don't know what to do with the status report, given that we can't know the total amount of work: it requires knowing how many modules each crate has, and we can know that only after the def map completes. I agree that seeing n/n crates done while it is still working is weird; if you have some idea I'd love to hear it.
Let's just swap the `work_type` message then. It could be confusing if users see "Indexing n/n" but cache priming does not finish.
Done that.
Make best use of all available CPU cores.
Closes #19711.
Also change the flags of the `prime-caches` CLI command, and make it parallel by default. This saves 0.305ms in the parallel `prime-caches` on omicron, although the benchmarks were very noisy. I ran into a Salsa panic while running parallel prime caches on buck2... Need to debug that, but it doesn't reproduce now.