improve rustc wrapper startup time? #2626
Comments
@matthiaskrgr this is about the rustup wrapper, not the x.py rustc shim, right? If so I think this should be moved to https://github.com/rust-lang/rustup/.
Ah, ~/.cargo/bin/rustc is rustup? Yes, the issue should be moved then. :)
I can confirm that the difference in performance is around 6x:
Looking locally, ca. 15ms are spent before main() is entered (presumably dynamic link time for all the libraries rustup pulls in by dint of being linked to curl). It looks like about 25ms is taken parsing manifest files; sadly this is also done twice, and thus accounts for the vast majority of the extra time. The toml parser must be quite inefficient. It might be possible to knock one of those parse steps on the head, since in theory we only need to parse once, but it'll involve quite a bit of refactoring. If someone wants to begin to look at this, the place to start is the config.rs file, around line 620.
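The single-parse refactor hinted at here can be illustrated with a memoised accessor; a minimal sketch under made-up names (`Manifest` and `parse_manifest_from_disk` are not rustup's real types), using `std::sync::OnceLock` so the expensive parse runs at most once per process:

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for rustup's parsed manifest type.
#[derive(Debug)]
struct Manifest {
    components: Vec<String>,
}

// Stand-in for the expensive ~25ms TOML parse.
fn parse_manifest_from_disk() -> Manifest {
    Manifest { components: vec!["rustc".to_string(), "cargo".to_string()] }
}

static MANIFEST: OnceLock<Manifest> = OnceLock::new();

// If both call sites go through this accessor, the parse happens once.
fn manifest() -> &'static Manifest {
    MANIFEST.get_or_init(parse_manifest_from_disk)
}

fn main() {
    let first = manifest() as *const Manifest;
    let second = manifest() as *const Manifest;
    assert_eq!(first, second); // the second call reuses the cached parse
    println!("components: {:?}", manifest().components);
}
```

Routing both existing call sites through a single accessor like this would drop the second parse without restructuring the callers.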
To help triage this: how much does this contribute to typical build times? Sure, we can put a chunk of time into making this snappy, and that would be good, but if a typical build is 1500ms...?
@rbtcollins I wouldn't notice it so much on builds, but this hurts things like
To add to the list:
Comparing with C compilers is disingenuous, since the number of rustc invocations for a similarly sized codebase will be far fewer. For example, rustup is around 130 .rs files, but including its dependencies the total number of rustc invocations is only 320 or so. If we assume 35ms of wasted CPU time (the repeated manifest parse) on my computer per rustc invocation, and generously assume that all that waste is reflected in the wallclock compile time at the same parallelism the build runs at, then of the 83s it took my computer to build rustup just now at 6.1x parallelism, that's 11375ms wasted, which at 6.1x parallel is 1.864s of wallclock time, or 2.2% of all build time in a debug build. The same maths on a release build, 11375ms wasted against a total build time of 195s, is only 0.95% of the total runtime of the build. Having said all that, I do think it's worth correcting the double-load of the manifest, since it's wasteful anyway, and that shouldn't be too complex to do in a grungy way, or only medium-hard to do nicely. What would be of even more use, though, is working out how it takes 35ms on a 4GHz CPU core to parse the manifest; that's utterly abysmal.
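For what it's worth, the back-of-envelope arithmetic above can be replayed mechanically (all inputs are the commenter's measurements, not mine):

```rust
fn main() {
    let invocations = 325.0;  // ~320 rustc invocations for rustup + deps
    let wasted_ms = 35.0;     // repeated manifest parse per invocation
    let parallelism = 6.1;    // effective build parallelism

    let total_wasted_ms = invocations * wasted_ms;            // 11375 ms CPU
    let wallclock_s = total_wasted_ms / parallelism / 1000.0; // ~1.86 s

    let debug_pct = wallclock_s / 83.0 * 100.0;    // share of an 83 s debug build
    let release_pct = wallclock_s / 195.0 * 100.0; // share of a 195 s release build

    println!("wasted: {total_wasted_ms} ms CPU, {wallclock_s:.3} s wallclock");
    println!("debug build share: {debug_pct:.1}%, release: {release_pct:.2}%");
}
```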
Further investigation yields: the majority of the time spent parsing the manifest is in the toml load itself, i.e. before we apply any semantic understanding to the manifest. The channel manifest is around 700k and will only get larger over time. Another possible improvement would be to trim out anything not relevant to the toolchain being installed when fetching the manifest. That ought not to be too complex to do, and would speed up future operations since the manifest would be significantly smaller.
I wrote a simple trimmer for the manifests to reduce the toml quantity in installed toolchains. Doing that results in the following (stable is untrimmed, beta is trimmed):
(For reference, directly invoking rustc (stable or beta) without rustup in the way is ca. 11ms.)
@matthiaskrgr Assuming #2627 goes CI-green, you might want to try the binary from there and see how it affects performance for you.
Well, I somewhat disagree: https://danluu.com/keyboard-latency/#humans-can-t-notice-100ms-or-200ms-latency. But the numbers after your change are a lot better :)
Well, I made a small ICE-finder that runs rustc on all the files inside the rustc repo. Using
Running it serially, individually? Not terribly surprising. I'm totally fine with rustup being faster, but development of rust itself isn't our primary use case: shipping rust toolchains to our users is, and running rustc in such a non-idiomatic fashion is very much a specialised, developer-of-rust need. To me that says: if someone wants to make this faster, great, but we're unlikely to steer folk to this as a priority issue, unlike e.g. the corruption issues, locking issues, etc.
If folk are running error explains in their game update loops (the context of that blog post), we have a whole new set of requirements to feed into rustup design. Fast is good, and it is a feature. But language servers and dev environments can avoid making the user wait for the invocation of the compiler in various ways, and should, simply because remote dev environments pay latency in many ways; having latency-hiding techniques in play is just good design.
When I was playing with my dodgy PR above, I ran into the fact that
Manifest parsing aside, would it potentially make sense for the top-level rustup-enabled cargo invocation to determine the toolchain and then set the
I just tested this approach. I think that'd be a worthwhile optimization.
One concern with this, rather than simply improving the startup performance of rustup, is how it might interact with the way rustup sets up fallbacks for incomplete linked toolchains (i.e. as used in cargo/rustc development flows). If we can be sure that the fallback approach won't be broken by this, or that we can somehow detect when we'd hit the fallback case and not set the variables in that context, then it ought to be plausible.
I think it'd be reasonable to always set
Right. See https://doc.rust-lang.org/cargo/reference/environment-variables.html; cargo will invoke
I think we can handle that case by just not setting
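The proposed fast path could be sketched roughly like this; `RUSTUP_RESOLVED_RUSTC` is an invented variable name for illustration only (a real scheme would presumably use cargo's documented RUSTC-style variables with concrete binary paths):

```rust
use std::env;

// Decide which path a proxy invocation takes. `resolved` is the value of a
// hypothetical RUSTUP_RESOLVED_RUSTC variable that a top-level rustup-enabled
// cargo invocation could export after resolving the toolchain once.
fn proxy_mode(resolved: Option<&str>) -> &'static str {
    match resolved {
        Some(path) if !path.is_empty() => "fast: exec the resolved rustc directly",
        _ => "slow: parse manifests and resolve the toolchain",
    }
}

fn main() {
    let resolved = env::var("RUSTUP_RESOLVED_RUSTC").ok();
    println!("{}", proxy_mode(resolved.as_deref()));
}
```

Not setting the variable in the linked-toolchain fallback case, as suggested above, would simply leave the proxy on the existing slow path.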
Is it worth reconsidering rustup as a chimera binary, given just how much stuff gets linked that isn't relevant to the proxies? Alternatively, if a chimera is still desirable, we could split most of rustup into a dynamic library that's only loaded after we've checked whether we're a proxy binary.
Yes, worth considering. I have a project (#2441) to make proxies safe on Windows as well, which is related; that said, how much warm startup-time impact does linking those libraries have today? Cold time is irrelevant, since a single process invocation is trivially amortised over a full build. There is some complexity to consider in having a separate binary, though, such as distribution and updating. I certainly don't have time to engage with it at the moment, though if someone else wanted to take it on, I do have a few ideas related to #2441 that would be worth considering.
IME, having profiled it a bit, the majority of the startup time of the proxies, by a LONG margin, is the parsing of the channel toml. Hence I played with #2627; though it's the wrong solution, it's an approach to consider.
@kinnison it may be different on Windows
@konstin Yes, please do file a ticket for that in rust-lang/rust. And please give the details about bypassing the wrapper and invoking rustc directly, so that anyone trying to reproduce it can easily do so without getting sidetracked by the wrapper.
rust-lang/rust issue: rust-lang/rust#121631
I did some profiling, and tokio-rs/tracing#3174 and alexcrichton/curl-rust#601 combined seem to be responsible for about 30% of the runtime.
@bjorn3 thanks, very interesting! For the curl initialization, have you looked at where/how that work is triggered? Especially in the default configuration, where we're not using curl at all, it seems surprising that curl is still doing initialization work.
The curl crate has a global constructor which runs curl's initialization function. This in turn initializes OpenSSL. https://github.com/alexcrichton/curl-rust/blob/63f7213d7fd002f42f0e3124fc1da42655c0585a/src/lib.rs#L123-L151
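One generic way to avoid paying for that at startup is to defer the work until first use instead of running it in a life-before-main constructor; a sketch of the pattern (not curl-rust's actual code) using `std::sync::Once`:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Once;

static INIT_CALLS: AtomicU32 = AtomicU32::new(0);
static INIT: Once = Once::new();

// Stand-in for an expensive one-time library setup (think curl_global_init
// and the OpenSSL init it triggers). With a ctor-style global constructor
// this would run in every process, proxies included; gated behind
// Once::call_once it only runs if a code path that needs it is reached.
fn ensure_initialised() {
    INIT.call_once(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        // ... expensive library setup would go here ...
    });
}

fn main() {
    // A proxy invocation that never touches the network never pays the cost.
    let needs_network = false;
    if needs_network {
        ensure_initialised();
    }
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 0);

    // When a download actually happens, init runs exactly once, even if
    // several call sites request it.
    ensure_initialised();
    ensure_initialised();
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 1);
    println!("init ran {} time(s)", INIT_CALLS.load(Ordering::SeqCst));
}
```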
For tokio-rs/tracing#3174, it might be worth trying to use the
I profiled this again after the recent improvements, and a majority of the time spent in rustup now is spent in manifest parsing. Since this manifest is most likely auto-generated, I wonder if we could use a different (binary, more efficient) format for (de)serializing it? Or maybe rustup could load it as TOML, but reserialize it to disk in a binary format? It seems a bit wasteful to keep it as TOML, since it's reparsed on every rustup invocation, even though it doesn't actually change unless the toolchain itself is updated (?).
Just as an example, parsing the file as JSON takes ~2ms (the input size is 700 KiB, so not that much smaller), so it's 10x faster, and there are likely much more efficient formats than JSON (although I tried bincode and MessagePack and both struggled with the way the
Thanks for looking into this! Do you have flamegraphs or other more detailed data? I would maybe look at postcard. What advanced serde trickery are you referring to?
What do you mean by this? The rustup wrapper is by now supposed to only be called "once" per compilation session by cargo, so not for every invocation.
I simply meant for every invocation of the rustup wrapper :) (e.g. for #2626 (comment))
(Recorded with
So, to ask the obvious first: do we need to load any metadata while in proxy mode? It's just to validate that the components are installed, right? That feels more like the job of rustup proper, so the proxies can just assume everything works.
@ChrisDenton Wait, IIRC Lines 534 to 538 in ac249c8
@Kobzol This makes me wonder: what would the graph look like if the new behavior were used instead?
Ah right, that's a pain. So yeah, persuading people to disable auto-install is the best call. 😉 My next thought would be to encode the essential information in a super simple binary format that's fast to load into memory.
Can we encode just the list of installed components in a binary file and only validate that file when running in proxy mode? And then leave any consistency checks that need to parse the full manifest to directly running rustup, or to when the proxy detects that a component is missing.
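A "super simple binary format" for the installed-component list could be as small as length-prefixed strings; a hypothetical sketch, not a proposed rustup format:

```rust
use std::io::{self, Read, Write};

// Layout: u32 entry count, then per entry a u32 byte length followed by the
// UTF-8 bytes of the component name. All integers little-endian.
fn encode(components: &[&str], out: &mut impl Write) -> io::Result<()> {
    out.write_all(&(components.len() as u32).to_le_bytes())?;
    for c in components {
        out.write_all(&(c.len() as u32).to_le_bytes())?;
        out.write_all(c.as_bytes())?;
    }
    Ok(())
}

fn decode(mut input: impl Read) -> io::Result<Vec<String>> {
    let mut len4 = [0u8; 4];
    input.read_exact(&mut len4)?;
    let count = u32::from_le_bytes(len4);
    let mut out = Vec::with_capacity(count as usize);
    for _ in 0..count {
        input.read_exact(&mut len4)?;
        let mut buf = vec![0u8; u32::from_le_bytes(len4) as usize];
        input.read_exact(&mut buf)?;
        out.push(String::from_utf8(buf)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?);
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let installed = ["rustc", "cargo", "rust-std-x86_64-unknown-linux-gnu"];
    let mut blob = Vec::new();
    encode(&installed, &mut blob)?;
    let decoded = decode(&blob[..])?;
    assert_eq!(decoded, installed);
    println!("round-tripped {} components in {} bytes", decoded.len(), blob.len());
    Ok(())
}
```

The proxy would only scan this file on its hot path; anything needing the full manifest semantics would still go through the TOML load in rustup proper.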
With that env flag,
@Kobzol Thanks! It is also worth mentioning that with auto-installation enabled,
Or
Are there plans to flip the default in the future, or is it expected that the auto-install behavior will stay the default? If it's the latter, I think it could still be worth optimizing the manifest loading.
I personally would like to move towards it being the default, but we tried that in 1.28.0 and it was not well received. So it's likely we'll support at least some (perhaps more limited) form of auto-install for a while yet. Speaking for myself, I think it may be worth optimising the happy path if someone wants to work on that. But other team members may have other opinions.
I would say there are plans to flip the default, but I think none of us are currently very motivated to push the issue forward.
I tried to switch the loading to JSON, and with that, running
```
$ time ~/.rustup/toolchains/master/bin/rustc >& /dev/null
0,040 total
$ time ~/.cargo/bin/rustc >& /dev/null
0,272 total
```
That's almost 7x slower.
I wonder if there is something we can do to get these numbers a bit closer?