Remove fewer Storage calls in copy_prop
#142531
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt
@bors try @rust-timer queue
…try> Remove fewer Storage calls in `copy_prop`
Modify the `copy_prop` MIR optimization pass to remove fewer `Storage{Live,Dead}` calls, allowing for better optimizations by LLVM - see #141649.
### Details
This is my attempt to fix the mentioned issue (this is the first part, I also implemented a similar solution for GVN in [this branch](https://github.com/rust-lang/rust/compare/master...ohadravid:rust:better-storage-calls-gvn-v2?expand=1)).
The idea is to use the `MaybeStorageDead` analysis and remove only the storage calls of `head`s that are maybe-storage-dead when the associated `local` is accessed (or, conversely, keep the storage of `head`s that are for-sure alive in _every_ relevant access).
When combined with the GVN change, the final example in the issue (#141649 (comment)) is optimized as expected by LLVM. I also measured the effect on a few functions in `rav1d` (where I originally saw the issue) and observed reduced stack usage in several of them.
This is my first attempt at working with MIR optimizations, so it's possible this isn't the right approach, but all tests pass, and the resulting diffs appear correct.
r? tmiasko since he commented on the issue and pointed to these passes.
☀️ Try build successful - checks-actions
Finished benchmarking commit (ef7d206): comparison URL.
Overall result: ❌ regressions - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary 0.7%, secondary 3.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -0.6%, secondary -0.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary 0.0%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 757.399s -> 756.065s (-0.18%)
@matthiaskrgr - I updated the implementation to stop re-checking a head once it is found to be maybe-dead, which should be a bit better
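The early exit described in this comment could look roughly like the sketch below. It is a toy model, not the code from the pull request: `HeadChecker`, `check_use`, and the `queries` counter are hypothetical names, and the closure stands in for whatever dataflow lookup the real pass performs.

```rust
use std::collections::HashSet;

/// Toy stand-in for a MIR local (hypothetical, not rustc's `Local`).
type Local = u32;

/// Hypothetical checker that records which heads must lose their storage statements.
struct HeadChecker {
    storage_to_remove: HashSet<Local>,
    /// Number of dataflow lookups performed, kept only to make the saving observable.
    queries: usize,
}

impl HeadChecker {
    fn new() -> Self {
        HeadChecker { storage_to_remove: HashSet::new(), queries: 0 }
    }

    /// Called at every use of a local whose copy-class head is `head`.
    /// `is_maybe_dead` stands in for a potentially costly dataflow lookup.
    fn check_use(&mut self, head: Local, is_maybe_dead: impl FnOnce() -> bool) {
        // Once a head is known to be maybe-dead, later uses cannot change the verdict.
        if self.storage_to_remove.contains(&head) {
            return;
        }
        self.queries += 1;
        if is_maybe_dead() {
            self.storage_to_remove.insert(head);
        }
    }
}
```

In this sketch, calling `check_use` again for a head that is already in `storage_to_remove` returns without running the closure, so the lookup cost is paid at most until the first negative result per head.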
@bors try @rust-timer queue
Should this check happen in
☀️ Try build successful - checks-actions
I'm not sure how to make this work: using Is there a different way to do this?
Finished benchmarking commit (c0a2949): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary -0.1%, secondary -1.3%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (secondary -1.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary -0.0%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 756.494s -> 757.685s (0.16%)
2c919c0 to 48b0529
f282ae6 to dcb58d1
dcb58d1 to ad0ab67
ad0ab67 to aa11a50
    // StorageDead makes a local uninitialized.
    mir::StatementKind::StorageDead(local) => {
        state.insert(local);
    }
This should also include `StorageLive`. A variant of the earlier example:

    #![feature(custom_mir, core_intrinsics)]
    use std::intrinsics::mir::*;

    #[custom_mir(dialect = "runtime")]
    pub fn live_twice<T: Copy>(_1: T) -> T {
        mir! {
            let _2: T;
            let _3: T;
            {
                StorageLive(_2);
                Call(_2 = opaque(Move(_1)), ReturnTo(bb1), UnwindUnreachable())
            }
            bb1 = {
                let _3 = Move(_2);
                StorageLive(_2);
                Call(RET = opaque(_3), ReturnTo(bb2), UnwindUnreachable())
            }
            bb2 = {
                StorageDead(_2);
                Return()
            }
        }
    }

    #[inline(never)]
    fn opaque<T>(a: T) -> T {
        a
    }

I'll add the `StorageLive` check anyway, but this example won't compile; it fails with:

    broken MIR in Item(...) at bb1[0]:
    StorageLive(_2) which already has storage here

This MIR is valid, although we wouldn't want to generate such code in the first place. mir-opt tests enable an additional lint that detects suspicious MIR code patterns. You can disable it with `-Zlint-mir=false`.
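To make the point of this thread concrete, here is a minimal sketch of a "maybe uninitialized" transfer function in which both storage markers reset a local. It uses toy types rather than rustc's: `Local`, `Stmt`, `apply`, and `maybe_uninit_after` are hypothetical names, only straight-line code is modeled, and the actual `MaybeUninitializedLocals` analysis in the pull request may handle more statement kinds.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a MIR local such as `_2` (hypothetical, not rustc's `Local`).
type Local = u32;

/// Toy statement kinds; only the cases relevant to this discussion are modeled.
#[derive(Clone, Copy, Debug)]
enum Stmt {
    StorageLive(Local),
    StorageDead(Local),
    /// `dest = <value>`: a write that initializes `dest`.
    Assign(Local),
}

/// Applies the effect of one statement to the set of maybe-uninitialized locals.
fn apply(state: &mut BTreeSet<Local>, stmt: Stmt) {
    match stmt {
        // Both markers leave the local without a usable value:
        // a repeated StorageLive discards whatever was stored before.
        Stmt::StorageLive(l) | Stmt::StorageDead(l) => {
            state.insert(l);
        }
        Stmt::Assign(l) => {
            state.remove(&l);
        }
    }
}

/// Runs the transfer function over straight-line code (no branches or joins).
fn maybe_uninit_after(initially_uninit: &[Local], stmts: &[Stmt]) -> BTreeSet<Local> {
    let mut state: BTreeSet<Local> = initially_uninit.iter().copied().collect();
    for &stmt in stmts {
        apply(&mut state, stmt);
    }
    state
}
```

Feeding it the shape of `live_twice` (`StorageLive(2)`, `Assign(2)`, `StorageLive(2)`) leaves local 2 in the maybe-uninitialized set at the point where the propagated use would read it, which is why the second `StorageLive` has to be treated like `StorageDead` here.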
    // We need to determine if we can keep the head's storage statements (which enables better optimizations).
    // For every local's usage location, if the head is maybe-uninitialized, we'll need to remove its storage statements.
    head_storage_to_check.insert(head);
Add a comment that this approach requires `local` not to be borrowed, since otherwise we cannot easily identify when it is used.
Please rebase past #142571 when it lands, and double-check that we actually enforce this. I suspect this pull request also fixes the failure from x86_64-gnu-tools.
cc @cjgillot.
    // Debug builds have no use for the storage statements, so avoid extra work.
    let storage_to_remove = if tcx.sess.emit_lifetime_markers() {
        let maybe_uninit = MaybeUninitializedLocals::new()
            .iterate_to_fixpoint(tcx, body, Some("mir_opt::copy_prop"))
            .into_results_cursor(body);

        let mut storage_checker =
            StorageChecker { maybe_uninit, head_storage_to_check, storage_to_remove };

        storage_checker.visit_body(body);

        storage_checker.storage_to_remove
    } else {
        // Conservatively remove all storage statements for the head locals.
        head_storage_to_check
    };
    StorageRemover { tcx, storage_to_remove }.visit_body_preserves_cfg(body);
Would it be possible to organize it so that `StorageChecker` runs before `Replacer` and all changes are performed all at once inside `Replacer`?
I think it is quite confusing when changes are applied in a piecewise manner, first in `Replacer` and then in `StorageRemover`, since it is not immediately clear what is and what isn't replaced already.
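One way to picture the suggested organization is to decide everything first and then rewrite the body in a single pass. The sketch below is a toy model under stated assumptions: `Stmt`, `Plan`, `head_of`, and `apply_plan` are hypothetical, the body is a flat list of statements, and the real `Replacer` operates on rustc's MIR visitors rather than on anything like this.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for a MIR local (hypothetical, not rustc's `Local`).
type Local = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Stmt {
    StorageLive(Local),
    StorageDead(Local),
    /// `dest = copy src`
    Copy { dest: Local, src: Local },
    /// Any other read of a local.
    Use(Local),
}

/// Everything decided up front, before the body is modified.
struct Plan {
    /// Maps each replaced local to the head of its copy class.
    head_of: HashMap<Local, Local>,
    /// Heads whose storage statements cannot be kept.
    storage_to_remove: HashSet<Local>,
}

impl Plan {
    fn head(&self, l: Local) -> Local {
        self.head_of.get(&l).copied().unwrap_or(l)
    }
}

/// Single rewriting pass: replacement and storage removal happen together.
fn apply_plan(plan: &Plan, body: &[Stmt]) -> Vec<Stmt> {
    body.iter()
        .filter_map(|&stmt| match stmt {
            Stmt::StorageLive(l) | Stmt::StorageDead(l) => {
                // Replaced locals disappear entirely; heads keep their
                // markers unless the plan says otherwise.
                let replaced = plan.head_of.contains_key(&l);
                if replaced || plan.storage_to_remove.contains(&l) {
                    None
                } else {
                    Some(stmt)
                }
            }
            Stmt::Copy { dest, src } => {
                let (dest, src) = (plan.head(dest), plan.head(src));
                // A copy within one class becomes a self-assignment and is dropped.
                if dest == src { None } else { Some(Stmt::Copy { dest, src }) }
            }
            Stmt::Use(l) => Some(Stmt::Use(plan.head(l))),
        })
        .collect()
}
```

Because `apply_plan` only reads the plan, there is no intermediate state in which some statements are already rewritten and others are not, which is the confusion the review comment points at.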
    @@ -0,0 +1,38 @@
    // skip-filecheck
    // EMIT_MIR_FOR_EACH_PANIC_STRATEGY
`EMIT_MIR_FOR_EACH_PANIC_STRATEGY` is used when output differs between panic strategies. I don't think we need it in this test. Please remove it.
… to remove fewer storage statements
aa11a50 to 365edc7
Modify the `copy_prop` MIR optimization pass to remove fewer `Storage{Live,Dead}` calls, allowing for better optimizations by LLVM - see #141649.
### Details
This is my attempt to fix the mentioned issue (this is the first part, I also implemented a similar solution for GVN in [this branch](https://github.com/rust-lang/rust/compare/master...ohadravid:rust:better-storage-calls-gvn-v2?expand=1)).
The idea is to use a new `MaybeUninitializedLocals` analysis and remove only the storage calls of `head`s that are maybe-uninit when the associated `local` is accessed (or, conversely, keep the storage of `head`s that are for-sure initialized in every relevant access).
When combined with the GVN change, the final example in the issue (#141649 (comment)) is optimized as expected by LLVM. I also measured the effect on a few functions in `rav1d` (where I originally saw the issue) and observed reduced stack usage in several of them.
This is my first attempt at working with MIR optimizations, so it's possible this isn't the right approach, but all tests pass, and the resulting diffs appear correct.
r? tmiasko since he commented on the issue and pointed to these passes.