Simplify `slice::Iter::next` enough that it inlines
#136771
base: master
Let's see whether it actually improves things:
Simplify `slice::Iter::next` enough that it inlines

Inspired by this zulip conversation: <https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990>

Draft for now because it needs rust-lang#136735 to get the codegen tests to pass.
☀️ Try build successful - checks-actions
Finished benchmarking commit (30df00c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with

@bors rollup=never

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| Metric | Primary | Secondary |
|---|---|---|
| Max RSS (memory usage) | -2.2% | -2.5% |
| Cycles | -1.0% | 1.7% |
| Binary size | 0.1% | -0.1% |

Max RSS, cycles, and binary size are less reliable metrics that may be of interest but were not used to determine the overall result at the top of this comment.

Bootstrap: 780.488s -> 778.559s (-0.25%)
f7970b3 to 85deb4d
@@ -14,11 +14,11 @@
```rust
// CHECK-LABEL: @slice_iter_next(
#[no_mangle]
pub fn slice_iter_next<'a>(it: &mut std::slice::Iter<'a, u32>) -> Option<&'a u32> {
    // CHECK: %[[ENDP:.+]] = getelementptr inbounds{{( nuw)?}} i8, ptr %it, {{i32 4|i64 8}}
    // CHECK: %[[END:.+]] = load ptr, ptr %[[ENDP]]
    // CHECK: %[[START:.+]] = load ptr, ptr %it,
```
Rebased atop the transmute-gives-asserts change; the codegen tests should be passing now, with just one trivial change to the test: the generated code now loads the start pointer before the end pointer.
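For readers unfamiliar with the two-pointer layout the codegen test checks, here is a minimal sketch of a slice iterator whose `next` reads the start pointer first and then the end pointer, matching the load order described above. It assumes a non-zero-sized `T`; `PtrIter` and its fields are hypothetical names, and this is not the real `std::slice::Iter` implementation (which also handles zero-sized types).

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical two-pointer slice iterator (non-zero-sized `T` only).
/// It only mirrors the basic shape of `std::slice::Iter`.
pub struct PtrIter<'a, T> {
    ptr: NonNull<T>, // next element to yield
    end: *const T,   // one past the last element
    _marker: PhantomData<&'a T>,
}

impl<'a, T> PtrIter<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        assert!(std::mem::size_of::<T>() != 0, "sketch does not handle ZSTs");
        let start = slice.as_ptr();
        PtrIter {
            // SAFETY: a slice's data pointer is never null.
            ptr: unsafe { NonNull::new_unchecked(start as *mut T) },
            // SAFETY: `start + len` is one past the end of the slice, which `add` permits.
            end: unsafe { start.add(slice.len()) },
            _marker: PhantomData,
        }
    }
}

impl<'a, T> Iterator for PtrIter<'a, T> {
    type Item = &'a T;

    #[inline]
    fn next(&mut self) -> Option<&'a T> {
        // Load the start pointer first, then the end pointer.
        let ptr = self.ptr;
        let end = self.end;
        if ptr.as_ptr() as *const T == end {
            return None;
        }
        // SAFETY: `ptr != end`, so `ptr` points at a live element of the
        // borrowed slice, and `ptr + 1` is at most one past the end.
        unsafe {
            self.ptr = NonNull::new_unchecked(ptr.as_ptr().add(1));
            Some(ptr.as_ref())
        }
    }
}
```

The emptiness check is a single pointer comparison, which is the kind of small body that makes `next` cheap enough to inline.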
@@ -4,28 +4,30 @@ fn enumerated_loop(_1: &[T], _2: impl Fn(usize, &T)) -> () {
```
    debug slice => _1;
    debug f => _2;
    let mut _0: ();
    let mut _11: std::slice::Iter<'_, T>;
    let mut _12: std::iter::Enumerate<std::slice::Iter<'_, T>>;
    let mut _13: std::iter::Enumerate<std::slice::Iter<'_, T>>;
```
Nice to see that, with this, the `Enumerate` iterators get completely SRoAed (scalar replacement of aggregates) even just in MIR!
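The MIR hunk above only shows the signature of `enumerated_loop`. A plausible source for it is sketched below; the body is an assumption reconstructed from that signature, not copied from the test file. It is the kind of loop whose `Enumerate<Iter<'_, T>>` state the comment above describes being split into scalar locals.

```rust
/// Presumed source of the `enumerated_loop` MIR test (a reconstruction;
/// only the signature appears in the hunk).
pub fn enumerated_loop<T>(slice: &[T], f: impl Fn(usize, &T)) {
    // `slice.iter().enumerate()` builds an `Enumerate<slice::Iter<'_, T>>`,
    // whose `next` calls `slice::Iter::next` on every step.
    for (i, x) in slice.iter().enumerate() {
        f(i, x)
    }
}
```

For example, calling it with a closure that accumulates `i * x` into a `std::cell::Cell` visits each element together with its index.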
Just to check that having the assumes in the LLVM-IR doesn't somehow lose all of the gains:
☀️ Try build successful - checks-actions
Finished benchmarking commit (eaf73cd): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with

@bors rollup=never

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| Metric | Primary | Secondary |
|---|---|---|
| Max RSS (memory usage) | -1.8% | -3.2% |
| Cycles | -1.4% | 2.1% |
| Binary size | 0.0% | -0.2% |

Max RSS, cycles, and binary size are less reliable metrics that may be of interest but were not used to determine the overall result at the top of this comment.

Bootstrap: 785.339s -> 785.217s (-0.02%)
@@ -3,12 +3,134 @@
```
fn slice_iter_next(_1: &mut std::slice::Iter<'_, T>) -> Option<&T> {
    debug it => _1;
    let mut _0: std::option::Option<&T>;
    scope 1 (inlined <std::slice::Iter<'_, T> as Iterator>::next) {
```
If you want to compare to nightly, https://godbolt.org/z/Mr6cW9PWc
85deb4d to 4392415
Made a small tweak to the approach; let's see whether it's actually an improvement:
☀️ Try build successful - checks-actions
Finished benchmarking commit (d786e76): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with

@bors rollup=never

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| Metric | Primary | Secondary |
|---|---|---|
| Max RSS (memory usage) | -1.7% | 0.9% |
| Cycles | -1.2% | 1.1% |
| Binary size | -0.2% | -0.1% |

Max RSS, cycles, and binary size are less reliable metrics that may be of interest but were not used to determine the overall result at the top of this comment.

Bootstrap: 789.679s -> 787.912s (-0.22%)
```rust
// safe since we check if the iterator is empty first.
let ptr = self.ptr;
let end_or_len = self.end_or_len;
// SAFETY: Type invariants.
```
The safety comment is a bit too sloppy, imo. At the very least it should say something like "same as above" if you want to avoid repetition. Alternatively, split it into two `unsafe` blocks, one for each arm.
Yeah, true.
Weirdly, when I added tighter-scoped `unsafe` blocks it stopped inlining (I even rebuilt to double-check, because that's so strange), so instead I added more specific comments inside one bigger block.
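The structure settled on in this thread (one larger `unsafe` block, with a specific SAFETY comment for each arm) can be sketched as follows. The sketch borrows the `ptr` / `end_or_len` field names from the excerpt above and assumes, for illustration, that for zero-sized `T` the address stored in `end_or_len` is the remaining length; `SketchIter` and `IS_ZST` are hypothetical names, and the real standard-library code differs in detail.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical iterator: for sized `T`, `end_or_len` is the end pointer;
/// for zero-sized `T`, its address is the number of remaining elements.
pub struct SketchIter<'a, T> {
    ptr: NonNull<T>,
    end_or_len: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> SketchIter<'a, T> {
    const IS_ZST: bool = std::mem::size_of::<T>() == 0;

    pub fn new(slice: &'a [T]) -> Self {
        let start = slice.as_ptr();
        let end_or_len = if Self::IS_ZST {
            // Encode the length in the pointer's address.
            slice.len() as *const T
        } else {
            // SAFETY: `start + len` is one past the end of the slice.
            unsafe { start.add(slice.len()) }
        };
        SketchIter {
            // SAFETY: a slice's data pointer is never null.
            ptr: unsafe { NonNull::new_unchecked(start as *mut T) },
            end_or_len,
            _marker: PhantomData,
        }
    }
}

impl<'a, T> Iterator for SketchIter<'a, T> {
    type Item = &'a T;

    #[inline]
    fn next(&mut self) -> Option<&'a T> {
        let ptr = self.ptr;
        let end_or_len = self.end_or_len;
        // One `unsafe` block, with a specific SAFETY comment per arm.
        unsafe {
            if Self::IS_ZST {
                let len = end_or_len as usize;
                if len == 0 {
                    return None;
                }
                self.end_or_len = (len - 1) as *const T;
                // SAFETY: `T` is zero-sized, so the slice's non-null, aligned
                // data pointer can be turned into a `&T` without reading memory.
                Some(ptr.as_ref())
            } else {
                if ptr.as_ptr() as *const T == end_or_len {
                    return None;
                }
                // SAFETY: `ptr != end`, so `ptr` points at a live element of the
                // borrowed slice and `ptr + 1` is at most one past the end.
                self.ptr = NonNull::new_unchecked(ptr.as_ptr().add(1));
                Some(ptr.as_ref())
            }
        }
    }
}
```

Keeping both arms in one block keeps the function body compact, while the per-arm comments still record which invariant justifies each operation.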
…ffset`

Probably reasonable anyway since it more obviously drops provenance.
This adds a few more statements to `next`, but optimizes better in the loops (saving 2 blocks in `forward_loop`, for example).
4ddbbbd to 7add358
Inspired by this zulip conversation: https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990
Draft for now because it needs #136735 to get the codegen tests to pass.