Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify slice::Iter::next enough that it inlines #136771

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Feb 9, 2025

Inspired by this zulip conversation: https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990

Draft for now because it needs #136735 to get the codegen tests to pass.

@rustbot
Copy link
Collaborator

rustbot commented Feb 9, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 9, 2025
@scottmcm
Copy link
Member Author

scottmcm commented Feb 9, 2025

Let's see whether it actually improves things:
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 9, 2025
@bors
Copy link
Contributor

bors commented Feb 9, 2025

⌛ Trying commit f7970b3 with merge 30df00c...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2025
Simplify `slice::Iter::next` enough that it inlines

Inspired by this zulip conversation: <https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990>

Draft for now because it needs rust-lang#136735 to get the codegen tests to pass.
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Feb 9, 2025

☀️ Try build successful - checks-actions
Build commit: 30df00c (30df00cd8218095deb80cc6b913de02f5ae4a5b0)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (30df00c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.4% [0.4%, 0.4%] 1
Regressions ❌
(secondary)
9.4% [9.4%, 9.4%] 1
Improvements ✅
(primary)
-0.6% [-2.2%, -0.1%] 210
Improvements ✅
(secondary)
-0.5% [-1.1%, -0.1%] 131
All ❌✅ (primary) -0.6% [-2.2%, 0.4%] 211

Max RSS (memory usage)

Results (primary -2.2%, secondary -2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.4% [1.7%, 2.8%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-4.4% [-8.5%, -2.4%] 6
Improvements ✅
(secondary)
-2.5% [-2.5%, -2.5%] 1
All ❌✅ (primary) -2.2% [-8.5%, 2.8%] 9

Cycles

Results (primary -1.0%, secondary 1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
4.6% [4.6%, 4.6%] 1
Improvements ✅
(primary)
-1.0% [-1.1%, -0.9%] 2
Improvements ✅
(secondary)
-1.1% [-1.1%, -1.1%] 1
All ❌✅ (primary) -1.0% [-1.1%, -0.9%] 2

Binary size

Results (primary 0.1%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.0%, 5.0%] 28
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.5%, -0.0%] 45
Improvements ✅
(secondary)
-0.1% [-1.3%, -0.0%] 48
All ❌✅ (primary) 0.1% [-0.5%, 5.0%] 73

Bootstrap: 780.488s -> 778.559s (-0.25%)
Artifact size: 329.04 MiB -> 329.26 MiB (0.06%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 9, 2025
@scottmcm scottmcm force-pushed the poke-slice-iter-next branch from f7970b3 to 85deb4d Compare February 11, 2025 09:51
@scottmcm scottmcm removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 11, 2025
@@ -14,11 +14,11 @@
// CHECK-LABEL: @slice_iter_next(
#[no_mangle]
pub fn slice_iter_next<'a>(it: &mut std::slice::Iter<'a, u32>) -> Option<&'a u32> {
// CHECK: %[[ENDP:.+]] = getelementptr inbounds{{( nuw)?}} i8, ptr %it, {{i32 4|i64 8}}
// CHECK: %[[END:.+]] = load ptr, ptr %[[ENDP]]
// CHECK: %[[START:.+]] = load ptr, ptr %it,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased atop the transmute-gives-asserts change; codegen tests should be passing now with just this trivial change that it's loading the start pointer first instead of the end pointer first.

@@ -4,28 +4,30 @@ fn enumerated_loop(_1: &[T], _2: impl Fn(usize, &T)) -> () {
debug slice => _1;
debug f => _2;
let mut _0: ();
let mut _11: std::slice::Iter<'_, T>;
let mut _12: std::iter::Enumerate<std::slice::Iter<'_, T>>;
let mut _13: std::iter::Enumerate<std::slice::Iter<'_, T>>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see that the Enumerate iterators get completely SRoAed even just in MIR, with this!

@scottmcm
Copy link
Member Author

Just to check that having the assumes in the LLVM-IR doesn't somehow lose all of the gains:
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 11, 2025
@bors
Copy link
Contributor

bors commented Feb 11, 2025

⌛ Trying commit 85deb4d with merge eaf73cd...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 11, 2025
Simplify `slice::Iter::next` enough that it inlines

Inspired by this zulip conversation: <https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990>

Draft for now because it needs rust-lang#136735 to get the codegen tests to pass.
@bors
Copy link
Contributor

bors commented Feb 11, 2025

☀️ Try build successful - checks-actions
Build commit: eaf73cd (eaf73cd9c4fc608d616056b232ccb9cbb8df5679)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (eaf73cd): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.4% [0.4%, 0.5%] 2
Regressions ❌
(secondary)
9.3% [9.3%, 9.3%] 1
Improvements ✅
(primary)
-0.6% [-2.2%, -0.1%] 196
Improvements ✅
(secondary)
-0.5% [-2.4%, -0.1%] 101
All ❌✅ (primary) -0.6% [-2.2%, 0.5%] 198

Max RSS (memory usage)

Results (primary -1.8%, secondary -3.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.8% [2.1%, 3.3%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.9% [-8.7%, -0.7%] 13
Improvements ✅
(secondary)
-3.2% [-7.1%, -2.0%] 29
All ❌✅ (primary) -1.8% [-8.7%, 3.3%] 16

Cycles

Results (primary -1.4%, secondary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.0% [2.1%, 5.4%] 5
Improvements ✅
(primary)
-1.4% [-1.8%, -1.0%] 3
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) -1.4% [-1.8%, -1.0%] 3

Binary size

Results (primary 0.0%, secondary -0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.0%, 3.9%] 38
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.3% [-1.8%, -0.1%] 50
Improvements ✅
(secondary)
-0.2% [-2.0%, -0.0%] 83
All ❌✅ (primary) 0.0% [-1.8%, 3.9%] 88

Bootstrap: 785.339s -> 785.217s (-0.02%)
Artifact size: 348.32 MiB -> 348.38 MiB (0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 11, 2025
@@ -3,12 +3,134 @@
fn slice_iter_next(_1: &mut std::slice::Iter<'_, T>) -> Option<&T> {
debug it => _1;
let mut _0: std::option::Option<&T>;
scope 1 (inlined <std::slice::Iter<'_, T> as Iterator>::next) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to compare to nightly, https://godbolt.org/z/Mr6cW9PWc

@scottmcm scottmcm force-pushed the poke-slice-iter-next branch from 85deb4d to 4392415 Compare February 14, 2025 05:45
@scottmcm
Copy link
Member Author

Make a small tweak to the approach; let's see whether it was actually good:
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 14, 2025
@bors
Copy link
Contributor

bors commented Feb 14, 2025

⌛ Trying commit 4ddbbbd with merge d786e76...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 14, 2025
Simplify `slice::Iter::next` enough that it inlines

Inspired by this zulip conversation: <https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990>

Draft for now because it needs rust-lang#136735 to get the codegen tests to pass.
@bors
Copy link
Contributor

bors commented Feb 14, 2025

☀️ Try build successful - checks-actions
Build commit: d786e76 (d786e76b71a6ff16343a9251ccb9db66784dddda)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d786e76): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.5% [0.5%, 0.5%] 2
Regressions ❌
(secondary)
9.2% [9.2%, 9.2%] 1
Improvements ✅
(primary)
-0.6% [-2.2%, -0.2%] 204
Improvements ✅
(secondary)
-0.5% [-2.3%, -0.1%] 101
All ❌✅ (primary) -0.6% [-2.2%, 0.5%] 206

Max RSS (memory usage)

Results (primary -1.7%, secondary 0.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
6.4% [2.9%, 9.8%] 2
Regressions ❌
(secondary)
3.8% [3.4%, 4.2%] 2
Improvements ✅
(primary)
-3.5% [-10.0%, -0.8%] 9
Improvements ✅
(secondary)
-5.0% [-5.0%, -5.0%] 1
All ❌✅ (primary) -1.7% [-10.0%, 9.8%] 11

Cycles

Results (primary -1.2%, secondary 1.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
4.5% [4.5%, 4.5%] 1
Improvements ✅
(primary)
-1.2% [-2.0%, -0.8%] 11
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) -1.2% [-2.0%, -0.8%] 11

Binary size

Results (primary -0.2%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.3% [0.0%, 0.8%] 31
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 34
Improvements ✅
(primary)
-0.4% [-1.8%, -0.1%] 61
Improvements ✅
(secondary)
-0.3% [-2.0%, -0.0%] 11
All ❌✅ (primary) -0.2% [-1.8%, 0.8%] 92

Bootstrap: 789.679s -> 787.912s (-0.22%)
Artifact size: 347.73 MiB -> 348.31 MiB (0.17%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 14, 2025
@scottmcm scottmcm closed this Feb 14, 2025
@scottmcm scottmcm reopened this Feb 14, 2025
@scottmcm scottmcm marked this pull request as ready for review February 14, 2025 18:40
@scottmcm
Copy link
Member Author

scottmcm commented Feb 14, 2025

With #136735 having landed, this is good for a review now.

@rustbot ready

The one large regression is secondary, and looks to be one of the codegen units just having way more to do in LLVM:
image

// safe since we check if the iterator is empty first.
let ptr = self.ptr;
let end_or_len = self.end_or_len;
// SAFETY: Type invariants.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safety comment is a bit too sloppy imo. At least it should say something like "same as above" if you want to avoid repetition. Or maybe split it into two unsafe blocks, one for each arm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, true.

Weirdly when I added tighter-scoped unsafe blocks it stopped inlining (and I even rebuilt to check because that's so strange), but I added more specific comments inside a bigger block.

…ffset`

Probably reasonable anyway since it more obviously drops provenance.
This adds a few more statements to `next`, but optimizes better in the loops (saving 2 blocks in `forward_loop`, for example)
@scottmcm scottmcm force-pushed the poke-slice-iter-next branch from 4ddbbbd to 7add358 Compare February 15, 2025 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants