Code
I tried this code:
use std::convert::TryInto;

pub fn mul3(previous: &[u8], current: &mut [u8]) {
    let mut c_bpp = [0; 4];
    for (chunk, b_bpp) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        let new_chunk = [
            chunk[0].wrapping_add(c_bpp[0]),
            chunk[1].wrapping_add(c_bpp[1]),
            chunk[2].wrapping_add(c_bpp[2]),
            chunk[3].wrapping_add(c_bpp[3]),
        ];
        *TryInto::<&mut [u8; 4]>::try_into(chunk).unwrap() = new_chunk;
        c_bpp = b_bpp.try_into().unwrap();
    }
}
I expected to see this happen: Function runs quickly thanks to auto-vectorization.
Instead, this happened: The function is 60% slower than on rustc 1.86.0, because it is no longer auto-vectorized.
Godbolt comparison link: https://godbolt.org/z/8EhWdYc13
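For anyone experimenting with rewrites that restore vectorization, it may help to pin down what the function computes first. The sketch below repeats `mul3` from above and compares it against an index-based scalar version. `mul3_reference` is a hypothetical helper introduced here for illustration; it is not part of the issue or of image-png. As far as I can tell, each 4-byte chunk of `current` has the preceding 4-byte chunk of `previous` added to it with wrapping arithmetic, and the first chunk is left unchanged.

```rust
use std::convert::TryInto;

// The function from the report, unchanged.
pub fn mul3(previous: &[u8], current: &mut [u8]) {
    let mut c_bpp = [0; 4];
    for (chunk, b_bpp) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        let new_chunk = [
            chunk[0].wrapping_add(c_bpp[0]),
            chunk[1].wrapping_add(c_bpp[1]),
            chunk[2].wrapping_add(c_bpp[2]),
            chunk[3].wrapping_add(c_bpp[3]),
        ];
        *TryInto::<&mut [u8; 4]>::try_into(chunk).unwrap() = new_chunk;
        c_bpp = b_bpp.try_into().unwrap();
    }
}

// Hypothetical scalar reference (an assumption, not from the issue).
// Chunk `i` of `current` gets chunk `i - 1` of `previous` added to it;
// chunk 0 is left as-is because the carried value starts at zero.
pub fn mul3_reference(previous: &[u8], current: &mut [u8]) {
    // `zip` stops at the shorter of the two chunk iterators.
    let chunks = (current.len() / 4).min(previous.len() / 4);
    for i in 1..chunks {
        for j in 0..4 {
            let idx = i * 4 + j;
            current[idx] = current[idx].wrapping_add(previous[idx - 4]);
        }
    }
}
```

Running both functions on the same input and asserting that the outputs match gives a quick behavioral check before comparing the generated assembly of a candidate rewrite.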
Version it worked on
It most recently worked on: rustc 1.86.0 (which uses LLVM version 19.1.7)
Version with regression
rustc --version --verbose:
rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: x86_64-unknown-linux-gnu
release: 1.87.0
LLVM version: 20.1.1
Other context
This is an attempted minimization of image-rs/image-png#598.
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Category: An issue highlighting optimization opportunities or PRs implementing such.
Issue: Problems and improvements with respect to performance of generated code.
High priority.
Relevant to the compiler team, which will review and decide on the PR/issue.
Performance or correctness regression from one stable version to another.