I'm reporting this here because I encountered two examples where the Rust version is much slower than its C++ counterpart compiled with Clang. I ported them as a learning exercise.
ary3
ary3.rs is about 4 times slower than ary3.cpp.
Based on the generated code, it seems that the difference is caused by a missed loop vectorization in the Rust version (details).
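To illustrate the kind of loop involved, here is a minimal sketch. It assumes the hot loop is an element-wise add over two arrays walked in reverse, as in the classic ary3 benchmark; the actual ary3.rs may differ. The function names add_indexed and add_zipped are hypothetical. The indexed form keeps a bounds check on every access unless the optimizer can prove the index is in range, while the iterator form needs none. I have not verified that this change restores vectorization for this particular program.

```rust
/// Indexed form: every `x[i]` and `y[i]` access carries a bounds check
/// unless the optimizer can prove that `i` is always in range.
/// Panics if `y` is shorter than `x`.
pub fn add_indexed(x: &[i32], y: &mut [i32]) {
    let n = x.len();
    for i in (0..n).rev() {
        y[i] += x[i];
    }
}

/// Iterator form: `zip` stops at the end of the shorter slice, so no
/// per-element bounds check is required. Whether the loop is then
/// vectorized is still up to the optimizer (assumption, not measured).
pub fn add_zipped(x: &[i32], y: &mut [i32]) {
    for (yi, xi) in y.iter_mut().zip(x.iter()) {
        *yi += *xi;
    }
}
```

Both functions compute the same result when the slices have equal length, so comparing the assembly generated for each is one way to check whether the bounds checks are what blocks vectorization.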
matrix
matrix.rs is about 2 times slower than matrix.cpp.
I think that the difference is caused by the following (details):
- the mmult function isn't inlined into the main function (so the constant sizes of the vectors aren't constant-propagated to mmult).
- the bounds checks are not elided.
- loop unrolling has not been applied.
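The three points above can be sketched as follows. This is a hypothetical variant, not the code in matrix.rs: it assumes square matrices stored as flat row-major slices, and the signature of mmult here is my own. The inline attribute is only a hint to the compiler, and the up-front length assertions merely give the optimizer a chance to drop the per-element checks. I have not measured whether either closes the gap.

```rust
/// Multiplies two `n x n` row-major matrices `a` and `b` into `out`.
/// Hypothetical signature; the layout in matrix.rs may be different.
#[inline] // a hint only: the compiler may still decline to inline
pub fn mmult(n: usize, a: &[i32], b: &[i32], out: &mut [i32]) {
    // Checking the lengths once up front lets the optimizer reason
    // about the index ranges used in the loops below.
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    assert_eq!(out.len(), n * n);
    if n == 0 {
        return; // `chunks_exact` panics on a chunk size of zero
    }

    // Walk `out` and `a` one row at a time, without indexing.
    for (out_row, a_row) in out.chunks_exact_mut(n).zip(a.chunks_exact(n)) {
        for (j, cell) in out_row.iter_mut().enumerate() {
            let mut sum = 0i32;
            for (k, &a_ik) in a_row.iter().enumerate() {
                // Column access into `b` is still indexed; k < n and
                // j < n, so the index is always below n * n.
                sum += a_ik * b[k * n + j];
            }
            *cell = sum;
        }
    }
}
```

Calling this with a literal size, for example mmult(2, &a, &b, &mut out), is the situation in which inlining would let the constant size propagate into the loops.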
In both cases, the differences are in the frontend (rustc and clang) output, as opposed to the LLVM backend.
This is with rustc 1.86.0 (05f9846f8 2025-03-31) compiling with -O on my x86_64 PC. The Clang version I used was Clang 20.1.5 compiling with -O2.