Description
A systems language should run basic for loops efficiently. One problem was shown in issue #45222. Another performance pitfall appears when you need a step larger than 1 and the step size is a run-time variable, as in the Rust equivalent of this (efficient) C loop:
for (size_t j = i * 2; j < m; j += i) {...}
This shows the problem in a simple way, using a sieve (this code isn't meant to show an efficient sieve implementation). sieve1 uses step_by(variable) while sieve2 uses a while loop that should be equivalent:
fn sieve1(m: usize) -> Vec<bool> {
    let mut primes = vec![true; m];
    primes[0] = false;
    primes[1] = false;

    for i in 2 .. m {
        if primes[i] {
            for j in (i * 2 .. m).step_by(i) {
                primes[j] = false;
            }
        }
    }

    primes
}

fn sieve2(m: usize) -> Vec<bool> {
    let mut primes = vec![true; m];
    primes[0] = false;
    primes[1] = false;

    for i in 2 .. m {
        if primes[i] {
            let mut j = i * 2;
            while j < m {
                primes[j] = false;
                j += i;
            }
        }
    }

    primes
}

fn main() {
    const M: usize = 150_000_000;
    println!("{}", sieve1(M).into_iter().filter(|&b| b).count()); // 2.93s
    println!("{}", sieve2(M).into_iter().filter(|&b| b).count()); // 2.86s
}
Commenting out each of the two lines in main() in turn shows the performance difference. The difference is small, but if you nest more than one for loop, each using step_by, the problem compounds. I think LLVM is able to remove this overhead from step_by if the step size is a compile-time constant and you have only one un-nested step_by.
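To make the nested case easier to try, here is a minimal sketch of two nested loops with run-time steps, written once with step_by and once with while loops. The names nested_step_by, nested_while and timed are hypothetical helpers made up for this illustration, and the loop body (a wrapping sum of i ^ j) is just filler work. The steps are read from the command line so the compiler can't treat them as constants. No timings are claimed here; results will depend on the compiler version and flags.

```rust
use std::time::Instant;

/// Nested loops with run-time steps `a` and `b`, written with `step_by`.
/// (Hypothetical helper for illustration only.)
fn nested_step_by(m: usize, a: usize, b: usize) -> u64 {
    let mut total = 0u64;
    for i in (0..m).step_by(a) {
        for j in (i..m).step_by(b) {
            total = total.wrapping_add((i ^ j) as u64);
        }
    }
    total
}

/// The same iteration space written with explicit `while` loops.
fn nested_while(m: usize, a: usize, b: usize) -> u64 {
    let mut total = 0u64;
    let mut i = 0;
    while i < m {
        let mut j = i;
        while j < m {
            total = total.wrapping_add((i ^ j) as u64);
            j += b;
        }
        i += a;
    }
    total
}

/// Runs `f`, prints its result and elapsed wall-clock time, returns the result.
fn timed<F: FnOnce() -> u64>(label: &str, f: F) -> u64 {
    let start = Instant::now();
    let result = f();
    println!("{}: {} in {:?}", label, result, start.elapsed());
    result
}

fn main() {
    // Steps come from the command line (defaults 3 and 7) so they are
    // run-time values rather than compile-time constants.
    let mut args = std::env::args().skip(1);
    let a: usize = args.next().and_then(|s| s.parse().ok()).unwrap_or(3);
    let b: usize = args.next().and_then(|s| s.parse().ok()).unwrap_or(7);
    // `step_by(0)` panics and the `while` version would never terminate.
    assert!(a > 0 && b > 0, "steps must be non-zero");

    const M: usize = 20_000;
    let x = timed("step_by", || nested_step_by(M, a, b));
    let y = timed("while", || nested_while(M, a, b));
    assert_eq!(x, y); // both versions must visit the same (i, j) pairs
}
```

Build with optimizations (for example rustc -O) and pass different step values as arguments to compare the two variants.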