Use of double literals severely harming performance #40

Jaimies · 2024-09-16T19:47:13Z

The classes MultiStepper and AccelStepper frequently use double literals such as 20.0 where float literals (20.0f) should be used.
One particular example is AccelStepper's setSpeed() method, where line 316 reads:

_stepInterval = fabs(1000000.0 / speed);

As one can see, a double literal is divided by the value of speed (a float), so speed is promoted to double and the division is carried out in double precision.
When running the following code on an Arduino Uno R4 WiFi with the library unmodified, the execution of setSpeed() requires 833 CPU cycles, as evidenced by "cycles for setspeed: 833" being printed to the serial port. You may ignore the specifics: the potentially confusing-looking code simply reads the CPU cycle counter immediately before and immediately after the setSpeed() call.

#include "AccelStepper.h"
#include "Serial.h"

#include <cstdint>

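// Cortex-M debug registers used to count CPU cycles
#define DEMCR_TRCENA    0x01000000                          // bit 24: enable the DWT unit
#define DWT_CTRL        (*((volatile uint32_t *) 0xE0001000))
#define DWT_CYCCNT      (*((volatile uint32_t *) 0xE0001004)) // cycle counter
#define DEMCR           (*((volatile uint32_t *) 0xE000EDFC))
#define CYCCNTENA       (1 << 0)                            // bit 0: enable the cycle counter

// Enable the cycle counter and restart it from zero
void stopwatch_reset() {
    DEMCR |= DEMCR_TRCENA;
    DWT_CYCCNT = 0;
    DWT_CTRL |= CYCCNTENA;
}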

AccelStepper stepper(AccelStepper::DRIVER, 2, 5);

void setup() {
    stopwatch_reset();
    Serial.begin(500000);
    stepper.setMaxSpeed(500000);
    uint32_t start = DWT_CYCCNT;
    stepper.setSpeed(42000.0f);
    uint32_t end = DWT_CYCCNT;
    Serial.print("cycles for setspeed: ");
    Serial.println(end - start);
}

void loop() {}

However, after changing line 316 of the library source to the following (using a float literal), the very same operation requires only 69 CPU cycles:

_stepInterval = fabs(1000000.0f / speed);

The following code (using an integer literal) does the job just as well with identical performance:

_stepInterval = fabs(1000000 / speed);

I would attribute this performance difference to the fact that the ARM Cortex-M4 processor powering the Arduino Uno R4 has a floating-point unit (FPU) that supports only single precision, so double-precision operations have to be emulated in software, which is far more expensive. I have not tested whether a similar difference shows up on the Arduino Uno R3. The situation may be a bit different there since it has no FPU, but I would imagine that operations with floats would still be faster than operations with doubles.
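
To make the promotion explicit, here is a minimal standalone C++ sketch (not library code) of the three variants discussed above. The function names stepIntervalDouble, stepIntervalFloat and stepIntervalInt are hypothetical and exist only for this illustration, as does the float return type. The static_asserts check at compile time which type each division is carried out in.

```cpp
#include <cmath>
#include <type_traits>
#include <utility>

// Type of each division when the right-hand operand is a float.
// A double literal forces the whole division into double precision.
static_assert(std::is_same<decltype(1000000.0 / std::declval<float>()), double>::value,
              "double literal / float is computed as double");
// A float literal keeps the division in single precision.
static_assert(std::is_same<decltype(1000000.0f / std::declval<float>()), float>::value,
              "float literal / float is computed as float");
// An integer literal is converted to float, so this is also single precision.
static_assert(std::is_same<decltype(1000000 / std::declval<float>()), float>::value,
              "int literal / float is computed as float");

// Hypothetical stand-ins for the expression on line 316 (illustration only).
inline float stepIntervalDouble(float speed) {
    return static_cast<float>(std::fabs(1000000.0 / speed));  // double division
}

inline float stepIntervalFloat(float speed) {
    return std::fabs(1000000.0f / speed);                     // float division
}

inline float stepIntervalInt(float speed) {
    return std::fabs(1000000 / speed);                        // float division
}
```

All three variants produce essentially the same value; only the precision of the intermediate division differs. On GCC, the -Wdouble-promotion warning flag should be able to point out implicit float-to-double promotions like the first one, though I have not tried it on this library.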

If you need any more information, let me know.
I might work on a pull request addressing this issue at some point in the future.
