Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fan: Split interval for actual/target duty #510

Merged
merged 3 commits into from
Jan 7, 2025
Merged

Conversation

crawfxrd
Copy link
Member

@crawfxrd crawfxrd commented Dec 30, 2024

Updating the actual duty is a detail of how the hardware control is implemented, and requires a short interval for its implementation.

Updating the target duty is a platform policy (the fan table), and does not require updating as frequently as the hardware control.

Because of this, the use of HEATUP and COOLDOWN needs to be re-evaluated.

HEATUP can be removed entirely. We want the quick response to rising temps without it instantly going to max from temp spikes. This is handled by the new hardware control, which limits how quickly the fan duty can change (3.9%/s).

COOLDOWN is still useful for hysteresis, to provide extra cooling as temps come down, but does not need to run for 10/20s it is set to. It is changed to 5s for all boards.

This should help address behavior of fans constantly turning on/off at point 0 threshold.

Test

  • Check fan change is still responsive enough to handle changes in load

Updating the actual duty is a detail of how the hardware control is
implemented, and requires a short interval for its implementation.

Updating the target duty is a platform policy (the fan table), and does
not require updating as frequently as the hardware control.

Signed-off-by: Tim Crawford <[email protected]>
@crawfxrd crawfxrd marked this pull request as ready for review December 30, 2024 18:31
@crawfxrd crawfxrd requested review from a team December 30, 2024 18:36
@crawfxrd crawfxrd linked an issue Dec 30, 2024 that may be closed by this pull request
@crawfxrd
Copy link
Member Author

HEATUP/COOLDOWN values probably need to be reduced with this.

jackpot51
jackpot51 previously approved these changes Dec 30, 2024
@leviport
Copy link
Member

I think HEATUP and COOLDOWN do need to be reduced. On oryp12, I'm seeing the CPU start throttling before the fans reach full speed.

@crawfxrd
Copy link
Member Author

Because I had tied the update interval together, HEATUP became effectively useless. Since these values actually use seconds for the unit again, their use need to re-evaluated.

HEATUP can probably be removed entirely. We want the quick response to rising temps without it instantly going to max from temp spikes. This is handled by the new hardware control, which limits how quickly the fan duty can change (3.9%/s).

COOLDOWN is still useful for hysteresis, to provide extra cooling as temps come down, but should use a smaller number than 10s/20s it is now doing.

Since these values actually use seconds for the unit again, their use
needs to be re-evaluated.

`HEATUP` can be removed entirely. We want the quick response to rising
temps without it instantly going to max from temp spikes. This is
handled by the new hardware control, which limits how quickly the fan
duty can change (3.9%/s).

`COOLDOWN` is still useful for hysteresis, to provide extra cooling as
temps come down, but does not need to run for 10/20s it is set to.

Signed-off-by: Tim Crawford <[email protected]>
Mitigate some fan noise by using a moving average instead of
instantaneous points for thermal logic.

Current logic is set to update every 250ms, so use 4 points to average
over 1s.

Signed-off-by: Tim Crawford <[email protected]>
Copy link
Member

@leviport leviport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior on oryp12 seems good. Comparing to currently released oryp12 firmware, the performance is about the same under full load. I think there might be a smidge more bouncing between on and off around 50°C, but it's pretty close to the same.

@crawfxrd
Copy link
Member Author

I think there might be a smidge more bouncing between on and off around 50°C

It should be about the same to released firmware:

  • 20 intervals @ 250ms = 5s total
  • 5 intervals @ 1000ms = 5s total

This is why I choose 5 for the new COOLDOWN value. I think the non-dGPU units (or really anything that was using COOLDOWN=10) will behave better at the on/off threshold.

Might be worth trying a slightly higher value to see if it mitigates this better. If so I could keep the board overrides for COOLDOWN.

(I think the better solution is a temperature-based hysteresis and not this time-based one, but I can't seem to get that to work correctly.)

@leviport
Copy link
Member

leviport commented Dec 31, 2024

It was very comparable. I just had two oryp12's sitting next to each other with s-tui running (shows fan RPM graphs) at idle. Both were hovering around 49-50°C, and the one with this EC had a slightly bumpier graph, with startup spikes:
IMG_20241230_165556_548
(the units were further apart while idling, I just pushed them together for the picture)

When I previously commented above, the fans were running way more, but I re-applied thermal paste and it was nearly the same as the other model running he currently released. There was a bit of air on the CPU die, so cooler contact was not ideal. I think that was partially to blame for the throttling I saw.

@crawfxrd
Copy link
Member Author

What's the intervals on those fan runs?

I could add a timeout so that when fans turn off, they stay off for a certain amount of time.

@leviport
Copy link
Member

leviport commented Dec 31, 2024

The light/dark bars are 2s intervals. The fan graphs are the two on the bottom.

@crawfxrd
Copy link
Member Author

I'll leave the on/off issue for later (again). The important thing here is separating the hardware control from setting the target duty so that manual fan control can be implemented.

@jackpot51 jackpot51 merged commit 5b6a0b9 into master Jan 7, 2025
48 checks passed
@jackpot51 jackpot51 deleted the fan-split-event branch January 7, 2025 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use a moving average for temperatures
3 participants