Prometheus Metrics: opendtu_last_update wraps to 0 at approx. 50 days, while opendtu_uptime does not #2225

Open
4 tasks done
easimon opened this issue Aug 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@easimon
Contributor

easimon commented Aug 24, 2024

What happened?

Both opendtu_uptime and opendtu_last_update are counters in seconds, relative to the last reboot of the device.
However, they seem to use different data types: opendtu_last_update wraps to zero at around 4.29 million seconds, while opendtu_uptime continues to increase beyond that.

Once this happens, the difference opendtu_uptime - opendtu_last_update, which is the number of seconds since the last update, is no longer correct.

Assumption: opendtu_last_update is stored in milliseconds internally as a uint32, so the wrap point is 4294967295 milliseconds (max uint32). Divided by 1000 for the metric, that is about 4294967 seconds.
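The arithmetic behind this assumption can be checked with a small sketch. The names `kMaxMillis`, `reportedSeconds`, and `wholeDays` are made up for illustration and are not taken from the OpenDTU source; the sketch only assumes a 32-bit millisecond counter that is divided by 1000 before being reported.

```cpp
#include <cstdint>

// Assumption: the internal timestamp is a 32-bit millisecond counter.
constexpr uint32_t kMaxMillis = UINT32_MAX; // 4294967295

// Hypothetical helper: the seconds value a metric would report
// for a given 32-bit millisecond timestamp.
constexpr uint32_t reportedSeconds(uint32_t ms) { return ms / 1000; }

// Hypothetical helper: whole days contained in a number of seconds.
constexpr uint32_t wholeDays(uint32_t seconds) { return seconds / 86400; }

// Highest value the metric can reach before the counter wraps.
static_assert(reportedSeconds(kMaxMillis) == 4294967u, "about 4.29 million seconds");

// That is a little under 50 days.
static_assert(wholeDays(4294967u) == 49u, "49 whole days");

// One millisecond later the unsigned counter wraps to 0 (well-defined in C++).
static_assert(reportedSeconds(kMaxMillis + 1u) == 0u, "wraps to zero");
```

The wrap lands at roughly 49.7 days, which matches the "approx. 50 days" observed in the metric.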

To Reproduce Bug

  • reboot the device
  • wait more than 4294967295 milliseconds (around 50 days) without updating or rebooting the device
  • curl /api/prometheus/metrics
  • compare opendtu_last_update and opendtu_uptime: uptime is > 4294967 seconds, while last update is close to 0
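The symptom from the steps above can be simulated without waiting 50 days. This is only a sketch under two assumptions that the issue does not confirm: that the uptime value comes from a counter that does not wrap at 32 bits, and that the last-update value is truncated to 32-bit milliseconds. Both function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: uptime is derived from a source wide enough not to wrap here.
constexpr uint64_t uptimeSeconds(uint64_t elapsedMs) { return elapsedMs / 1000; }

// Assumption: the last-update timestamp is kept as 32-bit milliseconds,
// so everything above 2^32 - 1 is lost before the division by 1000.
constexpr uint32_t lastUpdateSeconds(uint64_t elapsedMs) {
    return static_cast<uint32_t>(elapsedMs) / 1000;
}

// Before the wrap point both values agree.
static_assert(uptimeSeconds(4000000000ULL) == 4000000ULL, "uptime before wrap");
static_assert(lastUpdateSeconds(4000000000ULL) == 4000000u, "last update before wrap");

// Shortly after the wrap point they diverge by about 4.29 million seconds.
static_assert(uptimeSeconds(4295000000ULL) == 4295000ULL, "uptime after wrap");
static_assert(lastUpdateSeconds(4295000000ULL) == 32u, "last update after wrap");
```

With an update that happened just now at 4295000 seconds of uptime, the computed "seconds since last update" would be 4294968 instead of 0.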

Expected Behavior

Both counters use the same numeric data type so they overflow at the same time.
Alternative: add a gauge metric that emits the "seconds since last update" directly, so I do not have to compute it.
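One possible shape for such a gauge is sketched below. It assumes that both the current time and the last-update timestamp come from the same 32-bit millisecond counter; `secondsSinceLastUpdate` is a hypothetical name, not an existing OpenDTU function. Unsigned subtraction is modular in C++, so the result stays correct across a single wrap of the counter.

```cpp
#include <cstdint>

// Hypothetical gauge value: seconds elapsed since the last update.
// Assumption: nowMs and lastUpdateMs are read from the same 32-bit
// millisecond counter. The subtraction wraps modulo 2^32, so the result
// is correct as long as less than about 49.7 days passed between the two.
constexpr uint32_t secondsSinceLastUpdate(uint32_t nowMs, uint32_t lastUpdateMs) {
    return (nowMs - lastUpdateMs) / 1000;
}

// Ordinary case: no wrap between the two timestamps.
static_assert(secondsSinceLastUpdate(5000u, 2000u) == 3u, "3 seconds apart");

// Wrap case: the update happened 500 ms before the counter wrapped,
// and the current time is 500 ms after it.
static_assert(secondsSinceLastUpdate(500u, UINT32_MAX - 499u) == 1u, "1 second across the wrap");
```

Computing the difference on the device avoids mixing two metrics with different wrap behavior on the Prometheus side. The gauge is still ambiguous if an inverter has not reported for longer than the counter period.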

Install Method

Pre-Compiled binary from GitHub

What git-hash/version of OpenDTU?

v24.5.6

Relevant log/trace output

No response

Anything else?

Side note: Over 50 days without a crash or reboot -- just this minor glitch. Solid software, good job 🚀

Please confirm the following

  • I believe this issue is a bug that affects all users of OpenDTU, not something specific to my installation.
  • I have already searched for relevant existing issues and discussions before opening this report.
  • I have updated the title field above with a concise description.
  • I have double checked that my inverter does not contain a W in the model name (like HMS-xxxW) as they are not supported
easimon added the bug label Aug 24, 2024
@morremeyer

I think the assumption that this is a 32-bit unsigned integer overflow is correct.

The Prometheus metrics are set in:

serial.c_str(), i, name, inv->Statistics()->getLastUpdate() / 1000);

getLastUpdate() returns _lastUpdate:

uint32_t Parser::getLastUpdate() const
{
    return _lastUpdate;
}

_lastUpdate is a uint32_t:

uint32_t _lastUpdate = 0;
