
Implement timestamp encoder #169

Draft · wants to merge 3 commits into master

Conversation

asmello commented Oct 20, 2023

The OpenMetrics spec allows for an optional timestamp field to be included along with metrics. This is generally discouraged, as Prometheus effectively determines the real measurement time when it scrapes for metrics, but adding it does make sense in combination with the experimental backfilling feature.

This PR introduces an alternative text encoder that reuses most of the existing logic and simply injects an additional timestamp value into the appropriate MetricPoints.
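For context, this is roughly what the spec's optional timestamp looks like in the OpenMetrics text format. The metric, labels, values and timestamps below are made up for illustration, not the output of this encoder:

    # TYPE http_requests counter
    # HELP http_requests Total handled HTTP requests.
    http_requests_total{method="get"} 1027 1697796000.0
    http_requests_total{method="post"} 3 1697796000.0
    # EOF

The timestamp is the optional final field on a sample line, expressed in seconds since the Unix epoch.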

asmello marked this pull request as draft October 20, 2023 10:01

asmello commented Oct 20, 2023

Ok, I just realised this overlaps with #129, but I'm not sure adding timestamps to const metrics solves the backfill use case fully, unless we're supposed to create const types from the normal ones?

Review thread on the following diff hunk:

    })
    }

    fn with_timestamp(mut self) -> Result<Self, SystemTimeError> {
        self.timestamp = Some(UnixTimestamp::now()?);

Member:

I am confused. For backfilling, you would need to be able to control the timestamp, no? I.e. set it to some time in the past. Here you just set it to the current time.

Author:

Thanks for having a look. Yes, my original intent for this interface was that you'd always capture the current values, as if producing a frozen exposition. I think it would make sense to accept the timestamp as an argument though.
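For concreteness, a sketch of that suggestion in the same fragmentary shape as the hunk above; the method name and signature are illustrative, not what this PR currently implements:

    // Hypothetical variant: accept the timestamp from the caller instead of
    // always reading the system clock.
    fn with_timestamp_at(mut self, timestamp: UnixTimestamp) -> Self {
        self.timestamp = Some(timestamp);
        self
    }

    // The "capture now" behaviour can remain as a convenience on top.
    fn with_timestamp(self) -> Result<Self, SystemTimeError> {
        Ok(self.with_timestamp_at(UnixTimestamp::now()?))
    }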


asmello commented Nov 13, 2023

To provide some additional context, my end goal is to be able to capture metrics while disconnected from a Prometheus instance. Producing the OpenMetrics format with timestamps is how Prometheus supports backfilling, so my initial idea was to just make this crate support encoding in that format so I could use it for this use-case.

I eventually realised that the OpenMetrics format doesn't quite work as I needed it to, though, at least not directly. In order to collect a time-series of metric points for backfilling, it's not sufficient to keep appending timestamped expositions to a file, because of a few constraints the spec imposes (see the example after this list):

  • The file has to end with # EOF (and there can only be one such line)
  • The descriptors have to immediately precede each cluster of points from the same metric family
  • There can be no interleaving of metric families (or points in the same label group)
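For illustration, a strict exposition that satisfies those constraints could look like this (names and values are made up); note the metadata sitting directly above its family's samples, the contiguous families, and the single terminating # EOF:

    # TYPE temperature gauge
    # HELP temperature Sensor temperature in Celsius.
    temperature{sensor="a"} 21.5 1700000000.0
    temperature{sensor="a"} 21.7 1700000060.0
    # TYPE uptime_seconds counter
    uptime_seconds_total 3600 1700000060.0
    # EOF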

So in order to use OpenMetrics directly, one would need to first decode the metrics and then produce a combined exposition containing both the old and the new points, effectively regenerating the entire file with each update. That's a fairly convoluted and inefficient way to operate.

What I ended up going with instead is a slightly relaxed version of the OpenMetrics grammar that allows violating the above constraints. This way I can keep naively appending to the same file and then run a post-processor to convert the output to strict OpenMetrics.
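As a sketch (not something this crate produces today), the relaxed, append-only intermediate file could look like the following, with metadata repeated and families interleaved across captures; the post-processor is then responsible for regrouping families and emitting the single # EOF:

    # TYPE temperature gauge
    temperature{sensor="a"} 21.5 1700000000.0
    # TYPE uptime_seconds counter
    uptime_seconds_total 3540 1700000000.0
    # TYPE temperature gauge
    temperature{sensor="a"} 21.7 1700000060.0
    # TYPE uptime_seconds counter
    uptime_seconds_total 3600 1700000060.0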

Unfortunately, this approach is not possible with prometheus_client because it doesn't expose a gather() API like the older prometheus crate does. We could potentially use a different intermediate representation, however, where we keep appending full expositions (descriptors, EOF line and all) each update. That's even more inefficient, but technically works.

By the way, what is the reasoning behind the decision to make access to metrics in the registry internal? Seems like you intentionally limit it so only this crate can implement encoders. If something like a gather() method on the Registry existed perhaps this timestamp-aware encoder wouldn't be necessary on this side.
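To make the question concrete, here is a purely hypothetical sketch of what a gather()-style escape hatch could look like; nothing like this exists in prometheus_client today, and all names are invented:

    // Hypothetical snapshot of the registry's state as plain data.
    struct Sample {
        name: String,
        labels: Vec<(String, String)>,
        value: f64,
    }

    trait Gather {
        fn gather(&self) -> Vec<Sample>;
    }

    // A downstream crate could then implement timestamped encoding itself.
    fn encode_with_timestamp(samples: &[Sample], unix_ts: f64) -> String {
        let mut out = String::new();
        for s in samples {
            let labels: Vec<String> = s
                .labels
                .iter()
                .map(|(k, v)| format!("{k}=\"{v}\""))
                .collect();
            out += &format!("{}{{{}}} {} {}\n", s.name, labels.join(","), s.value, unix_ts);
        }
        out
    }

With a snapshot like that, timestamp injection (or any other non-standard encoding) could live entirely outside the crate.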


mxinden commented Nov 22, 2023

> By the way, what is the reasoning behind the decision to make access to metrics in the registry internal? Seems like you intentionally limit it so only this crate can implement encoders. If something like a gather() method on the Registry existed perhaps this timestamp-aware encoder wouldn't be necessary on this side.

The reason I expose as little as possible is twofold:

  1. It keeps the crate simple; a small surface makes the crate easier for the average user to understand.
  2. I am the only maintainer, maintaining the crate in my spare time. The more I expose, the more I have to maintain, and the harder it is to make changes, since every change to something exposed is a breaking change.

Unfortunately (2) is the most important here.


asmello commented Nov 22, 2023

> The reason I expose as little as possible is twofold:
>
>   1. It keeps the crate simple; a small surface makes the crate easier for the average user to understand.
>   2. I am the only maintainer, maintaining the crate in my spare time. The more I expose, the more I have to maintain, and the harder it is to make changes, since every change to something exposed is a breaking change.
>
> Unfortunately (2) is the most important here.

This seems perfectly fair, but it also seems to me that giving more power to library users would probably reduce work for you as a maintainer, since there would be less need for changes in the library core to account for niche use cases. To my understanding, the OpenMetrics data model is pretty stable, so exposing that part doesn't seem like a bad deal.

Just food for thought. I'm no longer blocking on this PR. Happy to close if you don't think it's headed in a productive direction - but also happy to make any changes you think might get it to the merging bar.


kmeinhar commented Apr 8, 2024

I absolutely understand that this is not a priority and just wanted to add our use case to the discussion.

We have an IoT edge device that can be offline for hours or even days. In such a case we write the metrics to an SQLite database so that we can backfill them later into our cloud.
So our use case is more about converting an SQLite table to an OpenMetrics file.
In this case we would need to add the timestamp from the database to every metric.

With the rust-prometheus library this is possible with a workaround similar to the one from asmello:
tikv/rust-prometheus#423

Not sure if our approach is ideal; we are still in the phase of determining the best strategy for our device metrics.
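For what it's worth, a minimal sketch of the SQLite-to-OpenMetrics conversion described above, assuming a hypothetical samples(sensor, value, ts) table with Unix-second timestamps and using the rusqlite crate:

    use rusqlite::Connection;

    // Read timestamped rows from SQLite and emit one OpenMetrics sample per row.
    // Table layout and metric naming are made up for illustration.
    fn export_openmetrics(conn: &Connection) -> rusqlite::Result<String> {
        let mut out = String::from("# TYPE sensor_temperature gauge\n");
        let mut stmt =
            conn.prepare("SELECT sensor, value, ts FROM samples ORDER BY sensor, ts")?;
        let rows = stmt.query_map([], |row| {
            Ok((
                row.get::<_, String>(0)?,
                row.get::<_, f64>(1)?,
                row.get::<_, i64>(2)?,
            ))
        })?;
        for row in rows {
            let (sensor, value, ts) = row?;
            out += &format!("sensor_temperature{{sensor=\"{sensor}\"}} {value} {ts}\n");
        }
        out.push_str("# EOF\n");
        Ok(out)
    }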


mxinden commented Apr 8, 2024

> We have an IoT edge device that can be offline for hours or even days. In such a case we write the metrics to an SQLite database so that we can backfill them later into our cloud.

Why don't you use the Prometheus Push Gateway?

https://prometheus.io/docs/practices/pushing/


asmello commented Apr 8, 2024

> Why don't you use the Prometheus Push Gateway?
>
> https://prometheus.io/docs/practices/pushing/

The use case for that is different: it's for getting around network limitations rather than backfilling. In fact, since Prometheus commits head blocks every 3 hours, the push gateway would fail to push any data that is older than that (which is likely to be the case if you need to backfill edge devices). I think the push gateway would only be useful for getting around some transient connectivity issues (possibly; I'm not sure whether sends can be made reliable or if they're send-and-forget).
