Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Summary sketch #177

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Summary sketch #177

wants to merge 6 commits into from

Conversation

NelsonVides
Copy link
Member

@NelsonVides NelsonVides commented Mar 24, 2025

This uses now ddskerl, an implementation of the DDSketch algorithm, instead of quantile_estimator, which is in turn an implementation of Cormode's biased algorithms.

The most important difference is that the biased algorithms aren't mathematically mergeable, hence keeping a per-scheduler summary would never work, and keeping a single global one would never scale (and not mentioning race conditions on inserts as the data-structure is not possible to treat atomically in an ets table). Instead, the DDSketch algorithm is fully mergeable, and the data structure implementation has an ETS backend fully tested for all sorts of race conditions.

As a possibility that is out of the scope of this PR, a custom exporter might provide the Sketch data structure to datadog, so that the quantile merges can be done on the server, and that way the metrics server can aggregate quantiles across many hosts in a meaningful way.

@NelsonVides NelsonVides self-assigned this Mar 24, 2025
Copy link

codecov bot commented Mar 24, 2025

Codecov Report

Attention: Patch coverage is 94.95798% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/metrics/prometheus_quantile_summary.erl 92.77% 6 Missing ⚠️
Files with missing lines Coverage Δ
src/prometheus_sup.erl 77.41% <ø> (ø)
src/prometheus_time.erl 97.36% <100.00%> (-0.07%) ⬇️
.../eunit/format/prometheus_protobuf_format_tests.erl 100.00% <100.00%> (ø)
test/eunit/format/prometheus_text_format_tests.erl 100.00% <100.00%> (ø)
...eunit/metric/prometheus_quantile_summary_tests.erl 99.09% <100.00%> (+0.55%) ⬆️
src/metrics/prometheus_quantile_summary.erl 94.53% <92.77%> (+2.22%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lhoguin
Copy link

lhoguin commented Mar 24, 2025

What problem are you trying to solve in this PR?

@NelsonVides
Copy link
Member Author

What problem are you trying to solve in this PR?

Having quantile summaries entirely. The previous ones were known to be broken (see #170, #146, #159). My fix at #170 was just to stop it from breaking, but they were still not correct due to data-loss during race conditions, and surely not performant as they were doing lots of lookup-inserts of big data structures, when this PR does only update_counters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants