Summary sketch #177

NelsonVides · 2025-03-24T12:07:49Z

This now uses ddskerl, an implementation of the DDSketch algorithm, instead of quantile_estimator, which is an implementation of Cormode's biased-quantile algorithms.

The most important difference is that the biased algorithms aren't mathematically mergeable, so keeping a per-scheduler summary would never work, and keeping a single global one would never scale (not to mention race conditions on inserts, since the data structure cannot be updated atomically in an ETS table). The DDSketch algorithm, by contrast, is fully mergeable, and the data structure implementation has an ETS backend fully tested for all sorts of race conditions.
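To illustrate what "mergeable" means here, the following is a minimal Python sketch of the DDSketch idea, not the ddskerl code or its API. The class name `DDSketch`, its methods, and the `relative_error` parameter are hypothetical names chosen for this example; it handles positive values only and keeps buckets in an unbounded dictionary. Each value is mapped to a logarithmic bucket, so merging two sketches is just adding their per-bucket counts, and the result is identical to a sketch built from all the data at once.

```python
import math
from collections import Counter


class DDSketch:
    """Illustrative DDSketch-style summary (hypothetical, positive values only)."""

    def __init__(self, relative_error=0.01):
        # gamma controls the bucket width; buckets are (gamma^(k-1), gamma^k].
        self.gamma = (1 + relative_error) / (1 - relative_error)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()
        self.count = 0

    def insert(self, value):
        if value <= 0:
            raise ValueError("this sketch only supports positive values")
        key = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[key] += 1  # a single counter increment per observation
        self.count += 1

    def merge(self, other):
        if self.gamma != other.gamma:
            raise ValueError("sketches must share the same relative error")
        self.buckets.update(other.buckets)  # Counter.update adds the counts
        self.count += other.count

    def quantile(self, q):
        rank = q * (self.count - 1)
        seen = 0
        for key in sorted(self.buckets):
            seen += self.buckets[key]
            if seen > rank:
                # Representative value of the bucket (gamma^(key-1), gamma^key].
                return 2 * self.gamma ** key / (self.gamma + 1)
        raise ValueError("empty sketch")


# Two independent sketches (think: one per scheduler) and one global sketch.
part_a, part_b, whole = DDSketch(), DDSketch(), DDSketch()
for v in range(1, 501):
    part_a.insert(v)
    whole.insert(v)
for v in range(501, 1001):
    part_b.insert(v)
    whole.insert(v)

part_a.merge(part_b)
median = part_a.quantile(0.5)
```

After the merge, `part_a` holds exactly the same buckets as `whole`, which is the property that makes per-scheduler (or per-host) summaries meaningful.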

As a possibility outside the scope of this PR, a custom exporter could provide the sketch data structure to Datadog, so that quantile merges are done on the server and the metrics server can aggregate quantiles across many hosts in a meaningful way.

codecov · 2025-03-24T12:52:20Z

Codecov Report

Attention: Patch coverage is 94.95798% with 6 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/metrics/prometheus_quantile_summary.erl	92.77%	6 Missing ⚠️

Files with missing lines	Coverage Δ
src/prometheus_sup.erl	`77.41% <ø> (ø)`
src/prometheus_time.erl	`97.36% <100.00%> (-0.07%)`	⬇️
.../eunit/format/prometheus_protobuf_format_tests.erl	`100.00% <100.00%> (ø)`
test/eunit/format/prometheus_text_format_tests.erl	`100.00% <100.00%> (ø)`
...eunit/metric/prometheus_quantile_summary_tests.erl	`99.09% <100.00%> (+0.55%)`	⬆️
src/metrics/prometheus_quantile_summary.erl	`94.53% <92.77%> (+2.22%)`	⬆️

lhoguin · 2025-03-24T13:09:46Z

What problem are you trying to solve in this PR?

NelsonVides · 2025-03-24T13:13:09Z

> What problem are you trying to solve in this PR?

Having working quantile summaries at all. The previous ones were known to be broken (see #170, #146, #159). My fix in #170 only stopped them from crashing; they were still not correct, because data was lost during race conditions, and they were certainly not performant, as they did many lookup-then-insert operations on big data structures, whereas this PR only performs `update_counter` operations.
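The data loss described above can be shown with a small deterministic simulation. This is a Python illustration under stated assumptions, not the code of either implementation: the function names and the `bucket_42` key are hypothetical, and the plain dictionary stands in for an ETS table. In the first function every writer reads before any of them writes back, which is the interleaving a lookup-then-insert pattern allows; the second models an atomic per-key increment, which is the guarantee `ets:update_counter` gives in Erlang.

```python
def lookup_then_insert(table, key, writers):
    """Simulate writers that all read the value before any of them writes."""
    snapshots = [table.get(key, 0) for _ in range(writers)]
    for seen in snapshots:
        # Each writer stores its stale read plus one, overwriting the others.
        table[key] = seen + 1
    return table[key]


def update_counter(table, key, writers):
    """Simulate an atomic increment: each read-and-add is indivisible."""
    for _ in range(writers):
        table[key] = table.get(key, 0) + 1
    return table[key]


lossy = lookup_then_insert({}, "bucket_42", writers=2)
atomic = update_counter({}, "bucket_42", writers=2)
```

With two writers, `lossy` ends at 1 because one observation is overwritten, while `atomic` ends at 2.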

Optimise convert_du time helper

bfcd9b3

NelsonVides self-assigned this Mar 24, 2025

NelsonVides force-pushed the summary_sketch branch from 7d46006 to c543f55 Compare March 24, 2025 12:44

NelsonVides added 2 commits March 24, 2025 13:50

Reimplement summaries using ddskerl

5964399

Fix formatting tests

be1a375

NelsonVides force-pushed the summary_sketch branch from c543f55 to be1a375 Compare March 24, 2025 12:50

Make quantile summary table read_concurrent

c15728e

NelsonVides marked this pull request as ready for review March 24, 2025 12:57

NelsonVides requested review from deadtrickster, paulo-ferraz-oliveira, mikpe, mkuratczyk, onno-vos-dev, DenysGonchar, MirahImage and lhoguin March 24, 2025 12:58

NelsonVides force-pushed the summary_sketch branch from 303d6a4 to c15728e Compare March 29, 2025 15:58

NelsonVides added 2 commits March 29, 2025 18:52

Upgrade ddskerl

85e3e7e

Introduce wide width to quantile summaries

c6e78cf
