Summary sketch #177
Conversation
What problem are you trying to solve in this PR?
Having quantile summaries at all. The previous ones were known to be broken (see #170, #146, #159). My fix in #170 only stopped them from crashing, but they were still incorrect due to data loss during race conditions, and certainly not performant, as they were doing lots of lookup-inserts on big data structures, whereas this PR does only …
This now uses ddskerl, an implementation of the DDSketch algorithm, instead of quantile_estimator, which is in turn an implementation of Cormode's biased quantile algorithms.
The most important difference is that the biased algorithms are not mathematically mergeable, so keeping a per-scheduler summary could never work, and keeping a single global one would never scale (not to mention race conditions on inserts, since that data structure cannot be updated atomically in an ETS table). The DDSketch algorithm, by contrast, is fully mergeable, and the data-structure implementation has an ETS backend that is fully tested against all sorts of race conditions. A conceptual sketch of why merging works is given below.
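For illustration only, here is a minimal conceptual sketch of the DDSketch idea in plain Erlang maps; it is not ddskerl's actual API, and the module and function names (`ddsketch_sketch`, `new/1`, `insert/2`, `merge/2`, `quantile/2`) are hypothetical. It shows why the structure is mergeable: values land in logarithmic buckets, so two sketches combine by summing bucket counters.

```erlang
%% Conceptual DDSketch sketch (NOT ddskerl's API): values map to logarithmic
%% buckets, so merging two sketches is just summing per-bucket counters, and
%% the relative error of any quantile is bounded by the chosen accuracy Alpha.
-module(ddsketch_sketch).
-export([new/1, insert/2, merge/2, quantile/2]).

new(Alpha) when Alpha > 0, Alpha < 1 ->
    Gamma = (1 + Alpha) / (1 - Alpha),
    #{gamma => Gamma, buckets => #{}, total => 0}.

%% Insert a positive value: its bucket index is ceil(log_Gamma(Value)).
insert(#{gamma := Gamma, buckets := B, total := N} = S, Value) when Value > 0 ->
    Idx = ceil(math:log(Value) / math:log(Gamma)),
    S#{buckets := maps:update_with(Idx, fun(C) -> C + 1 end, 1, B),
       total := N + 1}.

%% Merging sums the bucket counters, which is what makes a per-scheduler
%% sketch possible: each scheduler inserts locally, and scrape time merges.
merge(#{gamma := G, buckets := B1, total := N1} = S,
      #{gamma := G, buckets := B2, total := N2}) ->
    Merged = maps:fold(fun(K, V, Acc) ->
                               maps:update_with(K, fun(C) -> C + V end, V, Acc)
                       end, B1, B2),
    S#{buckets := Merged, total := N1 + N2}.

%% Approximate the Q-th quantile (0 =< Q =< 1) by walking buckets in order
%% until the cumulative count covers the requested rank.
quantile(#{gamma := Gamma, buckets := B, total := N}, Q) when N > 0 ->
    Rank = Q * (N - 1) + 1,
    Sorted = lists:keysort(1, maps:to_list(B)),
    Idx = find_bucket(Sorted, Rank, 0),
    %% Representative value of bucket Idx, centred within the bucket bounds.
    2 * math:pow(Gamma, Idx) / (Gamma + 1).

find_bucket([{Idx, C} | Rest], Rank, Acc) ->
    case Acc + C >= Rank of
        true -> Idx;
        false -> find_bucket(Rest, Rank, Acc + C)
    end.
```

With this shape, N per-scheduler sketches can be folded with `merge/2` into a single sketch before answering `quantile/2`, without any global write contention during inserts; ddskerl's ETS backend provides the production version of this behaviour.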
As a possibility that is out of the scope of this PR, a custom exporter could provide the sketch data structure to Datadog, so that the quantile merges are done on the server; that way the metrics server can aggregate quantiles across many hosts in a meaningful way.