Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DuplicatesBy::fold #921

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kinto-b
Copy link
Contributor

@kinto-b kinto-b commented Apr 18, 2024

Relates to #755. I didn't grab the output earlier (is there a way to print without re-benchmarking?) but this makes no difference to the performance. Submitting a PR for record keeping sake

{
  "mean": {
    "confidence_interval": {
      "confidence_level": 0.95,
      "lower_bound": -0.0081767782683812,
      "upper_bound": 0.0038785180687818605
    },
    "point_estimate": -0.00312222985963162,
    "standard_error": 0.003180882676320771
  },
  "median": {
    "confidence_interval": {
      "confidence_level": 0.95,
      "lower_bound": -0.009367663685134739,
      "upper_bound": -0.003824443197890548
    },
    "point_estimate": -0.006249808601540896,
    "standard_error": 0.0014046028043885176
  }
}

Copy link

codecov bot commented Apr 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.44%. Comparing base (6814180) to head (73d78c5).
Report is 54 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #921      +/-   ##
==========================================
+ Coverage   94.38%   94.44%   +0.06%     
==========================================
  Files          48       48              
  Lines        6665     6882     +217     
==========================================
+ Hits         6291     6500     +209     
- Misses        374      382       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Apr 18, 2024

Same for #918 and #919, it uses a HashMap and therefore heap allocates and (securely but slowly) hash things, those two things are the bottleneck. There is not much to expect of such specialization without improving that front:

With a faster hasher, maybe that specialize fold would a bit more beneficial for the benchmark to show some improvement. We might want to revisit those PRs once I make a more general DuplicatesByIn.

I referenced your PRs in #755 (comment) to avoid people from making duplicates PRs.

There would also be rfold to consider alongside fold.


Side note:
I'm aware specialization benchmarks are slow to build (2 minutes for me), and 8 seconds per benchmark to run (which is ok).
I usually comment out (or simply remove) benchmarks from the bench_specializations macro declaration (without committing changes) to have fast benchmarks. With that, I don't mind re-run benchmarks.
It's a pain but it's a huge file with a lot of iterators and methods to bench, there is no simple fix to avoid building benchmarks we are not gonna use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants