Performance: parallelising kimchi prover more #2969

volhovm · 2025-01-27T13:18:29Z

A bunch of performance improvements to the kimchi prover aiming to utilise more parallelism where available.

As of now, the savings seem to be as follows:

TODOs:

- Check how much faster it is on the Mina side. Adjust the benchmark if necessary (more columns? lookups?)
  - On the Mina side there is a 7% improvement, as opposed to 22% for the prover alone. Maybe the benchmark circuit is too simplistic (not enough lookups?). Maybe it's because Mina runs some other threads in parallel (such as two provers at once?).
- Make sure there are no regressions on the verifier side.
  - Parallelising constraints seems to add 3% to the verifier for 1024 gates (and nothing when there are 16k gates?). In general this is hard to verify: even when running every verifier benchmark for 5 minutes, I still sometimes get 0.5% noise. Really hard to tell.
  - Conclusion:
    - Measuring verification accurately seems to be hard, for reasons I do not understand. I checked every single test parameter: each verification bench runs for about 8 minutes (that's 5-6k iterations in total), it is averaged over multiple SRS setups and over multiple proofs, and the benchmark type is "linear", which according to the criterion docs is the best way to do it. In theory this should be more than enough to give reproducible results, yet I still see average noise of around 0.5% (and sometimes up to 2%, out of nowhere) on both my laptop and my server. Even weirder, the distribution still has low variance in these cases. I suspect system-level threading is the cause: since verification is highly parallelised, maybe even 1-2 threads used by some system process introduce this noise. I suspect this would be better on a "pristine clean" machine, or if we didn't rely so heavily on parallelisation.
    - I think there was a tiny regression (1-2%: 89ms instead of 86ms) for small circuits because of constraint parallelisation. It is not important in any case, but I removed parallelisation for smaller (<4k) circuits. I do not observe any reproducible difference right now.
- Check if parallel evaluations can be faster -- do we need to clone the vector and then interpolate?
  - Does not matter much. Interpolating directly is only 1% faster, so it is not worth the change.
- Check that the perm_aggreg optimisation preserves correctness
- Last step: run the threading profiler once again to make sure I didn't miss anything
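
For illustration, here is a minimal sketch of the size-gated parallelisation mentioned in the verifier conclusion: run sequentially below a cut-off and split the work across threads above it. The names `map_maybe_parallel` and `PAR_THRESHOLD` are hypothetical and not taken from kimchi, and the 4096 value only mirrors the "<4k" figure quoted above. The sketch uses `std::thread::scope` to stay dependency-free; the code in this PR may well use a different mechanism (for example a thread-pool library).

```rust
use std::thread;

/// Hypothetical cut-off: inputs shorter than this are processed sequentially.
const PAR_THRESHOLD: usize = 4096;

/// Applies `f` to every item, splitting the work across threads only when the
/// input is large enough to amortise the cost of starting them.
fn map_maybe_parallel<T, U, F>(items: &[T], f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    if items.len() < PAR_THRESHOLD {
        // Small input: thread start-up and synchronisation would dominate.
        return items.iter().map(&f).collect();
    }

    let n_threads = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Ceiling division so that every item lands in exactly one chunk.
    let chunk_size = (items.len() + n_threads - 1) / n_threads;
    // Shared reference that can be copied into every worker closure.
    let f = &f;

    thread::scope(|scope| {
        // Spawn all workers first so that they actually run concurrently.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<U>>()))
            .collect();
        // Joining in spawn order preserves the original item order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `map_maybe_parallel(&values, |x| x * x)` returns the same vector on both paths; only the execution strategy differs. The right cut-off is machine- and workload-dependent, so it would have to be tuned with benchmarks rather than taken from this sketch.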

codecov · 2025-01-29T15:58:50Z

Codecov Report

Attention: Patch coverage is 97.11934% with 7 lines in your changes missing coverage. Please review.

Project coverage is 76.87%. Comparing base (22a9862) to head (5472fed).
Report is 77 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kimchi/src/prover.rs | 84.78% | 7 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2969      +/-   ##
==========================================
- Coverage   76.95%   76.87%   -0.08%     
==========================================
  Files         261      261              
  Lines       61457    62092     +635     
==========================================
+ Hits        47293    47736     +443     
- Misses      14164    14356     +192

volhovm force-pushed the volhovm/profiling-kimchi-thread-utilisation branch 26 times, most recently from fe029f4 to f8ddd85 Compare January 29, 2025 14:53

volhovm added 3 commits February 3, 2025 18:20

Criterion bench: Use verifier index in bench, improve params

95d0b47

Add IPA commitment benchmark function

33e1583

Add benches for MSM

38d9634

volhovm added 3 commits February 3, 2025 18:20

Add benches for vertically parallelised MSM

067ff11

Add more MSM parallelisation benches (horizontal + h&v)

6c1c56c

Add another weird bench with horizontal+vertical par for MSM

227fd91

volhovm force-pushed the volhovm/profiling-kimchi-thread-utilisation branch 3 times, most recently from 5472fed to 25d64a0 Compare February 5, 2025 15:42

volhovm added 10 commits February 7, 2025 17:29

Proof verification benches: improve precision

3c3c60b

Make SRS Sync and Send

25ac008

Remove unnecessary cloning in evaluation

a034bb4

Parallelise IPA commit function

6b60be3

Parallelise constraints creation

7e55623

Poly commitment: local MSM: process chunks in parallel

8dc5766

Parallelise permutation shifts

4558c8e

Kimchi prover: parallelise witness creation (+5% prover but 1% verifier regression?)

0bb0589

Parallelise perm_aggreg sub arrays

fc7b5ab

50ms save-up?

Turn off constraint parallelisation for small vector size?

a979221

volhovm force-pushed the volhovm/profiling-kimchi-thread-utilisation branch from d12e88e to a979221 Compare February 7, 2025 17:30
