Commit
perf: rework decoder interface (#22)
The updated interface decodes codepoints directly from a reader rather than being implemented as a state machine. This turns out to be considerably more efficient than the previous implementation, with around 25% improvement on the `token_reader` and `reader` benchmarks:

```
Benchmark 1 (27 runs): zig-out/bin-old/token_reader Gtk-4.0.gir
  measurement        mean ± σ            min … max            outliers    delta
  wall_time           188ms ± 14.5ms     168ms … 205ms        0 ( 0%)     0%
  peak_rss           7.31MB ± 58.5KB    7.21MB … 7.34MB       0 ( 0%)     0%
  cpu_cycles           688M ± 4.20M       684M … 706M         1 ( 4%)     0%
  instructions        1.19G ± 29.4       1.19G … 1.19G        0 ( 0%)     0%
  cache_references     412K ± 763K        239K … 4.21M        2 ( 7%)     0%
  cache_misses        10.0K ± 7.40K      7.90K … 46.8K        2 ( 7%)     0%
  branch_misses        814K ± 1.37K       813K … 821K         1 ( 4%)     0%
Benchmark 2 (37 runs): zig-out/bin/token_reader Gtk-4.0.gir
  measurement        mean ± σ            min … max            outliers    delta
  wall_time           136ms ± 13.8ms     115ms … 147ms        0 ( 0%)     ⚡- 27.7% ±  3.8%
  peak_rss           7.31MB ± 54.7KB    7.21MB … 7.34MB       8 (22%)       +  0.1% ±  0.4%
  cpu_cycles           462M ± 1.87M       459M … 466M         0 ( 0%)     ⚡- 32.8% ±  0.2%
  instructions        1.14G ± 26.6       1.14G … 1.14G        0 ( 0%)     ⚡-  4.1% ±  0.0%
  cache_references     236K ± 4.86K       227K … 244K         0 ( 0%)       - 42.7% ± 60.7%
  cache_misses        9.40K ± 1.25K      7.88K … 11.5K        0 ( 0%)       -  6.5% ± 24.6%
  branch_misses        815K ± 1.01K       813K … 817K         0 ( 0%)       +  0.1% ±  0.1%
```

```
Benchmark 1 (23 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement        mean ± σ            min … max            outliers    delta
  wall_time           225ms ± 14.2ms     199ms … 249ms        0 ( 0%)     0%
  peak_rss           7.25MB ± 100KB     7.08MB … 7.34MB       0 ( 0%)     0%
  cpu_cycles           823M ± 12.2M       813M … 847M         0 ( 0%)     0%
  instructions        1.43G ± 23.0       1.43G … 1.43G        0 ( 0%)     0%
  cache_references     757K ± 129K        635K … 1.07M        1 ( 4%)     0%
  cache_misses        13.7K ± 1.18K      12.5K … 17.2K        2 ( 9%)     0%
  branch_misses       1.43M ± 3.35K      1.42M … 1.43M        0 ( 0%)     0%
Benchmark 2 (31 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement        mean ± σ            min … max            outliers    delta
  wall_time           166ms ± 13.9ms     144ms … 175ms        0 ( 0%)     ⚡- 26.5% ±  3.4%
  peak_rss           7.27MB ± 81.8KB    7.08MB … 7.34MB       0 ( 0%)       +  0.3% ±  0.7%
  cpu_cycles           581M ± 1.54M       579M … 584M         0 ( 0%)     ⚡- 29.4% ±  0.5%
  instructions        1.38G ± 16.0       1.38G … 1.38G        9 (29%)     ⚡-  3.8% ±  0.0%
  cache_references     715K ± 219K        563K … 1.71M        3 (10%)       -  5.5% ± 13.6%
  cache_misses        13.5K ± 1.31K      11.4K … 16.5K        2 ( 6%)       -  1.2% ±  5.1%
  branch_misses       1.07M ± 20.3K      1.05M … 1.11M        5 (16%)     ⚡- 25.3% ±  0.6%
```
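
For readers unfamiliar with the two designs, here is a rough sketch of the difference in shape between a state-machine decoder and one that reads codepoints directly from a reader. It is not this project's code or API: it is written in Python for illustration, assumes UTF-8 input, and every name in it (`StateMachineDecoder`, `feed`, `read_codepoint`) is hypothetical. It says nothing about performance; it only shows where the decoding state lives in each style.

```python
import io


class StateMachineDecoder:
    """Push style (hypothetical): the caller feeds one byte at a time and the
    decoder keeps its progress in fields between calls."""

    def __init__(self):
        self.remaining = 0   # continuation bytes still expected
        self.codepoint = 0   # bits accumulated so far

    def feed(self, byte):
        """Return a codepoint once one is complete, otherwise None."""
        if self.remaining == 0:
            if byte < 0x80:
                return byte
            if byte >> 5 == 0b110:
                self.remaining, self.codepoint = 1, byte & 0x1F
            elif byte >> 4 == 0b1110:
                self.remaining, self.codepoint = 2, byte & 0x0F
            elif byte >> 3 == 0b11110:
                self.remaining, self.codepoint = 3, byte & 0x07
            else:
                raise ValueError("invalid leading byte")
            return None
        if byte >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        self.codepoint = (self.codepoint << 6) | (byte & 0x3F)
        self.remaining -= 1
        return self.codepoint if self.remaining == 0 else None


def read_codepoint(reader):
    """Pull style (hypothetical): read exactly the bytes one codepoint needs
    from `reader`. Returns None at end of input. No state survives the call.
    Simplified: overlong encodings and surrogates are not rejected."""
    first = reader.read(1)
    if not first:
        return None
    b = first[0]
    if b < 0x80:
        return b
    if b >> 5 == 0b110:
        n, cp = 1, b & 0x1F
    elif b >> 4 == 0b1110:
        n, cp = 2, b & 0x0F
    elif b >> 3 == 0b11110:
        n, cp = 3, b & 0x07
    else:
        raise ValueError("invalid leading byte")
    tail = reader.read(n)
    if len(tail) != n:
        raise ValueError("truncated sequence")
    for t in tail:
        if t >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (t & 0x3F)
    return cp


def decode_with_state_machine(data):
    decoder = StateMachineDecoder()
    out = []
    for byte in data:
        cp = decoder.feed(byte)
        if cp is not None:
            out.append(cp)
    return out


def decode_from_reader(data):
    reader = io.BytesIO(data)
    out = []
    while True:
        cp = read_codepoint(reader)
        if cp is None:
            return out
        out.append(cp)
```

In the push style every byte goes through a call that must first check which state the decoder is in. In the pull style the leading byte determines how many more bytes to read, and the whole codepoint is assembled in one call.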