perf: move data out of Scanner.Token
#26
Merged
By storing the token data in a separate `Scanner` field and having `Token` be merely the token type, we can avoid a decent amount of copying when tokens are passed around. This leads to considerable speedups for the `TokenReader` and `Reader` benchmarks (about 39% and 27% less wall time, respectively). The `Scanner` benchmark is about 3% slower, but that probably has more to do with how that particular benchmark is written, since the token data was previously discarded.

```
Benchmark 1 (120 runs): zig-out/bin-old/scanner Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          41.6ms ±  470us    40.7ms … 43.5ms          1 ( 1%)        0%
  peak_rss           7.27MB ± 88.0KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          152M  ±  839K      151M  …  158M           3 ( 3%)        0%
  instructions        472M  ± 20.8       472M  …  472M           0 ( 0%)        0%
  cache_references    270K  ±  625K      206K  … 7.03M          10 ( 8%)        0%
  cache_misses       7.95K  ±  260      7.61K  … 9.81K           3 ( 3%)        0%
  branch_misses       511K  ±  631       510K  …  512K          18 (15%)        0%
Benchmark 2 (116 runs): zig-out/bin/scanner Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          43.0ms ±  452us    41.9ms … 44.4ms          4 ( 3%)        💩+  3.3% ±  0.3%
  peak_rss           7.28MB ± 77.9KB    7.08MB … 7.34MB          0 ( 0%)          +  0.2% ±  0.3%
  cpu_cycles          158M  ±  694K      156M  …  159M           0 ( 0%)        💩+  4.0% ±  0.1%
  instructions        527M  ± 19.4       527M  …  527M          27 (23%)        💩+ 11.7% ±  0.0%
  cache_references    234K  ±  265K      207K  … 3.06M          10 ( 9%)          - 13.5% ± 45.6%
  cache_misses       7.93K  ±  435      7.49K  … 11.8K           5 ( 4%)          -  0.3% ±  1.1%
  branch_misses       514K  ±  335       513K  …  515K           1 ( 1%)          +  0.7% ±  0.0%
```

```
Benchmark 1 (44 runs): zig-out/bin-old/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           116ms ±  631us     115ms …  117ms          0 ( 0%)        0%
  peak_rss           7.30MB ± 59.0KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          462M  ± 1.91M      459M  …  466M           0 ( 0%)        0%
  instructions       1.14G  ± 21.9      1.14G  … 1.14G           0 ( 0%)        0%
  cache_references    233K  ± 6.77K      226K  …  253K           3 ( 7%)        0%
  cache_misses       9.69K  ± 1.48K     8.05K  … 13.6K           0 ( 0%)        0%
  branch_misses       815K  ± 1.16K      813K  …  817K           0 ( 0%)        0%
Benchmark 2 (72 runs): zig-out/bin/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          70.2ms ±  782us    68.9ms … 75.3ms          2 ( 3%)        ⚡- 39.4% ±  0.2%
  peak_rss           7.29MB ± 63.4KB    7.21MB … 7.34MB          0 ( 0%)          -  0.2% ±  0.3%
  cpu_cycles          271M  ± 2.75M      268M  …  291M           7 (10%)        ⚡- 41.3% ±  0.2%
  instructions        885M  ± 19.2       885M  …  885M          17 (24%)        ⚡- 22.6% ±  0.0%
  cache_references    224K  ± 7.03K      219K  …  263K           7 (10%)        ⚡-  3.9% ±  1.1%
  cache_misses       8.32K  ±  909      7.80K  … 14.4K           6 ( 8%)        ⚡- 14.1% ±  4.5%
  branch_misses       671K  ± 42.3K      664K  … 1.03M           3 ( 4%)        ⚡- 17.6% ±  1.6%
```

```
Benchmark 1 (35 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           145ms ±  857us     143ms …  148ms          2 ( 6%)        0%
  peak_rss           7.29MB ± 65.1KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          582M  ± 3.06M      578M  …  596M           1 ( 3%)        0%
  instructions       1.38G  ± 24.7      1.38G  … 1.38G           0 ( 0%)        0%
  cache_references    758K  ±  196K      513K  … 1.59M           2 ( 6%)        0%
  cache_misses       14.3K  ± 6.84K     11.4K  … 49.2K           4 (11%)        0%
  branch_misses      1.06M  ± 14.0K     1.05M  … 1.11M           3 ( 9%)        0%
Benchmark 2 (48 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           105ms ± 1.55ms     104ms …  113ms          1 ( 2%)        ⚡- 27.2% ±  0.4%
  peak_rss           7.27MB ± 93.6KB    7.08MB … 7.34MB          0 ( 0%)          -  0.2% ±  0.5%
  cpu_cycles          419M  ± 6.29M      414M  …  450M           1 ( 2%)        ⚡- 28.1% ±  0.4%
  instructions       1.13G  ± 19.6      1.13G  … 1.13G          11 (23%)        ⚡- 18.1% ±  0.0%
  cache_references    575K  ± 59.7K      490K  …  797K           1 ( 2%)        ⚡- 24.2% ±  7.9%
  cache_misses       12.5K  ±  876      11.4K  … 15.3K           5 (10%)          - 12.3% ± 13.9%
  branch_misses      1.07M  ± 4.22K     1.07M  … 1.09M           8 (17%)          +  1.2% ±  0.4%
```
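To illustrate the idea in the description above, here is a minimal sketch in Zig. It is not the code from this PR: `FatToken`, the toy `Scanner`, its `token_data` field, and the tiny `<name>`/text grammar are all hypothetical names and simplifications chosen for illustration. It only shows the general shape of the change, where the token becomes a plain tag and the payload is kept in a scanner field.

```zig
const std = @import("std");

// Before (hypothetical): every token carries its payload, so returning or
// passing a token by value copies the tag plus the largest payload.
const FatToken = union(enum) {
    element_start: []const u8,
    text: []const u8,
    eof,
};

// After (hypothetical): the token is only its type.
const Token = enum { element_start, text, eof };

// Toy scanner for illustration only; not the library's real Scanner.
const Scanner = struct {
    src: []const u8,
    pos: usize = 0,
    // Data for the most recently returned token.
    // Assumption of this sketch: it is only valid until the next call to next().
    token_data: []const u8 = "",

    fn next(self: *Scanner) Token {
        if (self.pos >= self.src.len) return .eof;
        const start = self.pos;
        if (self.src[start] == '<') {
            // Toy grammar: "<name>" is an element start.
            while (self.pos < self.src.len and self.src[self.pos] != '>') self.pos += 1;
            self.token_data = self.src[start + 1 .. self.pos];
            if (self.pos < self.src.len) self.pos += 1; // skip '>'
            return .element_start;
        }
        // Anything else is text up to the next '<' or the end of input.
        while (self.pos < self.src.len and self.src[self.pos] != '<') self.pos += 1;
        self.token_data = self.src[start..self.pos];
        return .text;
    }
};
```

A caller would switch on the returned `Token` and read `scanner.token_data` only for the token types it cares about, so code that merely forwards tokens moves a small enum value instead of a full tagged union. The trade-off in this sketch is that the data must be consumed before the next call to `next()`.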
ianprime0509 added a commit that referenced this pull request on Oct 16, 2023
These optimizations are analogous to those made in #26 for `Scanner`. The performance improvement for `Reader` is less significant, but still noticeable:

```
Benchmark 1 (48 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           106ms ± 1.72ms     104ms …  110ms          0 ( 0%)        0%
  peak_rss           7.28MB ± 80.9KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          420M  ± 6.94M      413M  …  438M           0 ( 0%)        0%
  instructions       1.13G  ± 19.3      1.13G  … 1.13G           0 ( 0%)        0%
  cache_references    561K  ± 72.6K      477K  …  878K           2 ( 4%)        0%
  cache_misses       12.7K  ± 1.31K     11.3K  … 18.3K           3 ( 6%)        0%
  branch_misses      1.07M  ± 5.21K     1.07M  … 1.09M           0 ( 0%)        0%
Benchmark 2 (50 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           101ms ± 1.08ms    98.9ms …  103ms          1 ( 2%)        ⚡-  4.9% ±  0.5%
  peak_rss           7.29MB ± 63.6KB    7.21MB … 7.34MB          0 ( 0%)          +  0.2% ±  0.4%
  cpu_cycles          398M  ± 4.24M      393M  …  410M           1 ( 2%)        ⚡-  5.2% ±  0.5%
  instructions       1.11G  ± 20.8      1.11G  … 1.11G           0 ( 0%)        ⚡-  1.5% ±  0.0%
  cache_references    404K  ±  244K      314K  … 1.82M           4 ( 8%)        ⚡- 28.0% ± 13.0%
  cache_misses       12.0K  ± 1.11K     10.1K  … 15.3K           0 ( 0%)        ⚡-  5.6% ±  3.8%
  branch_misses      1.10M  ± 2.89K     1.10M  … 1.12M           4 ( 8%)        💩+  2.8% ±  0.2%
```