perf: move data out of Scanner.Token #26

Merged
merged 1 commit into from
Oct 15, 2023

ianprime0509
Owner

By storing the token data in a separate `Scanner` field and having
`Token` be merely the token type, we can avoid a decent amount of
copying when tokens are passed around. This leads to considerable
speedups for the `TokenReader` and `Reader` benchmarks (about 39% and
27% less wall time, respectively). The `Scanner` benchmark is about 3%
slower, but that probably has more to do with how that particular
benchmark is written, since the token data was previously discarded.
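
The idea described above can be illustrated with a minimal sketch. It is written in Rust purely for illustration (the project itself is Zig), and every name in it (`FatToken`, `Token`, `Scanner`, `token_data`, `collect_element_names`) as well as the pre-tokenized input is a hypothetical stand-in, not the actual code of this change. It only shows the general shape: the scanner hands out a small tag and keeps the payload in one of its own fields, so consumers read the payload only when they need it.

```rust
use std::mem::size_of;

/// "Before" shape (hypothetical): each token carries its own payload, so the
/// payload travels along every time a token is returned, stored, or forwarded.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum FatToken<'a> {
    ElementStart { name: &'a str },
    Text { content: &'a str },
    ElementEnd,
}

/// "After" shape (hypothetical): the token is only a tag.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Token {
    ElementStart,
    Text,
    ElementEnd,
}

/// Scanner that keeps the payload of the most recent token in a field.
/// It walks a pre-tokenized list as a stand-in for real XML scanning.
struct Scanner<'a> {
    items: &'a [(Token, &'a str)],
    pos: usize,
    /// Payload of the token most recently returned by `next_token`.
    token_data: &'a str,
}

impl<'a> Scanner<'a> {
    fn new(items: &'a [(Token, &'a str)]) -> Self {
        Scanner { items, pos: 0, token_data: "" }
    }

    /// Returns only the tag; the payload stays behind in `self.token_data`.
    fn next_token(&mut self) -> Option<Token> {
        let (token, data) = *self.items.get(self.pos)?;
        self.pos += 1;
        self.token_data = data;
        Some(token)
    }
}

/// Consumer that reads the payload only for the tokens it cares about.
fn collect_element_names<'a>(scanner: &mut Scanner<'a>) -> Vec<&'a str> {
    let mut names = Vec::new();
    while let Some(token) = scanner.next_token() {
        if token == Token::ElementStart {
            names.push(scanner.token_data);
        }
    }
    names
}

fn main() {
    let items = [
        (Token::ElementStart, "repository"),
        (Token::Text, "hello"),
        (Token::ElementEnd, ""),
    ];
    let mut scanner = Scanner::new(&items);
    println!("element names: {:?}", collect_element_names(&mut scanner));
    println!(
        "payload-carrying token: {} bytes, tag-only token: {} bytes",
        size_of::<FatToken<'static>>(),
        size_of::<Token>()
    );
}
```

The trade-off in this shape is that the payload is only valid until the next call to `next_token`, so a consumer that needs it later has to copy it out explicitly.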

```
Benchmark 1 (120 runs): zig-out/bin-old/scanner Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          41.6ms ±  470us    40.7ms … 43.5ms          1 ( 1%)        0%
  peak_rss           7.27MB ± 88.0KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          152M  ±  839K      151M  …  158M           3 ( 3%)        0%
  instructions        472M  ± 20.8       472M  …  472M           0 ( 0%)        0%
  cache_references    270K  ±  625K      206K  … 7.03M          10 ( 8%)        0%
  cache_misses       7.95K  ±  260      7.61K  … 9.81K           3 ( 3%)        0%
  branch_misses       511K  ±  631       510K  …  512K          18 (15%)        0%
Benchmark 2 (116 runs): zig-out/bin/scanner Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          43.0ms ±  452us    41.9ms … 44.4ms          4 ( 3%)        💩+  3.3% ±  0.3%
  peak_rss           7.28MB ± 77.9KB    7.08MB … 7.34MB          0 ( 0%)          +  0.2% ±  0.3%
  cpu_cycles          158M  ±  694K      156M  …  159M           0 ( 0%)        💩+  4.0% ±  0.1%
  instructions        527M  ± 19.4       527M  …  527M          27 (23%)        💩+ 11.7% ±  0.0%
  cache_references    234K  ±  265K      207K  … 3.06M          10 ( 9%)          - 13.5% ± 45.6%
  cache_misses       7.93K  ±  435      7.49K  … 11.8K           5 ( 4%)          -  0.3% ±  1.1%
  branch_misses       514K  ±  335       513K  …  515K           1 ( 1%)          +  0.7% ±  0.0%
```

```
Benchmark 1 (44 runs): zig-out/bin-old/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           116ms ±  631us     115ms …  117ms          0 ( 0%)        0%
  peak_rss           7.30MB ± 59.0KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          462M  ± 1.91M      459M  …  466M           0 ( 0%)        0%
  instructions       1.14G  ± 21.9      1.14G  … 1.14G           0 ( 0%)        0%
  cache_references    233K  ± 6.77K      226K  …  253K           3 ( 7%)        0%
  cache_misses       9.69K  ± 1.48K     8.05K  … 13.6K           0 ( 0%)        0%
  branch_misses       815K  ± 1.16K      813K  …  817K           0 ( 0%)        0%
Benchmark 2 (72 runs): zig-out/bin/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          70.2ms ±  782us    68.9ms … 75.3ms          2 ( 3%)        ⚡- 39.4% ±  0.2%
  peak_rss           7.29MB ± 63.4KB    7.21MB … 7.34MB          0 ( 0%)          -  0.2% ±  0.3%
  cpu_cycles          271M  ± 2.75M      268M  …  291M           7 (10%)        ⚡- 41.3% ±  0.2%
  instructions        885M  ± 19.2       885M  …  885M          17 (24%)        ⚡- 22.6% ±  0.0%
  cache_references    224K  ± 7.03K      219K  …  263K           7 (10%)        ⚡-  3.9% ±  1.1%
  cache_misses       8.32K  ±  909      7.80K  … 14.4K           6 ( 8%)        ⚡- 14.1% ±  4.5%
  branch_misses       671K  ± 42.3K      664K  … 1.03M           3 ( 4%)        ⚡- 17.6% ±  1.6%
```

```
Benchmark 1 (35 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           145ms ±  857us     143ms …  148ms          2 ( 6%)        0%
  peak_rss           7.29MB ± 65.1KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          582M  ± 3.06M      578M  …  596M           1 ( 3%)        0%
  instructions       1.38G  ± 24.7      1.38G  … 1.38G           0 ( 0%)        0%
  cache_references    758K  ±  196K      513K  … 1.59M           2 ( 6%)        0%
  cache_misses       14.3K  ± 6.84K     11.4K  … 49.2K           4 (11%)        0%
  branch_misses      1.06M  ± 14.0K     1.05M  … 1.11M           3 ( 9%)        0%
Benchmark 2 (48 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           105ms ± 1.55ms     104ms …  113ms          1 ( 2%)        ⚡- 27.2% ±  0.4%
  peak_rss           7.27MB ± 93.6KB    7.08MB … 7.34MB          0 ( 0%)          -  0.2% ±  0.5%
  cpu_cycles          419M  ± 6.29M      414M  …  450M           1 ( 2%)        ⚡- 28.1% ±  0.4%
  instructions       1.13G  ± 19.6      1.13G  … 1.13G          11 (23%)        ⚡- 18.1% ±  0.0%
  cache_references    575K  ± 59.7K      490K  …  797K           1 ( 2%)        ⚡- 24.2% ±  7.9%
  cache_misses       12.5K  ±  876      11.4K  … 15.3K           5 (10%)          - 12.3% ± 13.9%
  branch_misses      1.07M  ± 4.22K     1.07M  … 1.09M           8 (17%)          +  1.2% ±  0.4%
```
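
As a sanity check on the `delta` column, the relative change in `wall_time` can be recomputed from the rounded means printed above. The sketch below (Rust, for illustration only; the helper name `relative_change_percent` is hypothetical) does that for the three benchmarks. The recomputed values are expected to differ slightly from the reported ones, presumably because the benchmarking tool works from unrounded measurements; that explanation is an assumption, not something stated in the output.

```rust
/// Relative change of `new` against `old`, in percent.
/// Negative values mean the new binary is faster.
fn relative_change_percent(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

fn main() {
    // (benchmark, old mean in ms, new mean in ms, delta reported in the output above)
    let rows = [
        ("scanner", 41.6, 43.0, 3.3),
        ("token_reader", 116.0, 70.2, -39.4),
        ("reader", 145.0, 105.0, -27.2),
    ];
    for (name, old, new, reported) in rows {
        let computed = relative_change_percent(old, new);
        println!("{name}: computed {computed:+.1}%, reported {reported:+.1}%");
    }
}
```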
ianprime0509 merged commit 56c4de2 into main on Oct 15, 2023
ianprime0509 deleted the perf/scanner branch on October 15, 2023 22:36
ianprime0509 added a commit that referenced this pull request Oct 16, 2023
These optimizations are analogous to those made in #26 for `Scanner`.
The performance improvement for `Reader` is less significant, but still
noticeable:

```
Benchmark 1 (48 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           106ms ± 1.72ms     104ms …  110ms          0 ( 0%)        0%
  peak_rss           7.28MB ± 80.9KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          420M  ± 6.94M      413M  …  438M           0 ( 0%)        0%
  instructions       1.13G  ± 19.3      1.13G  … 1.13G           0 ( 0%)        0%
  cache_references    561K  ± 72.6K      477K  …  878K           2 ( 4%)        0%
  cache_misses       12.7K  ± 1.31K     11.3K  … 18.3K           3 ( 6%)        0%
  branch_misses      1.07M  ± 5.21K     1.07M  … 1.09M           0 ( 0%)        0%
Benchmark 2 (50 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           101ms ± 1.08ms    98.9ms …  103ms          1 ( 2%)        ⚡-  4.9% ±  0.5%
  peak_rss           7.29MB ± 63.6KB    7.21MB … 7.34MB          0 ( 0%)          +  0.2% ±  0.4%
  cpu_cycles          398M  ± 4.24M      393M  …  410M           1 ( 2%)        ⚡-  5.2% ±  0.5%
  instructions       1.11G  ± 20.8      1.11G  … 1.11G           0 ( 0%)        ⚡-  1.5% ±  0.0%
  cache_references    404K  ±  244K      314K  … 1.82M           4 ( 8%)        ⚡- 28.0% ± 13.0%
  cache_misses       12.0K  ± 1.11K     10.1K  … 15.3K           0 ( 0%)        ⚡-  5.6% ±  3.8%
  branch_misses      1.10M  ± 2.89K     1.10M  … 1.12M           4 ( 8%)        💩+  2.8% ±  0.2%
```