
add shard benchmark experiment #356

Merged 1 commit into sigstore:main on Feb 25, 2025
Conversation

@spencerschrock (Contributor) commented Feb 25, 2025

Summary

The current default shard size is 1 MB, though previous benchmarks have used sizes in the GB range, so vary the shard size roughly across that span. Each benchmark measures hash time as well as manifest time.

In previous experimentation, the best shard size depended somewhat on the model being analyzed. I picked two models here, but it may be worth checking models that shard differently.

Preliminary data suggests 2 GiB is a good shard size. (The columns below appear to be shard size in bytes, hash time in seconds, and manifest size in bytes.)
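The shape of the experiment can be sketched roughly as follows. This is a simplified, hypothetical stand-in for `benchmarks/exp_shard.py` (the real script also builds and times the manifest); it hashes a file in fixed-size shards and reports elapsed time and shard count per shard size:

```python
import hashlib
import os
import tempfile
import time


def hash_sharded(path, shard_size):
    """Hash a file in fixed-size shards; return one digest per shard."""
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(shard_size):
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


if __name__ == "__main__":
    # 8 MiB of random bytes as a small stand-in for a model file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
        path = tmp.name

    # Vary the shard size and report time taken and number of shards.
    for shard_size in (1 << 20, 2 << 20, 8 << 20):
        start = time.monotonic()
        digests = hash_sharded(path, shard_size)
        elapsed = time.monotonic() - start
        print(f"{shard_size}:\t{elapsed:.4f}\t{len(digests)}")

    os.remove(path)
```

Larger shards mean fewer `sha256` objects and fewer manifest entries, which is the tradeoff the tables below explore.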

python3 benchmarks/exp_shard.py ~/models/falcon-7b
1048576:         9.9655  7787170
2097152:         5.8016  3895373
5242880:         3.3394  1560636
10485760:        2.5500   784869
20971520:        2.1676   393941
52428800:        1.9733   159605
104857600:       1.9009    81596
209715200:       1.8476    42876
524288000:       1.8293    18960
1073741824:      1.8951    11599
2147483648:      1.6447     7595
5368709120:      3.9575     4735
10737418240:     7.2927     4178
python3 benchmarks/exp_shard.py ~/models/gemma-7b/
1048576:        23.5757 13253524
2097152:        10.8698  6628968
5242880:         6.0967  2653668
10485760:        4.6714  1333666
20971520:        3.9581   668485
52428800:        3.6173   270200
104857600:       3.4757   137385
209715200:       3.3168    70477
524288000:       3.3041    30887
1073741824:      2.9686    16736
2147483648:      2.6510    10440
5368709120:      4.4327     6309
10737418240:     8.5381     5524

Release Note

NONE

Documentation

NONE

The current default is 1MB, though previous benchmarks used GB in the
past, so vary the shard size roughly in that range. Each benchmark is
concerned with hash time, as well as manifest time.

Signed-off-by: Spencer Schrock <[email protected]>
@spencerschrock spencerschrock requested review from a team as code owners February 25, 2025 23:23
@mihaimaruseac mihaimaruseac merged commit b2ddcc0 into sigstore:main Feb 25, 2025
33 checks passed
@spencerschrock spencerschrock deleted the shard branch February 26, 2025 15:37
@spencerschrock (Contributor, Author) commented:

For large models like Llama 405B (2.2 TB, where the files are mainly 5 GB each), disk caching can't happen, so increasing the shard size has diminishing returns: the bottleneck becomes disk speed.

python3 benchmarks/exp_shard.py ~/models/Llama-3.1-405B --repeat 3
1048576:         2043.0509   702872398   # roughly 700 MB
2097152:         2009.3873   351517924
5242880:         1959.6230   140667736
10485760:        1948.6399   70619115
20971520:        1944.6399   35359489
52428800:        1941.3997   14219131
104857600:       1941.7728   7184582
209715200:       1940.0365   3615453
524288000:       1939.8582   1512802
1073741824:      1940.0590   789471
2147483648:      1940.2088   468875      # roughly 0.5 MB
# still running
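The diminishing returns follow from simple arithmetic: once hashing is disk-bound, total time is fixed, and only the shard (and hence manifest-entry) count keeps shrinking. A back-of-envelope sketch, using the rough figures quoted above (2.2 TB total, ~5 GB files; the constants are illustrative, not exact):

```python
import math

TOTAL_BYTES = 2_200_000_000_000  # ~2.2 TB model, per the comment above
FILE_BYTES = 5_000_000_000       # files are mainly ~5 GB each
NUM_FILES = TOTAL_BYTES // FILE_BYTES

# Shards are taken per file, so each file contributes
# ceil(file_size / shard_size) manifest entries.
for shard_size in (1 << 20, 1 << 30, 2 << 30):
    shards_per_file = math.ceil(FILE_BYTES / shard_size)
    total_shards = shards_per_file * NUM_FILES
    print(f"{shard_size}:\t{total_shards} shard entries")
```

Going from 1 MB to 2 GiB shards cuts the entry count by three orders of magnitude while the hash time stays pinned near the disk-read floor.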
