
add shard benchmark experiment #356

Merged 1 commit into sigstore:main on Feb 25, 2025
Conversation

@spencerschrock (Contributor) commented Feb 25, 2025

Summary

The current default shard size is 1 MB, though previous benchmarks have used sizes in the GB range, so vary the shard size roughly across that span. Each benchmark measures hash time as well as manifest time.

In previous experimentation, the best shard size depended somewhat on the model being analyzed. I picked two models here, but it may be worth checking models that shard differently.

Preliminary data suggests 2 GiB is a good shard size. (The columns below appear to be shard size in bytes, hash time in seconds, and manifest size in bytes.)
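The shape of the experiment can be sketched roughly as follows. This is a simplified, hypothetical stand-in for `benchmarks/exp_shard.py` (the real script also builds and times the manifest); it hashes a file in fixed-size shards and reports elapsed time and shard count per shard size:

```python
import hashlib
import os
import tempfile
import time


def hash_sharded(path, shard_size):
    """Hash a file in fixed-size shards; return one digest per shard."""
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(shard_size):
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


if __name__ == "__main__":
    # 8 MiB of random bytes as a small stand-in for a model file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
        path = tmp.name

    # Vary the shard size and report time taken and number of shards.
    for shard_size in (1 << 20, 2 << 20, 8 << 20):
        start = time.monotonic()
        digests = hash_sharded(path, shard_size)
        elapsed = time.monotonic() - start
        print(f"{shard_size}:\t{elapsed:.4f}\t{len(digests)}")

    os.remove(path)
```

Larger shards mean fewer `sha256` objects and fewer manifest entries, which is the tradeoff the tables below explore.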

python3 benchmarks/exp_shard.py ~/models/falcon-7b
1048576:         9.9655  7787170
2097152:         5.8016  3895373
5242880:         3.3394  1560636
10485760:        2.5500   784869
20971520:        2.1676   393941
52428800:        1.9733   159605
104857600:       1.9009    81596
209715200:       1.8476    42876
524288000:       1.8293    18960
1073741824:      1.8951    11599
2147483648:      1.6447     7595
5368709120:      3.9575     4735
10737418240:     7.2927     4178
python3 benchmarks/exp_shard.py ~/models/gemma-7b/
1048576:        23.5757 13253524
2097152:        10.8698  6628968
5242880:         6.0967  2653668
10485760:        4.6714  1333666
20971520:        3.9581   668485
52428800:        3.6173   270200
104857600:       3.4757   137385
209715200:       3.3168    70477
524288000:       3.3041    30887
1073741824:      2.9686    16736
2147483648:      2.6510    10440
5368709120:      4.4327     6309
10737418240:     8.5381     5524

Release Note

NONE

Documentation

NONE

The current default is 1MB, though previous benchmarks used GB in the
past, so vary the shard size roughly in that range. Each benchmark is
concerned with hash time, as well as manifest time.

Signed-off-by: Spencer Schrock <[email protected]>
@spencerschrock spencerschrock requested review from a team as code owners February 25, 2025 23:23
@mihaimaruseac mihaimaruseac merged commit b2ddcc0 into sigstore:main Feb 25, 2025
33 checks passed
@spencerschrock spencerschrock deleted the shard branch February 26, 2025 15:37
@spencerschrock (Contributor, Author) commented:

For large models like Llama 405B (2.2 TB, where the files are mainly 5 GB each), disk caching can't happen, so increasing the shard size has diminishing returns: the bottleneck becomes disk speed.

python3 benchmarks/exp_shard.py ~/models/Llama-3.1-405B --repeat 3
1048576:         2043.0509   702872398   # roughly 700 MB
2097152:         2009.3873   351517924
5242880:         1959.6230   140667736
10485760:        1948.6399   70619115
20971520:        1944.6399   35359489
52428800:        1941.3997   14219131
104857600:       1941.7728   7184582
209715200:       1940.0365   3615453
524288000:       1939.8582   1512802
1073741824:      1940.0590   789471
2147483648:      1940.2088   468875      # roughly 0.5 MB
# still running
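The diminishing returns follow from simple arithmetic: once hashing is disk-bound, total time is fixed, and only the shard (and hence manifest-entry) count keeps shrinking. A back-of-envelope sketch, using the rough figures quoted above (2.2 TB total, ~5 GB files; the constants are illustrative, not exact):

```python
import math

TOTAL_BYTES = 2_200_000_000_000  # ~2.2 TB model, per the comment above
FILE_BYTES = 5_000_000_000       # files are mainly ~5 GB each
NUM_FILES = TOTAL_BYTES // FILE_BYTES

# Shards are taken per file, so each file contributes
# ceil(file_size / shard_size) manifest entries.
for shard_size in (1 << 20, 1 << 30, 2 << 30):
    shards_per_file = math.ceil(FILE_BYTES / shard_size)
    total_shards = shards_per_file * NUM_FILES
    print(f"{shard_size}:\t{total_shards} shard entries")
```

Going from 1 MB to 2 GiB shards cuts the entry count by three orders of magnitude while the hash time stays pinned near the disk-read floor.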
