[Track] DeepSeek V3/R1 accuracy #3486
Some accuracy results for DeepSeek-V3 on 8 × H20, cc: @zhyncs
Server
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --mem-fraction-static 0.9
gsm8k
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
Accuracy: 0.950
Invalid: 0.000
Latency: 236.747 s
Output throughput: 587.916 token/s
mmlu
bash benchmark/mmlu/download_data.sh
python3 benchmark/mmlu/bench_sglang.py --nsub 100 --ntrain 5 --parallel 2000
subject: abstract_algebra, #q:100, acc: 0.820
subject: anatomy, #q:135, acc: 0.881
subject: astronomy, #q:152, acc: 0.934
subject: business_ethics, #q:100, acc: 0.870
subject: clinical_knowledge, #q:265, acc: 0.917
subject: college_biology, #q:144, acc: 0.965
subject: college_chemistry, #q:100, acc: 0.650
subject: college_computer_science, #q:100, acc: 0.830
subject: college_mathematics, #q:100, acc: 0.800
subject: college_medicine, #q:173, acc: 0.867
subject: college_physics, #q:102, acc: 0.814
subject: computer_security, #q:100, acc: 0.890
subject: conceptual_physics, #q:235, acc: 0.949
subject: econometrics, #q:114, acc: 0.807
subject: electrical_engineering, #q:145, acc: 0.876
subject: elementary_mathematics, #q:378, acc: 0.944
subject: formal_logic, #q:126, acc: 0.810
subject: global_facts, #q:100, acc: 0.730
subject: high_school_biology, #q:310, acc: 0.958
subject: high_school_chemistry, #q:203, acc: 0.897
subject: high_school_computer_science, #q:100, acc: 0.950
subject: high_school_european_history, #q:165, acc: 0.885
subject: high_school_geography, #q:198, acc: 0.960
subject: high_school_government_and_politics, #q:193, acc: 0.990
subject: high_school_macroeconomics, #q:390, acc: 0.931
subject: high_school_mathematics, #q:270, acc: 0.752
subject: high_school_microeconomics, #q:238, acc: 0.954
subject: high_school_physics, #q:151, acc: 0.834
subject: high_school_psychology, #q:545, acc: 0.961
subject: high_school_statistics, #q:216, acc: 0.861
subject: high_school_us_history, #q:204, acc: 0.961
subject: high_school_world_history, #q:237, acc: 0.949
subject: human_aging, #q:223, acc: 0.870
subject: human_sexuality, #q:131, acc: 0.924
subject: international_law, #q:121, acc: 0.975
subject: jurisprudence, #q:108, acc: 0.907
subject: logical_fallacies, #q:163, acc: 0.914
subject: machine_learning, #q:112, acc: 0.857
subject: management, #q:103, acc: 0.961
subject: marketing, #q:234, acc: 0.962
subject: medical_genetics, #q:100, acc: 0.960
subject: miscellaneous, #q:783, acc: 0.962
subject: moral_disputes, #q:346, acc: 0.864
subject: moral_scenarios, #q:895, acc: 0.806
subject: nutrition, #q:306, acc: 0.922
subject: philosophy, #q:311, acc: 0.929
subject: prehistory, #q:324, acc: 0.935
subject: professional_accounting, #q:282, acc: 0.869
subject: professional_law, #q:1534, acc: 0.720
subject: professional_medicine, #q:272, acc: 0.952
subject: professional_psychology, #q:612, acc: 0.907
subject: public_relations, #q:110, acc: 0.809
subject: security_studies, #q:245, acc: 0.869
subject: sociology, #q:201, acc: 0.945
subject: us_foreign_policy, #q:100, acc: 0.950
subject: virology, #q:166, acc: 0.578
subject: world_religions, #q:171, acc: 0.930
Total latency: 435.171
Average accuracy: 0.878
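The log gives per-subject accuracies plus one overall average. To check which kind of average that is, the sketch below recomputes both a question-weighted (micro) mean and a per-subject (macro) mean from the values listed above. It assumes each printed `acc` is the subject's correct count divided by `#q`, rounded to three decimals. `SUBJECT_RESULTS`, `weighted_accuracy` and `macro_accuracy` are hypothetical helpers for illustration, not part of `benchmark/mmlu/bench_sglang.py`.

```python
# Sketch: recompute the MMLU summary from the per-subject log lines above.
# Assumption: each printed `acc` is (correct / #q) rounded to 3 decimals.

# (#q, acc) per subject, in log order (abstract_algebra ... world_religions).
SUBJECT_RESULTS = [
    (100, 0.820), (135, 0.881), (152, 0.934), (100, 0.870), (265, 0.917),
    (144, 0.965), (100, 0.650), (100, 0.830), (100, 0.800), (173, 0.867),
    (102, 0.814), (100, 0.890), (235, 0.949), (114, 0.807), (145, 0.876),
    (378, 0.944), (126, 0.810), (100, 0.730), (310, 0.958), (203, 0.897),
    (100, 0.950), (165, 0.885), (198, 0.960), (193, 0.990), (390, 0.931),
    (270, 0.752), (238, 0.954), (151, 0.834), (545, 0.961), (216, 0.861),
    (204, 0.961), (237, 0.949), (223, 0.870), (131, 0.924), (121, 0.975),
    (108, 0.907), (163, 0.914), (112, 0.857), (103, 0.961), (234, 0.962),
    (100, 0.960), (783, 0.962), (346, 0.864), (895, 0.806), (306, 0.922),
    (311, 0.929), (324, 0.935), (282, 0.869), (1534, 0.720), (272, 0.952),
    (612, 0.907), (110, 0.809), (245, 0.869), (201, 0.945), (100, 0.950),
    (166, 0.578), (171, 0.930),
]


def total_questions(results):
    """Total number of questions across all subjects."""
    return sum(n for n, _ in results)


def weighted_accuracy(results):
    """Question-weighted (micro) mean: every question counts once."""
    # round(n * acc) recovers each subject's correct count; this is exact
    # for subjects with fewer than 1000 questions, approximate above that.
    correct = sum(round(n * acc) for n, acc in results)
    return correct / total_questions(results)


def macro_accuracy(results):
    """Unweighted (macro) mean: every subject counts once."""
    return sum(acc for _, acc in results) / len(results)


if __name__ == "__main__":
    print("questions:", total_questions(SUBJECT_RESULTS))
    print(f"weighted accuracy: {weighted_accuracy(SUBJECT_RESULTS):.3f}")
    print(f"macro accuracy: {macro_accuracy(SUBJECT_RESULTS):.3f}")
```

With the values above this prints 14042 questions, a weighted accuracy of 0.878 (matching the reported average) and a macro accuracy of 0.886. So the reported average appears to be question-weighted: large subjects such as professional_law (1534 questions at 0.720) pull it below the per-subject mean.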
Conclusion
The gsm8k and MMLU results are fully consistent with the official release.
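The gsm8k run reports `Accuracy`, `Invalid`, `Latency` and `Output throughput`. As a rough aid for reading those fields, the sketch below shows a last-number scoring heuristic of the kind commonly used for gsm8k, plus the arithmetic linking throughput and latency. Both are assumptions: the scoring is not copied from `benchmark/gsm8k/bench_sglang.py`, and the throughput arithmetic assumes throughput means total generated tokens divided by wall-clock latency. `extract_answer`, `score` and `implied_output_tokens` are hypothetical names.

```python
import re

INVALID = None  # marker for completions with no extractable number


def extract_answer(completion: str):
    """Return the last integer in a completion, or INVALID if there is none.

    Assumption: generic gsm8k heuristic (drop thousands separators, take the
    last run of digits); the real benchmark script may differ in detail.
    """
    numbers = re.findall(r"-?\d+", completion.replace(",", ""))
    return int(numbers[-1]) if numbers else INVALID


def score(completions, labels):
    """Return (accuracy, invalid_rate) over paired completions and gold labels."""
    preds = [extract_answer(c) for c in completions]
    n = len(labels)
    accuracy = sum(p == y for p, y in zip(preds, labels)) / n
    invalid = sum(p is INVALID for p in preds) / n
    return accuracy, invalid


def implied_output_tokens(throughput_tok_s: float, latency_s: float) -> float:
    """Assumes throughput = total generated tokens / wall-clock latency."""
    return throughput_tok_s * latency_s


if __name__ == "__main__":
    demo = ["... so the answer is 1,234.", "I am not sure.", "18"]
    print(score(demo, [1234, 7, 18]))  # 2 of 3 correct, 1 of 3 invalid
    tokens = implied_output_tokens(587.916, 236.747)
    print(f"{tokens:.0f} tokens, {tokens / 1319:.1f} per question")
```

Under that throughput assumption, 587.916 token/s over 236.747 s implies roughly 139k generated tokens, or about 105 tokens per question across the 1319 questions. `Invalid: 0.000` would then mean every completion contained a parseable number.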