We are running (and publishing with open licenses) a bunch of different benchmarks (e.g. memory bandwidth, OpenSSL and compression speed, redis/static web server cases, Geekbench, PassMark etc.) on 2000+ cloud server types at sparecores.com, and we are currently working on a new set of benchmarks to be run on all servers for LLM inference speed, using tiny, medium-sized, and larger models with various configs.
`llama-bench` has been a great tool in our initial tests (working with both CPUs and GPUs), but we ran into issues when trying to benchmark machines with multiple GPUs: it did not scale at all, and only one GPU was used in the tests (or sometimes multiple GPUs ran at fractional loads, with a score very similar to that of a single GPU). Can someone help us understand what limit we are facing here? Ideally, we need a tool that we can run on CPUs, a single GPU, or many GPUs and get a global tokens/sec score for each model/config on each server, hopefully without much tweaking, and working even with small models so that we can run this on tiny machines as well.
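For context, this is the kind of harness we are aiming for, sketched as a sweep over `llama-bench` split modes that collects tokens/sec from the JSON output. The model path is a placeholder, and the `avg_ts`/`n_prompt`/`n_gen` field names are our assumption based on recent llama.cpp builds (`jq` is also assumed to be available):

```sh
# Minimal sketch: benchmark each split mode and collect tokens/sec.
# Assumes llama-bench was built with GPU support, jq is installed, and
# the JSON output exposes avg_ts (average tokens/sec) per test.
MODEL=models/llama-2-7b.Q4_0.gguf   # placeholder model path

for sm in none layer row; do
  ./llama-bench -m "$MODEL" -ngl 99 -sm "$sm" -o json \
    | jq --arg sm "$sm" '.[] | {split_mode: $sm, n_prompt, n_gen, avg_ts}'
done
```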
What we have tried: the `t`, `ngl`, `sm`, and `ts` CLI params, along the lines of the sketch below.
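For concreteness, a minimal sketch of the kind of invocation we tried (the flag names are from `llama-bench --help`; the model path and the split values are placeholders, not our exact configuration):

```sh
# -t   number of CPU threads
# -ngl number of layers offloaded to the GPU(s); a large value means "all"
# -sm  how to split the model across GPUs: none | layer | row
# -ts  proportion of the model to place on each GPU (here: 4 GPUs, evenly)
./llama-bench -m models/llama-2-7b.Q4_0.gguf \
  -t 8 -ngl 99 -sm layer -ts 1/1/1/1
```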
Example run on a `g5.12x` @ AWS:

Any hints are appreciated 🙇