We benchmarked runai-model-streamer against CoreWeave's Tensorizer and the Safetensors library, and published the results here
A notable advantage of runai-model-streamer is that its concurrency level is not limited by the size of the model's largest tensor, unlike Tensorizer. The streamer's storage layer is independent of tensor sizes and can be configured to run at whatever concurrency level is optimal for the storage type, which is essential for saturating the storage's read bandwidth. This is why our benchmarks show such high performance when reading from distributed storage such as S3, where the streamer was 7.6× faster than Tensorizer.
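As a rough sketch of what tuning that concurrency looks like in practice (the `runai_model_streamer` Python API and the `RUNAI_STREAMER_CONCURRENCY` environment variable are taken from the project's README; treat the exact names as assumptions):

```python
import os

# Assumed knob: number of concurrent storage readers, independent of tensor
# sizes. Raise it for high-latency backends like S3 to saturate read bandwidth.
os.environ["RUNAI_STREAMER_CONCURRENCY"] = "16"

from runai_model_streamer import SafetensorsStreamer  # assumed package/API

with SafetensorsStreamer() as streamer:
    streamer.stream_file("model.safetensors")
    # Tensors are yielded as their underlying chunks finish reading,
    # so CPU-side work overlaps with storage I/O.
    for name, tensor in streamer.get_tensors():
        print(name, tensor.shape)
```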
Thank you for pointing out the vLLM integration; in fact, there is an open pull request for that in the vLLM repository (covering both local and cloud storage settings).
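Once that pull request lands, usage could mirror vLLM's existing Tensorizer path; here is a hypothetical sketch (the `runai_streamer` load-format name and the example model ID are assumptions, not the merged API):

```python
from vllm import LLM

# Hypothetical: select the streamer as the weight loader, analogous to
# load_format="tensorizer", which vLLM already supports.
llm = LLM(
    model="facebook/opt-125m",      # example model; could be a local or S3 path
    load_format="runai_streamer",   # assumed name, pending the open PR
)
outputs = llm.generate("Hello, world!")
print(outputs[0].outputs[0].text)
```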
Also, for better distribution, you might want to integrate with vLLM, similar to how Tensorizer is supported there.