How does this compare to CoreWeave's Tensorizer? #15

Open
alpayariyak opened this issue Nov 5, 2024 · 1 comment
Comments

@alpayariyak

(In both local and cloud storage settings)

Also, for better distribution you might want to integrate with vLLM, similarly to how Tensorizer is supported there.

@noa-neria
Collaborator

noa-neria commented Nov 5, 2024

We benchmarked runai-model-streamer, CoreWeave's Tensorizer, and the Safetensors library, and published the results here.

A notable advantage of runai-model-streamer is that, unlike Tensorizer, its concurrency level is not limited by the size of the largest tensor in the model. The streamer's storage layer is independent of tensor sizes and can be configured to whatever concurrency level is optimal for the storage type, which is especially important for saturating the storage's read bandwidth. This is why our benchmarks show high performance when reading from distributed storage such as S3, where the streamer was 7.6x faster than Tensorizer.
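For illustration, here is a minimal sketch of tuning that concurrency and streaming a file, based on the streamer's documented Python API; the `RUNAI_STREAMER_CONCURRENCY` variable name, the concurrency value, and the file path are taken as assumptions from the README and may differ across versions:

```python
import os

# Concurrency is a property of the storage layer, independent of tensor
# sizes, so it can be tuned to saturate the backend's read bandwidth
# (e.g. a higher level for S3 than for a local NVMe drive).
# Assumed env var name, set before the library is imported:
os.environ["RUNAI_STREAMER_CONCURRENCY"] = "16"

from runai_model_streamer import SafetensorsStreamer

file_path = "/path/to/model.safetensors"  # hypothetical path

with SafetensorsStreamer() as streamer:
    streamer.stream_file(file_path)
    # Tensors are yielded as their data arrives, while the storage layer
    # keeps reading in the background at the configured concurrency.
    for name, tensor in streamer.get_tensors():
        print(name, tensor.shape)
```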

Thank you for pointing out the vLLM integration; in fact, there is an open pull request for that in the vLLM repository.
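For reference, vLLM already selects Tensorizer through its load-format option, and an analogous streamer integration could look roughly like the sketch below; the `runai_streamer` format name and the model ID are assumptions here, pending the open PR, not confirmed API:

```python
# Sketch only: mirrors how vLLM selects Tensorizer via load_format.
# The "runai_streamer" value is an assumption pending the open PR.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B",  # hypothetical model choice
    load_format="runai_streamer",
)
outputs = llm.generate("Hello, my name is")
```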
