Build linux CUDA releases suitable for Colab & other platforms on 12.2 #11226
base: master
Conversation
What's the reason for making a different release for each arch?
@slaren Building for a single arch seems a lot faster, and having separate artefacts instead of (cuda-)fat binaries means smaller downloads / quicker setup on Colab. I couldn't finish a full build w/ all the architectures locally yet though; maybe I'll try this to see how much overhead per arch we're talking about.
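For context, a minimal sketch of a single-arch CUDA build, assuming the standard `GGML_CUDA` / `CMAKE_CUDA_ARCHITECTURES` CMake options rather than the exact flags this PR's workflow uses:

```bash
# Sketch: build llama.cpp's CUDA backend for a single architecture (7.5 here).
# GGML_CUDA enables the CUDA backend; CMAKE_CUDA_ARCHITECTURES restricts codegen
# to one arch, which keeps libggml-cuda.so small and shortens the build.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=75
cmake --build build --config Release -j "$(nproc)"
```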
Building specific archs separately gives maximum performance, the smallest package size & the shortest build times possible (compare a build for 7.5+8.0 vs. just 7.5, for instance: `libggml-cuda.so` is almost twice the size, ~70MB per arch).

Colab one-liner (example usage to install CUDA llama.cpp; will need to adjust to GitHub releases; will write an install script when releases are available):
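A hypothetical sketch of such an install, with the release tag and asset name as placeholders (not real release artifacts):

```bash
# Hypothetical Colab install sketch: fetch a prebuilt CUDA release tarball and
# unpack it into /usr/local. <TAG> and <cuda-asset> are placeholders only.
curl -fsSL -o /tmp/llama-cuda.tar.gz \
  "https://github.com/ggerganov/llama.cpp/releases/download/<TAG>/<cuda-asset>.tar.gz" \
  && tar -xzf /tmp/llama-cuda.tar.gz -C /usr/local
```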
TODO
- `ci`: ccache for all github workflows #11516
- `install.sh` (Unix incl. WSL) & `install.ps1` (Windows) scripts that detect OS, arch, CPU & GPU caps and install the right release (maybe through brew); see the sketch below
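A minimal sketch of the kind of detection `install.sh` could do; nothing here is final, and the asset naming scheme is assumed:

```bash
#!/usr/bin/env bash
# Sketch of install.sh detection logic: derive an asset name from OS, CPU arch,
# and (if a GPU is present) the CUDA compute capability. Naming is illustrative.
set -euo pipefail

os="$(uname -s | tr '[:upper:]' '[:lower:]')"   # e.g. linux, darwin
arch="$(uname -m)"                               # e.g. x86_64, aarch64

backend="cpu"
if command -v nvidia-smi >/dev/null 2>&1; then
  # compute_cap is reported by recent NVIDIA drivers, e.g. "7.5"
  cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"
  backend="cuda-${cap}"
fi

echo "Would install asset: llama.cpp-${os}-${arch}-${backend}.tar.gz"
```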