
Build linux CUDA releases suitable for Colab & other platforms on 12.2 #11226

Draft · wants to merge 26 commits into master
Conversation

@ochafik ochafik commented Jan 14, 2025

  • Building specific archs separately to get maximum performance, the smallest package size & the shortest build times possible (compare a build for 7.5+8.0 vs. just 7.5, for instance: libggml-cuda.so is almost twice the size, ~70 MB per arch)

  • Colab one-liner to install CUDA llama.cpp (example usage below; it will need to be adjusted to point at GitHub releases, and an install script will follow once releases are available):

    # Temporarily, hosting binaries on my own server
    !wget -O llama-cpp.zip "https://download.ochafik.com/llama.cpp/llama-cpp-master-cuda-$( nvidia-smi | grep "CUDA Version: " | sed -E 's/.*?Version: ([0-9]+\.[0-9]+).*/\1/' )-cap-$( nvidia-smi --query-gpu=compute_cap --format=csv | tail -n 1 ).zip" && unzip -o llama-cpp.zip
    
    # Once this PR gets merged
    !wget -O llama-cpp.zip "$( curl --silent "https://api.github.com/repos/ggerganov/llama.cpp/releases/latest" | grep cuda-cu$( nvidia-smi | grep "CUDA Version: " | sed -E 's/.*?Version: ([0-9]+\.[0-9]+).*/\1/' )-cap$( nvidia-smi --query-gpu=compute_cap --format=csv | tail -n 1 ) | grep browser_download_url | sed -E 's/.*(https:.*)"/\1/' )"
    

TODO

  • Merge ci: ccache for all github workflows #11516
  • Fix build on ci
  • Compare benefit of separate archives for server / cli vs. full?
  • Trigger a branch release if possible (to test entire mechanics)
  • Incubate install.sh (Unix incl. WSL) & install.ps1 (Windows) scripts that detect OS, arch, CPU & GPU capabilities and install the right release (maybe through brew)
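The detection step of that last TODO item could look roughly like the following. This is a hypothetical sketch only: the helper names, the archive URL pattern, and the no-GPU fallback are all assumptions, not the final script.

```sh
#!/usr/bin/env sh
# Hypothetical sketch of install.sh's detection logic (names and URL
# pattern are assumptions, not the final script).

detect_os() { uname -s | tr '[:upper:]' '[:lower:]'; }   # e.g. linux, darwin
detect_arch() { uname -m; }                              # e.g. x86_64, aarch64

# Extract e.g. "12.2" from the `nvidia-smi` banner text passed on stdin.
cuda_version() { sed -nE 's/.*CUDA Version: ([0-9]+\.[0-9]+).*/\1/p' | head -n 1; }

# Keep the last line of `nvidia-smi --query-gpu=compute_cap --format=csv`
# output on stdin, which is the capability value (e.g. "7.5").
compute_cap() { tail -n 1; }

if command -v nvidia-smi >/dev/null 2>&1; then
  cu=$(nvidia-smi | cuda_version)
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv | compute_cap)
  echo "would fetch: llama-cpp-master-cuda-${cu}-cap-${cap}.zip ($(detect_os)/$(detect_arch))"
else
  echo "no NVIDIA GPU detected; would fall back to a CPU build"
fi
```

The same `nvidia-smi` parsing appears inline in the one-liners above; factoring it into functions is just what a reusable script would likely do.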

@github-actions github-actions bot added labels Jan 14, 2025: devops (improvements to build systems and github actions), ggml (changes relating to the ggml tensor library for machine learning)

slaren commented Jan 14, 2025

What's the reason for making a different release for each arch?


ochafik commented Jan 14, 2025

> What's the reason for making a different release for each arch?

@slaren Building for a single arch seems a lot faster, and having separate artefacts instead of fat CUDA binaries means smaller downloads and quicker setup on Colab. I couldn't finish a full build with all the architectures locally yet, though; maybe I'll try that to see how much overhead per arch we're talking about.
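For reference, restricting a build to one architecture comes down to the CUDA architecture list passed to CMake. A minimal sketch, assuming the `GGML_CUDA` CMake option (flag names have varied across llama.cpp versions, so treat these as assumptions):

```sh
# Build llama.cpp for a single CUDA arch (compute capability 7.5 -> "75").
# GGML_CUDA and CMAKE_CUDA_ARCHITECTURES values here are assumptions.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=75
cmake --build build --config Release -j

# A fat binary instead embeds code for each listed arch, roughly doubling
# libggml-cuda.so per added arch as noted above:
# cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75;80"
```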
