ggml : automatic selection of best CPU backend #10606

Merged: 4 commits merged into master from sl/dl-backend-4 on Dec 1, 2024
Conversation

@slaren (Collaborator) commented Nov 30, 2024

This is how it works:

  • Backends can export a function called ggml_backend_score
  • When loading a backend, all the available variants are checked and the highest-scoring one is loaded
  • A score of 0 means that the backend cannot be used on the current system
  • The available variants are discovered automatically based on the file name, for example, when loading the CPU backend, all files that match libggml-cpu-*.so (or ggml-cpu-*.dll on Windows) are checked (a rough sketch of this selection loop follows the list)
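
As a rough illustration of the discovery step, here is a minimal loader sketch. It is not the actual ggml code: the `load_best_cpu_backend` name and the control flow are hypothetical, and only the `ggml_backend_score` symbol name and the `libggml-cpu-*.so` pattern come from the description above; the exact exported signature is an assumption.

```c
// Hypothetical variant-selection loop (POSIX): glob for CPU backend variants,
// ask each one for its score via the exported ggml_backend_score symbol, and
// keep the handle of the highest-scoring variant that is usable on this system.
#include <dlfcn.h>
#include <glob.h>
#include <stddef.h>

typedef int (*score_fn_t)(void);

void * load_best_cpu_backend(void) {
    glob_t g;
    void * best_handle = NULL;
    int    best_score  = 0;

    if (glob("libggml-cpu-*.so", 0, NULL, &g) != 0) {
        return NULL; // no variants found
    }
    for (size_t i = 0; i < g.gl_pathc; i++) {
        void * h = dlopen(g.gl_pathv[i], RTLD_NOW | RTLD_LOCAL);
        if (!h) {
            continue;
        }
        score_fn_t score_fn = (score_fn_t) dlsym(h, "ggml_backend_score");
        int score = score_fn ? score_fn() : 0; // a score of 0 means "not usable here"
        if (score > best_score) {
            if (best_handle) {
                dlclose(best_handle);
            }
            best_handle = h;
            best_score  = score;
        } else {
            dlclose(h);
        }
    }
    globfree(&g);
    return best_handle; // NULL if no variant can run on this system
}
```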

The CPU backend implements this functionality for x86-64 and returns a score depending on the features included in the build that are supported on the running system.
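
To make the scoring concrete, here is a minimal sketch of what such a score function could look like for an x86-64 variant build. It is not the actual ggml implementation: the exported name comes from this PR, but the signature, the exact feature set, and the scoring scheme are assumptions, and `__builtin_cpu_supports` is a GCC/Clang builtin.

```c
// Illustrative score function for a CPU variant build (hypothetical).
// Every ISA extension the variant was compiled with must be present at
// runtime, otherwise the variant is unusable and the score is 0; otherwise
// the score grows with the number of extensions, so the loader prefers the
// most capable variant that still runs on the current CPU.
int ggml_backend_score(void) {
#if defined(__AVX__)
    if (!__builtin_cpu_supports("avx"))     return 0;
#endif
#if defined(__AVX2__)
    if (!__builtin_cpu_supports("avx2"))    return 0;
#endif
#if defined(__FMA__)
    if (!__builtin_cpu_supports("fma"))     return 0; // AVX/AVX2 variants also need FMA (see caveat below)
#endif
#if defined(__F16C__)
    if (!__builtin_cpu_supports("f16c"))    return 0; // ... and F16C
#endif
#if defined(__AVX512F__)
    if (!__builtin_cpu_supports("avx512f")) return 0;
#endif

    int score = 1; // baseline: the variant can run at all
#if defined(__AVX__)
    score += 1;
#endif
#if defined(__AVX2__)
    score += 1;
#endif
#if defined(__AVX512F__)
    score += 1;
#endif
    return score;
}
```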

The llama-server docker image has been updated to include variants for AVX, AVX2, AVX512 and AMX.

Caveat: the AVX and AVX2 variants still require FMA and F16C, which will limit the number of processors supported. More variants may be needed to fully support some microarchitectures.

@github-actions bot added the script (Script related), devops (improvements to build systems and github actions), and ggml (changes relating to the ggml tensor library for machine learning) labels on Nov 30, 2024
@slaren force-pushed the sl/dl-backend-4 branch 2 times, most recently from ea35fd8 to dadab7c on November 30, 2024
@github-actions bot added the build (Compilation issues) label on Dec 1, 2024
@slaren merged commit 3420909 into master on Dec 1, 2024
50 checks passed
@slaren deleted the sl/dl-backend-4 branch on December 1, 2024
@giladgd (Contributor) commented Dec 1, 2024

I think it may be worth having a cmake flag to build all the common CPU backend variations in a single build rather than having to build multiple times and combine the backend libraries.
Having this would make it easier to maintain a centralized list of the common configurations that would be supported by projects that use llama.cpp.

@slaren (Collaborator, Author) commented Dec 2, 2024

Yes, I agree that would be better. TBH, selecting the variants is a headache because there are so many options and each microarchitecture supports a different subset of them, so I didn't want to think too much about it. It might make more sense to build a variant for each microarchitecture, but there are going to be a lot of them.
