ggml : automatic selection of best CPU backend #10606

Merged: 4 commits merged into master from sl/dl-backend-4 on Dec 1, 2024
Conversation

@slaren (Collaborator) commented Nov 30, 2024

This is how it works:

  • Backends can export a function called ggml_backend_score
  • When loading a backend, all the available variants are checked and the highest-scoring one is loaded
  • A score of 0 means that the backend cannot be used on the current system
  • The available variants are discovered automatically based on the file name, for example, when loading the CPU backend, all files that match libggml-cpu-*.so (or ggml-cpu-*.dll on Windows) are checked (a rough sketch of this selection loop follows the list)
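
As a rough illustration of the discovery step, here is a minimal loader sketch. It is not the actual ggml code: the `load_best_cpu_backend` name and the control flow are hypothetical, and only the `ggml_backend_score` symbol name and the `libggml-cpu-*.so` pattern come from the description above; the exact exported signature is an assumption.

```c
// Hypothetical variant-selection loop (POSIX): glob for CPU backend variants,
// ask each one for its score via the exported ggml_backend_score symbol, and
// keep the handle of the highest-scoring variant that is usable on this system.
#include <dlfcn.h>
#include <glob.h>
#include <stddef.h>

typedef int (*score_fn_t)(void);

void * load_best_cpu_backend(void) {
    glob_t g;
    void * best_handle = NULL;
    int    best_score  = 0;

    if (glob("libggml-cpu-*.so", 0, NULL, &g) != 0) {
        return NULL; // no variants found
    }
    for (size_t i = 0; i < g.gl_pathc; i++) {
        void * h = dlopen(g.gl_pathv[i], RTLD_NOW | RTLD_LOCAL);
        if (!h) {
            continue;
        }
        score_fn_t score_fn = (score_fn_t) dlsym(h, "ggml_backend_score");
        int score = score_fn ? score_fn() : 0; // a score of 0 means "not usable here"
        if (score > best_score) {
            if (best_handle) {
                dlclose(best_handle);
            }
            best_handle = h;
            best_score  = score;
        } else {
            dlclose(h);
        }
    }
    globfree(&g);
    return best_handle; // NULL if no variant can run on this system
}
```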

The CPU backend implements this functionality for x86-64 and returns a score depending on the features included in the build that are supported on the running system.
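
To make the scoring concrete, here is a minimal sketch of what such a score function could look like for an x86-64 variant build. It is not the actual ggml implementation: the exported name comes from this PR, but the signature, the exact feature set, and the scoring scheme are assumptions, and `__builtin_cpu_supports` is a GCC/Clang builtin.

```c
// Illustrative score function for a CPU variant build (hypothetical).
// Every ISA extension the variant was compiled with must be present at
// runtime, otherwise the variant is unusable and the score is 0; otherwise
// the score grows with the number of extensions, so the loader prefers the
// most capable variant that still runs on the current CPU.
int ggml_backend_score(void) {
#if defined(__AVX__)
    if (!__builtin_cpu_supports("avx"))     return 0;
#endif
#if defined(__AVX2__)
    if (!__builtin_cpu_supports("avx2"))    return 0;
#endif
#if defined(__FMA__)
    if (!__builtin_cpu_supports("fma"))     return 0; // AVX/AVX2 variants also need FMA (see caveat below)
#endif
#if defined(__F16C__)
    if (!__builtin_cpu_supports("f16c"))    return 0; // ... and F16C
#endif
#if defined(__AVX512F__)
    if (!__builtin_cpu_supports("avx512f")) return 0;
#endif

    int score = 1; // baseline: the variant can run at all
#if defined(__AVX__)
    score += 1;
#endif
#if defined(__AVX2__)
    score += 1;
#endif
#if defined(__AVX512F__)
    score += 1;
#endif
    return score;
}
```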

The llama-server docker image has been updated to include variants for AVX, AVX2, AVX512 and AMX.

Caveat: the AVX and AVX2 variants still require FMA and F16C, which will limit the number of processors supported. More variants may be needed to fully support some microarchitectures.

@github-actions bot added the script (Script related), devops (improvements to build systems and github actions), and ggml (changes relating to the ggml tensor library for machine learning) labels on Nov 30, 2024
@slaren force-pushed the sl/dl-backend-4 branch 2 times, most recently from ea35fd8 to dadab7c on November 30, 2024
@github-actions bot added the build (Compilation issues) label on Dec 1, 2024
@slaren merged commit 3420909 into master on Dec 1, 2024
50 checks passed
@slaren deleted the sl/dl-backend-4 branch on December 1, 2024
@giladgd (Contributor) commented Dec 1, 2024

I think it may be worth having a cmake flag to build all the common CPU backend variations in a single build rather than having to build multiple times and combine the backend libraries.
Having this would make it easier to maintain a centralized list of the common configurations that would be supported by projects that use llama.cpp.

@slaren (Collaborator, Author) commented Dec 2, 2024

Yes, I agree that would be better. TBH, selecting the variants is a headache because there are so many options and each microarchitecture supports a different subset of them, so I didn't want to think too much about it. It might make more sense to build a variant for each microarchitecture, but there are going to be a lot of them.
