
sync : llama.cpp #1039

Merged: 52 commits on Dec 3, 2024

Commits:
2dde9cc
add cmake rvv support (llama/10411)
lhpqaq Nov 19, 2024
01d961f
vulkan: further optimize mul_mat_vec using larger loads (llama/10387)
jeffbolznv Nov 20, 2024
a37e9dc
vulkan: copy iq4_nl LUT into shared memory (llama/10409)
jeffbolznv Nov 20, 2024
826679f
vulkan: predicate max operation in soft_max shaders/soft_max (llama/1…
jeffbolznv Nov 20, 2024
1dada6d
cuda : optimize argmax (llama/10441)
slaren Nov 21, 2024
1a9b04c
CANN: Support Ascend310P to accelerate F32 and F16 Model (llama/10216)
leo-pony Nov 22, 2024
0e4f353
ggml : do not use ARM features not included in the build (llama/10457)
slaren Nov 23, 2024
27b3394
metal : minor code formatting
ggerganov Nov 25, 2024
f21071b
tests : fix compile warning
ggerganov Nov 25, 2024
3f44610
ggml : add support for dynamic loading of backends (llama/10469)
slaren Nov 25, 2024
0df7ac3
llama : accept a list of devices to use to offload a model (llama/10497)
slaren Nov 25, 2024
41b84d7
metal : enable mat-vec kernels for bs <= 4 (llama/10491)
ggerganov Nov 25, 2024
93a0782
vulkan: Fix a vulkan-shaders-gen arugment parsing error (llama/10484)
sparkleholic Nov 26, 2024
d3ccd11
CANN: RoPE and CANCAT operator optimization (llama/10488)
noemotiovon Nov 26, 2024
8c59e6b
CANN: Improve the Inferencing Performance for Ascend NPU Device (llam…
shen-shanshan Nov 26, 2024
d1976ec
ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487)
chaxu01 Nov 26, 2024
a1d5c10
cmake : enable warnings in llama (llama/10474)
ggerganov Nov 26, 2024
965432c
vulkan: fix group_norm (llama/10496)
jeffbolznv Nov 26, 2024
76301a7
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (…
yeahdongcn Nov 26, 2024
2480d18
vulkan: optimize Q2_K and Q3_K mul_mat_vec (llama/10459)
jeffbolznv Nov 27, 2024
5ff537f
vulkan: skip integer div/mod in get_offsets for batch_idx==0 (llama/1…
jeffbolznv Nov 27, 2024
02dfbe1
vulkan: further optimize q5_k mul_mat_vec (llama/10479)
jeffbolznv Nov 27, 2024
40699b9
vulkan: Handle GPUs with less shared memory (llama/10468)
jeffbolznv Nov 27, 2024
b314f1f
vulkan: define all quant data structures in types.comp (llama/10440)
jeffbolznv Nov 27, 2024
24bc6e0
metal : fix group_norm support condition (llama/0)
ggerganov Nov 27, 2024
3602400
Add some minimal optimizations for CDNA (llama/10498)
IMbackK Nov 27, 2024
ac2fc33
CANN: ROPE operator optimization (llama/10540)
noemotiovon Nov 28, 2024
abcb176
CANN: Fix SOC_TYPE compile bug (llama/10519)
leo-pony Nov 28, 2024
a6b17f2
kompute : improve backend to pass test_backend_ops (llama/10542)
slp Nov 28, 2024
f7ffcd2
ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541)
FanShupei Nov 28, 2024
4def5cb
cmake : fix ARM feature detection (llama/10543)
ggerganov Nov 28, 2024
5848c4c
ggml : fix row condition for i8mm kernels (llama/10561)
ggerganov Nov 28, 2024
816485a
ggml : remove redundant copyright notice + update authors
ggerganov Nov 28, 2024
6eff9fb
vulkan: get the first command buffer submitted sooner (llama/10499)
jeffbolznv Nov 29, 2024
5a8c20b
CANN: RoPE operator optimization (llama/10563)
noemotiovon Nov 29, 2024
c17ffb7
sycl : Reroute permuted mul_mats through oneMKL (llama/10408)
Alcpz Nov 29, 2024
f6e4e07
sycl : offload of get_rows set to 0 (llama/10432)
Alcpz Nov 29, 2024
1602102
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (llama/10580)
FanShupei Nov 29, 2024
d2896d6
ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562)
ggerganov Nov 29, 2024
cf66ee0
vulkan: Dynamic subgroup size support for Q6_K mat_vec (llama/10536)
netrunnereve Nov 30, 2024
a5c2af2
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
angt Nov 30, 2024
1bc5829
SYCL: Fix and switch to GGML_LOG system instead of fprintf (llama/10579)
qnixsynapse Dec 2, 2024
c908240
metal : small-batch mat-mul kernels (llama/10581)
ggerganov Dec 3, 2024
8ce7ebf
scripts : remove amx from sync
ggerganov Dec 3, 2024
a842931
ggml : move AMX to the CPU backend (llama/10570)
slaren Dec 3, 2024
6aa7be5
sync : llama.cpp
ggerganov Dec 3, 2024
23caf78
authors : update
ggerganov Dec 3, 2024
7756deb
common : fix compile warning
ggerganov Dec 3, 2024
9666d68
files : remove make artifacts
ggerganov Dec 3, 2024
0c55ec2
ci : fix pip env
ggerganov Dec 3, 2024
65f6241
ci : remove opencl workflow
ggerganov Dec 3, 2024
d54f335
ci : update requirements.txt
ggerganov Dec 3, 2024
32 changes: 0 additions & 32 deletions .github/workflows/ci.yml
@@ -7,38 +7,6 @@ on:
branches: [ master ]

jobs:
test-ubuntu-opencl:
if: false
runs-on: ubuntu-latest
env:
GGML_NLOOP: 3
GGML_NITER: 1
GGML_N_THREADS: 2

steps:
- uses: actions/checkout@v3

- name: Dependencies
run: |
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt-get update
sudo apt-get install -y --no-install-recommends llvm intel-oneapi-runtime-opencl intel-oneapi-runtime-compilers libclblast-dev
- name: Create Build Environment
run: mkdir build

- name: Configure CMake
working-directory: ./build
run: cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DGGML_CLBLAST=ON ..

- name: Build
working-directory: ./build
run: make

- name: Test
working-directory: ./build
run: ctest --verbose --timeout 900

test-macos-metal:
runs-on: macos-13
env:
1 change: 1 addition & 0 deletions .gitignore
@@ -26,6 +26,7 @@ zig-out/
zig-cache/

*.o
*.d
*.dot

*.sw?
38 changes: 37 additions & 1 deletion AUTHORS
@@ -1,14 +1,16 @@
# date: Thu Sep 26 09:19:50 CDT 2024
# date: Tue Dec 3 20:24:37 EET 2024
# this file is auto-generated by scripts/gen-authors.sh

0cc4m <[email protected]>
65a <[email protected]>
AT <[email protected]>
Abhilash Majumder <[email protected]>
Adam Tazi <[email protected]>
Ahmad Tameem <[email protected]>
AidanBeltonS <[email protected]>
AidanBeltonS <[email protected]>
Akarshan Biswas <[email protected]>
Akarshan Biswas <[email protected]>
Albert Jin <[email protected]>
Alberto Cabrera Pérez <[email protected]>
Alberto Cabrera Pérez <[email protected]>
@@ -20,6 +22,7 @@ AmirAli Mirian <[email protected]>
Ananta Bastola <[email protected]>
Andreas (Andi) Kunar <[email protected]>
Andrei <[email protected]>
Andrew Minh Nguyen <[email protected]>
Arjun <[email protected]>
Ashraful Islam <[email protected]>
Astariul <[email protected]>
@@ -35,6 +38,9 @@ Bryan Lozano <[email protected]>
Carolinabanana <[email protected]>
CarterLi999 <[email protected]>
Cebtenzzre <[email protected]>
Changyeon Kim <[email protected]>
Charles Xu <[email protected]>
Charles Xu <[email protected]>
Chen Xi <[email protected]>
Chen Xi <[email protected]>
Chris Elrod <[email protected]>
@@ -44,6 +50,8 @@ Cordeiro <[email protected]>
Cristiano Calcagno <[email protected]>
DAN™ <[email protected]>
Dan Forbes <[email protected]>
Dan Johansson <[email protected]>
Dan Johansson <[email protected]>
Daniel Bevenius <[email protected]>
Daniel Ziegenberg <[email protected]>
Daniele <[email protected]>
@@ -56,13 +64,17 @@ DavidKorczynski <[email protected]>
Davidson Francis <[email protected]>
Dibakar Gope <[email protected]>
Didzis Gosko <[email protected]>
Diego Devesa <[email protected]>
Diogo <[email protected]>
Djip007 <[email protected]>
Dou Xinpeng <[email protected]>
Dou Xinpeng <[email protected]>
Dr. Tom Murphy VII Ph.D <[email protected]>
Ebey Abraham <[email protected]>
Eldar Yusupov <[email protected]>
Emmanuel Durand <[email protected]>
Engininja2 <[email protected]>
Eric Zhang <[email protected]>
Erik Scholz <[email protected]>
Ettore Di Giacinto <[email protected]>
Eve <[email protected]>
@@ -71,9 +83,12 @@ Faisal Zaghloul <[email protected]>
FantasyGmm <[email protected]>
Felix <[email protected]>
Finn Voorhees <[email protected]>
FirstTimeEZ <[email protected]>
Frankie Robertson <[email protected]>
GainLee <[email protected]>
George Hindle <[email protected]>
Georgi Gerganov <[email protected]>
Gilad S <[email protected]>
Gilad S <[email protected]>
Guillaume Wenzek <[email protected]>
Halalaluyafail3 <[email protected]>
@@ -85,6 +100,7 @@ Hyunsung Lee <[email protected]>
IGUILIZ Salah-Eddine <[email protected]>
Ian Bull <[email protected]>
Ikko Eltociear Ashimine <[email protected]>
Ivan <[email protected]>
Ivan Filipov <[email protected]>
Ivan Stepanov <[email protected]>
Ivan Zdane <[email protected]>
@@ -106,6 +122,7 @@ Johannes Gäßler <[email protected]>
John Balis <[email protected]>
Josh Bleecher Snyder <[email protected]>
Judd <[email protected]>
Jun Hee Yoo <[email protected]>
Justina Cho <[email protected]>
Justine Tunney <[email protected]>
Justine Tunney <[email protected]>
@@ -117,7 +134,9 @@ LoganDark <[email protected]>
LoganDark <[email protected]>
LostRuins <[email protected]>
Lukas Möller <[email protected]>
M Refi D.A <[email protected]>
M. Yusuf Sarıgöz <[email protected]>
Ma Mingfei <[email protected]>
Mahesh Madhav <[email protected]>
MaiHD <[email protected]>
Mark Zhuang <[email protected]>
@@ -126,6 +145,7 @@ Masaya, Kato <[email protected]>
Mathijs de Bruin <[email protected]>
Matt Stephenson <[email protected]>
Max Krasnyansky <[email protected]>
Max Krasnyansky <[email protected]>
Mayank Kumar Pal <[email protected]>
Meng, Hengyu <[email protected]>
Mengqing Cao <[email protected]>
@@ -150,7 +170,9 @@ PAB <[email protected]>
Paul Tsochantaris <[email protected]>
Philpax <[email protected]>
Pierre Alexandre SCHEMBRI <[email protected]>
Plamen Minev <[email protected]>
Playdev <[email protected]>
Prashant Vithule <[email protected]>
Przemysław Pawełczyk <[email protected]>
R0CKSTAR <[email protected]>
R0CKSTAR <[email protected]>
@@ -162,15 +184,20 @@ Reinforce-II <[email protected]>
Reza Rezvan <[email protected]>
Rick G <[email protected]>
RiverZhou <[email protected]>
Romain Biessy <[email protected]>
Ronsor <[email protected]>
Rotem Dan <[email protected]>
Ryan Hitchman <[email protected]>
SRHMorris <[email protected]>
SXX <[email protected]>
Salvatore Mesoraca <[email protected]>
Sam Spilsbury <[email protected]>
Sanchit Gandhi <[email protected]>
Santtu Keskinen <[email protected]>
Sergio López <[email protected]>
Sergio López <[email protected]>
Shijie <[email protected]>
Shupei Fan <[email protected]>
Siddharth Ramakrishnan <[email protected]>
Sigbjørn Skjæret <[email protected]>
Skyler Celestinian-Sterling <[email protected]>
@@ -186,18 +213,24 @@ Timothy Cronin <[email protected]>
Tom Bailey <[email protected]>
Tom Jobbins <[email protected]>
Tony Wasserka <[email protected]>
Tristan Druyen <[email protected]>
Tyé singwa <[email protected]>
UEXTM.com <[email protected]>
WillCorticesAI <[email protected]>
William Tambellini <[email protected]>
William Tambellini <[email protected]>
XiaotaoChen <[email protected]>
Xinpeng Dou <[email protected]>
Xuan Son Nguyen <[email protected]>
Yavor Ivanov <[email protected]>
YavorGIvanov <[email protected]>
Yilong Guo <[email protected]>
Yilong Guo <[email protected]>
Yuri Khrustalev <[email protected]>
Zhenwei Jin <[email protected]>
Zhiyuan Li <[email protected]>
agray3 <[email protected]>
amritahs-ibm <[email protected]>
apcameron <[email protected]>
appvoid <[email protected]>
ariez-xyz <[email protected]>
@@ -232,9 +265,11 @@ l3utterfly <[email protected]>
le.chang <[email protected]>
leejet <[email protected]>
leejet <[email protected]>
leo-pony <[email protected]>
liuwei-git <[email protected]>
luoyu-intel <[email protected]>
magicse <[email protected]>
mahorozte <[email protected]>
mashizora <[email protected]>
matteo <[email protected]>
ochafik <[email protected]>
@@ -254,6 +289,7 @@ ucag.li <[email protected]>
ulatekh <[email protected]>
wangshuai09 <[email protected]>
woachk <[email protected]>
xctan <[email protected]>
yangyaofei <[email protected]>
yuri@FreeBSD <yuri@FreeBSD>
zhentaoyu <[email protected]>
4 changes: 3 additions & 1 deletion CMakeLists.txt
@@ -33,6 +33,7 @@ else()
endif()

option(BUILD_SHARED_LIBS "ggml: build shared libraries" ${BUILD_SHARED_LIBS_DEFAULT})
option(GGML_BACKEND_DL "ggml: build backends as dynamic libraries (requires BUILD_SHARED_LIBS)" OFF)

#
# option list
@@ -95,6 +96,7 @@ option(GGML_CPU_HBM "ggml: use memkind for CPU HBM" OFF)
option(GGML_CPU_AARCH64 "ggml: use runtime weight conversion of Q4_0 to Q4_X_X" ON)

option(GGML_AVX "ggml: enable AVX" ${INS_ENB})
option(GGML_AVX_VNNI "ggml: enable AVX-VNNI" OFF)
option(GGML_AVX2 "ggml: enable AVX2" ${INS_ENB})
option(GGML_AVX512 "ggml: enable AVX512" OFF)
option(GGML_AVX512_VBMI "ggml: enable AVX512-VBMI" OFF)
@@ -109,6 +111,7 @@ if (NOT MSVC)
endif()
option(GGML_LASX "ggml: enable lasx" ON)
option(GGML_LSX "ggml: enable lsx" ON)
option(GGML_RVV "ggml: enable rvv" ON)
option(GGML_SVE "ggml: enable SVE" OFF)

if (WIN32)
@@ -159,7 +162,6 @@ set (GGML_METAL_MACOSX_VERSION_MIN "" CACHE STRING
set (GGML_METAL_STD "" CACHE STRING "ggml: metal standard version (-std flag)")
option(GGML_OPENMP "ggml: use OpenMP" ON)
option(GGML_RPC "ggml: use RPC" OFF)
option(GGML_AMX "ggml: use AMX" OFF)
option(GGML_SYCL "ggml: use SYCL" OFF)
option(GGML_SYCL_F16 "ggml: use 16 bit floats for sycl calculations" OFF)
set (GGML_SYCL_TARGET "INTEL" CACHE STRING
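The CMakeLists.txt hunks above surface the new `GGML_BACKEND_DL`, `GGML_AVX_VNNI`, and `GGML_RVV` options at configure time (and drop `GGML_AMX`, now folded into the CPU backend). A hypothetical configure invocation exercising them might look like the following; the build directory name and the particular option choices are assumptions for illustration, not taken from this PR:

```shell
# Config fragment only: requires a ggml source checkout to actually run.
# GGML_BACKEND_DL requires BUILD_SHARED_LIBS, per the option description above.
cmake -B build \
    -DBUILD_SHARED_LIBS=ON \
    -DGGML_BACKEND_DL=ON \
    -DGGML_RVV=ON
cmake --build build
```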
14 changes: 11 additions & 3 deletions ci/run.sh
@@ -293,15 +293,23 @@ function gg_sum_yolo {

## main

if [ -z $GG_BUILD_LOW_PERF ]; then
if [ -z ${GG_BUILD_LOW_PERF} ]; then
# Create symlink: ./ggml/models-mnt -> $MNT/models/models-mnt
rm -rf ${SRC}/models-mnt

mnt_models=${MNT}/models
mkdir -p ${mnt_models}
ln -sfn ${mnt_models} ${SRC}/models-mnt

# Create a fresh python3 venv and enter it
if ! python3 -m venv "$MNT/venv"; then
echo "Error: Failed to create Python virtual environment at $MNT/venv."
exit 1
fi
source "$MNT/venv/bin/activate"

pip install -r ${SRC}/requirements.txt --disable-pip-version-check
fi

python3 -m pip install -r ${SRC}/requirements.txt

ret=0

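The ci/run.sh hunk above replaces a bare `python3 -m pip install` with a dedicated virtual environment: fail fast if the venv cannot be created, then activate it so pip resolves inside the environment rather than the system Python. A standalone sketch of the same pattern, with a temporary directory standing in for the CI mount point `$MNT`:

```shell
# Minimal sketch of the CI venv bootstrap; MNT is a stand-in here.
MNT=$(mktemp -d)

if ! python3 -m venv "$MNT/venv"; then
    echo "Error: failed to create Python virtual environment at $MNT/venv." >&2
    exit 1
fi

# Activate so that python3/pip now resolve inside the venv
. "$MNT/venv/bin/activate"
command -v python3
```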
1 change: 1 addition & 0 deletions examples/common-ggml.cpp
@@ -217,6 +217,7 @@ bool ggml_common_quantize_0(
case GGML_TYPE_Q4_0_8_8:
case GGML_TYPE_TQ1_0:
case GGML_TYPE_TQ2_0:
case GGML_TYPE_IQ4_NL_4_4:
case GGML_TYPE_COUNT:
{
fprintf(stderr, "%s: unsupported quantization type %d (%s)\n", __func__, ttype, ggml_type_name((ggml_type) ttype));
25 changes: 0 additions & 25 deletions include/ggml-amx.h

This file was deleted.

15 changes: 15 additions & 0 deletions include/ggml-backend.h
@@ -190,6 +190,14 @@ extern "C" {
typedef void (*ggml_backend_set_n_threads_t)(ggml_backend_t backend, int n_threads);
// Get additional buffer types provided by the device (returns a NULL-terminated array)
typedef ggml_backend_buffer_type_t * (*ggml_backend_dev_get_extra_bufts_t)(ggml_backend_dev_t device);
// Set the abort callback for the backend
typedef void (*ggml_backend_set_abort_callback_t)(ggml_backend_t backend, ggml_abort_callback abort_callback, void * abort_callback_data);
// Get a list of feature flags supported by the backend (returns a NULL-terminated array)
struct ggml_backend_feature {
const char * name;
const char * value;
};
typedef struct ggml_backend_feature * (*ggml_backend_get_features_t)(ggml_backend_reg_t reg);

//
// Backend registry
@@ -214,6 +222,13 @@ extern "C" {
// = ggml_backend_dev_init(ggml_backend_dev_by_type(GPU) OR ggml_backend_dev_by_type(CPU), NULL)
GGML_API ggml_backend_t ggml_backend_init_best(void);

// Load a backend from a dynamic library and register it
GGML_API ggml_backend_reg_t ggml_backend_load(const char * path);
// Unload a backend if loaded dynamically and unregister it
GGML_API void ggml_backend_unload(ggml_backend_reg_t reg);
// Load all known backends from dynamic libraries
GGML_API void ggml_backend_load_all(void);

//
// Backend scheduler
//