sync : llama.cpp #997

ggerganov · 2024-10-23T14:27:51Z

No description provided.

…/9875)

* fix: use `vm_allocate` to allocate CPU backend buffer on macOS
* fix: switch to `posix_memalign` to keep existing `free()` usages working
* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS
* style: formatting
* fix: move const outside of `#ifndef`
* style: formatting
* fix: unused var
* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`
* fix: unused var
* fix: page align to `GGUF_DEFAULT_ALIGNMENT`
* fix: page align to `TENSOR_ALIGNMENT`
* fix: convert `TENSOR_ALIGNMENT` to a macro
* fix: increase page size to `32` on iOS
* fix: iOS page size
* fix: `hbw_posix_memalign` alignment
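The switch to `posix_memalign` matters because memory it returns can be released with plain `free()`, so existing `free()` call sites keep working. A minimal sketch of an aligned-allocation helper in that spirit, covering only the POSIX path (not the macOS `vm_allocate` path). The names `sketch_aligned_malloc`, `sketch_aligned_free`, `SKETCH_PAD` and the alignment value `64` are hypothetical and are not taken from ggml; the rounding macro is only similar in shape to ggml's `GGML_PAD`.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment for illustration; not necessarily ggml's TENSOR_ALIGNMENT. */
#define SKETCH_ALIGNMENT 64

/* Round x up to the next multiple of n; n must be a power of two. */
#define SKETCH_PAD(x, n) (((x) + (n) - 1) & ~((size_t)(n) - 1))

/* Hypothetical helper: returns SKETCH_ALIGNMENT-aligned memory, or NULL on failure.
 * Memory from posix_memalign() is released with plain free(). */
static void * sketch_aligned_malloc(size_t size) {
    void * ptr = NULL;
    /* posix_memalign() requires a power of two that is a multiple of sizeof(void *). */
    assert((SKETCH_ALIGNMENT & (SKETCH_ALIGNMENT - 1)) == 0);
    assert(SKETCH_ALIGNMENT % sizeof(void *) == 0);
    if (posix_memalign(&ptr, SKETCH_ALIGNMENT, SKETCH_PAD(size, SKETCH_ALIGNMENT)) != 0) {
        return NULL;
    }
    return ptr;
}

/* Hypothetical counterpart: plain free() is valid for posix_memalign() memory. */
static void sketch_aligned_free(void * ptr) {
    free(ptr);
}
```

Rounding the request up to a whole number of alignment units keeps the end of the buffer aligned as well, which loosely mirrors the "page align to `TENSOR_ALIGNMENT`" commits above.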

add Intel AMX ISA detection
add VNNI kernel for GEMV cases
add VNNI and AMX kernel support for block_q8_0
code cleanup
fix packing B issue
enable OpenMP
fine-tune AMX kernel
switch to ATen parallel pattern
add error message for nested parallelism
code cleanup
add f16 support in ggml-amx
add AMX kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS
update CMakeLists
update README
fix some compilation warnings
fix compiler warning when AMX is not enabled
minor change
ggml-ci
move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp
ggml-ci
update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16
ggml-ci
add AMX as a ggml-backend
update header file; the old path for immintrin.h has changed to ggml-cpu-impl.h
minor change
update CMakeLists.txt
minor change
apply weight prepacking in set_tensor method in ggml-backend
fix compile error
ggml-ci
minor change
ggml-ci
update CMakeLists.txt
ggml-ci
add march dependency
minor change
ggml-ci
change ggml_backend_buffer_is_host to return false for AMX backend
ggml-ci
fix supports_op
use device reg for AMX backend
ggml-ci
minor change
ggml-ci
minor change
fix rebase
set .buffer_from_host_ptr to be false for AMX backend

* add pool_2d
* fix im2col and add unittest for N>=1024
* add tests for N % 1024 != 0
* remove trailing whitespace
* apply suggestions
* apply more optimization: original IM2COL kernel + _ext with MIN()
* apply review: change kernel name of pool_2d
* apply review
* fix more formatting and enhance readability

Signed-off-by: Junhee Yoo <[email protected]>
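The `pool_2d` commit targets a GPU kernel; the operation itself is easier to follow as a plain CPU loop. The following is a minimal single-channel max-pooling sketch, not the kernel from this commit. The parameter names are borrowed from the `k0/k1`, `s0/s1`, `p0/p1` arguments of `ggml_pool_2d` (assuming index 0 is width and 1 is height). The output-size formula `(in + 2*p - k) / s + 1` is the usual pooling formula. In this sketch, padded cells are skipped rather than treated as values, which is an assumption, and `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Output extent along one axis: (in + 2*p - k) / s + 1. */
static int sketch_pool_out_size(int in, int k, int s, int p) {
    return (in + 2 * p - k) / s + 1;
}

/* Single-channel 2D max pooling over a row-major iw x ih image.
 * dst must hold out_w * out_h floats. Out-of-bounds (padded) cells are skipped. */
static void sketch_max_pool_2d(const float * src, int iw, int ih, float * dst,
                               int k0, int k1, int s0, int s1, int p0, int p1) {
    const int ow = sketch_pool_out_size(iw, k0, s0, p0);
    const int oh = sketch_pool_out_size(ih, k1, s1, p1);
    for (int oy = 0; oy < oh; ++oy) {
        for (int ox = 0; ox < ow; ++ox) {
            float m = -FLT_MAX;
            for (int ky = 0; ky < k1; ++ky) {
                const int iy = oy * s1 - p1 + ky;
                if (iy < 0 || iy >= ih) {
                    continue;
                }
                for (int kx = 0; kx < k0; ++kx) {
                    const int ix = ox * s0 - p0 + kx;
                    if (ix < 0 || ix >= iw) {
                        continue;
                    }
                    if (src[iy * iw + ix] > m) {
                        m = src[iy * iw + ix];
                    }
                }
            }
            dst[oy * ow + ox] = m;
        }
    }
}
```

With a 4x4 input holding 1..16, a 2x2 window and stride 2, each output cell is the maximum of one non-overlapping 2x2 block: 6, 8, 14, 16.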

giladgd and others added 12 commits October 23, 2024 17:24

fix: allocating CPU buffer with size 0 (llama/9917)

f1fbe59
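A zero-byte request is an edge case because POSIX leaves `posix_memalign` with size 0 implementation-defined: it may hand back a null pointer, which a caller can mistake for an allocation failure. The sketch below shows one common way to sidestep that. It is a hypothetical approach and not the change made in llama/9917. `sketch_alloc_cpu_buffer` and the alignment value are invented for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed alignment for illustration only. */
#define SKETCH_ALIGNMENT 64

/* Hypothetical CPU-buffer allocator. A zero-byte request is bumped to one
 * alignment unit, so the caller always gets a real, freeable pointer and a
 * NULL return can only mean that the allocation failed. */
static void * sketch_alloc_cpu_buffer(size_t size) {
    void * ptr = NULL;
    if (size == 0) {
        size = SKETCH_ALIGNMENT;
    }
    if (posix_memalign(&ptr, SKETCH_ALIGNMENT, size) != 0) {
        return NULL;
    }
    return ptr;
}
```

The trade-off is a few wasted bytes per empty buffer in exchange for a single, unambiguous failure signal.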

vulkan : add backend registry / device interfaces (llama/9721)

1b907ca

* vulkan : add backend registry / device interfaces
* llama : print devices used on model load

Add SYCL Backend registry, device and Event Interfaces (llama/9705)

ba05a9f

* implemented missing SYCL event APIs
* sycl : Added device and backend reg interfaces
* Restructured ggml-sycl.cpp

rpc : backend refactoring (llama/9912)

c3add3c

* rpc : refactor backend

  Use structs for RPC request/response messages

* rpc : refactor server
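Using a struct per RPC request and response gives each message a fixed, named layout that can be copied to and from the wire in one step. A minimal sketch of that idea follows. The struct names, fields and `sketch_encode`/`sketch_decode` helpers are hypothetical and do not reproduce the actual ggml-rpc message definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request/response pair for one RPC command; names are illustrative. */
struct sketch_request {
    uint64_t tensor_id;
    uint32_t tensor_type;
};

struct sketch_response {
    uint64_t result;
};

/* Copy a fixed-layout message struct into a byte buffer.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t sketch_encode(void * out, size_t out_size, const void * msg, size_t msg_size) {
    if (out_size < msg_size) {
        return 0;
    }
    memcpy(out, msg, msg_size);
    return msg_size;
}

/* Copy received bytes back into a message struct.
 * Returns 1 on success, 0 if the byte count does not match the struct size. */
static int sketch_decode(void * msg, size_t msg_size, const void * in, size_t in_size) {
    if (in_size != msg_size) {
        return 0;
    }
    memcpy(msg, in, msg_size);
    return 1;
}
```

Because each struct is copied byte-for-byte, both peers must agree on its exact layout, including any padding the compiler inserts between or after fields.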

fix mul_mat_vec_q and *_vec_q error (llama/9939)

31ff7b9

Co-authored-by: arthw <[email protected]>

rpc : pack only RPC structs (llama/9959)

bf83d52
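Packing removes compiler-inserted padding so a struct's in-memory layout matches its wire format. Scoping the pragma with `push`/`pop` limits that to the RPC structs and leaves every other struct naturally aligned. `#pragma pack` is a compiler extension supported by GCC, Clang and MSVC. The struct names and fields below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Packing is scoped with push/pop so only the wire-format struct is affected. */
#pragma pack(push, 1)
struct sketch_rpc_header {
    uint8_t  command;      /* hypothetical command id */
    uint64_t payload_size; /* number of bytes that follow the header */
};
#pragma pack(pop)

/* Declared after pop: keeps the compiler's natural alignment and padding. */
struct sketch_local_state {
    uint8_t  flag;
    uint64_t counter;
};
```

The packed header is exactly 9 bytes, while the unpacked struct is padded (16 bytes on typical 64-bit ABIs). Members of a packed struct may be misaligned, so it is safer to access them by name than through a pointer to the member.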

ggml : add asserts for type conversion in fattn kernels (llama/9971)

c2e0485

ggml-ci

Adapt to dynamically loadable backends mechanism (llama/9970)

d95fd54

* [CANN] Adapt to dynamically loadable backends mechanism
* Fix the bug: inference output was garbled when running LM models of Q4_0 type in debug mode
* Handle the review comments of this pull request

sync : llama.cpp

21432e3

ggerganov merged commit dcd9b4a into master Oct 23, 2024
4 checks passed

ggerganov deleted the sync branch October 23, 2024 17:28
