sync : llama.cpp #1113

ggerganov · 2025-02-12T19:47:43Z

No description provided.

After the barrier in the last iteration has executed, the loop termination condition is still evaluated. By then the main thread may already have destroyed the cgraph object and its nodes, so another thread can end up accessing memory that is already gone. Trouble can also happen when n_nodes == 0 or when abort is called, but I'm not sure whether the former situation is possible. The last synchronization should be done after the loop to ensure the cgraph/cplan won't be accessed after the main thread exits from the function.
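The race described above can be illustrated with a minimal sketch. Everything here is a stand-in: `Barrier`, `Graph`, `worker`, and `run_graph` are hypothetical names, not ggml's actual types or functions. The sketch only shows the general pattern of placing one final barrier after the loop, so that no worker evaluates the loop condition once the owner has freed the graph.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier (hypothetical helper, not ggml's implementation).
class Barrier {
public:
    explicit Barrier(int n) : n_(n) {}
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        const int gen = generation_;
        if (++count_ == n_) {
            // Last thread to arrive releases everyone and resets the barrier.
            count_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int n_;
    int count_ = 0;
    int generation_ = 0;
};

// Stand-in for the cgraph: just a list of node values.
struct Graph {
    std::vector<int> nodes;
};

// Each thread visits every node and synchronizes after each one.
void worker(const Graph * g, Barrier * b, std::vector<int> * out, int ith) {
    for (std::size_t i = 0; i < g->nodes.size(); ++i) { // the condition reads *g
        (*out)[ith] += g->nodes[i];
        b->wait();
    }
    // Final barrier AFTER the loop: past this point no thread reads *g,
    // so the owner may free the graph as soon as its own wait() returns.
    // This also covers the empty-graph case, where the loop body never runs.
    b->wait();
}

int run_graph(int n_threads) {
    Graph * g = new Graph{{1, 2, 3, 4}};
    Barrier b(n_threads);
    std::vector<int> out(n_threads, 0);
    std::vector<std::thread> pool;
    for (int t = 1; t < n_threads; ++t) {
        pool.emplace_back(worker, g, &b, &out, t);
    }
    worker(g, &b, &out, 0); // the main thread participates as thread 0
    delete g;               // safe: every thread has passed the final barrier
    for (auto & th : pool) th.join();
    int sum = 0;
    for (int v : out) sum += v;
    return sum;
}
```

Without the final `b->wait()`, a worker released from the last in-loop barrier would still read `g->nodes.size()` once more, possibly after `delete g` has run on the main thread.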

jhen0409 and others added 28 commits February 12, 2025 21:46

metal : use residency set for other platforms (llama/11648)

0752eaf

HIP: force max threads per block to be 1024 (llama/11621)

4f37b52

Some old or vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM. Signed-off-by: fxzjshm <[email protected]>

CUDA: non-contiguous (RMS) norm support (llama/11659)

387edc9

* CUDA: non-contiguous (RMS) norm support --------- Co-authored-by: Georgi Gerganov <[email protected]>

CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)

a50d222

metal : adjust support conditions for norm operators (llama/11671)

e841c6f

cont #11659 ggml-ci

metal : avoid breaking build when metal API predates TARGET_OS_VISION…

85d428d

… (llama/11690) Avoids breakage in nix flake build introduced by b0569130c5e9c671152c913d82803b7c2f014ff9

vulkan: use smaller combined allocations to avoid fragmentation (llam…

6cd96d9

…a/11551)

vulkan: initial support for IQ4_XS quantization (llama/11501)

8444353

vulkan: optimize coopmat2 iq2/iq3 callbacks (llama/11521)

b23fc86

* vulkan: optimize coopmat2 iq2/iq3 callbacks * build: trigger CI on GLSL compute shader changes

ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)

12a4046

SYCL: Adjust support condition for norm operators (llama/11674)

bd19b23

SYCL does not support non-contiguous tensors for norm operations.

ggml : optimize and build warning fix for LoongArch (llama/11709)

2bd5ccb

* ggml : optimize convert f32<->f16 for loongarch_asx * ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16 * ggml : Fix warnings when run cpu CI locally on LoongArch

SYCL: remove XMX info from print devices (llama/11712)

98b3824

vulkan: print shared memory size (llama/11719)

e9961a7

CUDA: fix min. version for movmatrix (llama/11751)

fdf1349

vulkan: account for lookup tables when checking shared memory size (l…

1b3c1d2

…lama/11502)

vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid …

e9afd20

…VRAM allocation (llama/11592)
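As a rough illustration of how an opt-in environment variable like this can be read, here is a minimal sketch. The helper name `prefer_host_memory` is hypothetical, and treating "set to any value" as enabled is an assumption; the commit title only names the variable and its purpose, not how the backend parses it.

```cpp
#include <cstdlib>

// Hypothetical helper: reports whether GGML_VK_PREFER_HOST_MEMORY is set.
// Assumption: any value, even an empty one, counts as "enabled".
bool prefer_host_memory() {
    return std::getenv("GGML_VK_PREFER_HOST_MEMORY") != nullptr;
}
```

Under that assumption, launching a program with `GGML_VK_PREFER_HOST_MEMORY=1` in its environment would make the helper return true.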

vulkan: Make Vulkan optional at runtime (#11493). (llama/11494)

453aaee

Co-authored-by: Jeff Bolz <[email protected]>

fix: typos in documentation files (llama/11791)

04b22f0

* Update ggml.c * Update arg.cpp * Update speculative.h

CUDA: use arch list for compatibility check (llama/11775)

7f8cf75

* CUDA: use arch list for feature availability check --------- Co-authored-by: Diego Devesa <[email protected]>

Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx…

4b00e83

… (llama/11803) * Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx * Fix #11802: PR #11803 - keep RegQueryValueExA, remove TEXT macro, description needs to be ANSI string

CUDA: fix CUDART_VERSION checks (llama/11821)

501b77b

ggml-cpu: Fix duplicate MATMUL_INT8 (llama/11817)

0b897da

Signed-off-by: Weizhao Ouyang <[email protected]>

ggml : fix multi-threaded clamp_f32 (llama/11824)

b669f7a

* Bug fix for clamp_f32: with tensors larger than 1D, the clamp operation did not work because the function returned early whenever ith was not 0. * Bug fix for clamp_f32 * Bug fix for clamp_f32
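A minimal sketch of the row-splitting pattern this fix relies on, assuming rows are interleaved across threads by index. The names `clamp_rows` and `clamp_parallel` are hypothetical and this is not the ggml code itself; `ith` and `nth` stand for the thread index and thread count, as in the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Clamp the rows owned by thread `ith` out of `nth` threads.
// Rows are interleaved: thread ith handles rows ith, ith + nth, ith + 2*nth, ...
void clamp_rows(std::vector<float> & data, int n_rows, int n_cols,
                float lo, float hi, int ith, int nth) {
    // No early return for ith != 0: each thread must process its own rows,
    // otherwise only the rows owned by thread 0 would be clamped.
    for (int r = ith; r < n_rows; r += nth) {
        float * row = data.data() + static_cast<std::size_t>(r) * n_cols;
        for (int c = 0; c < n_cols; ++c) {
            row[c] = std::max(lo, std::min(hi, row[c]));
        }
    }
}

// Launch `nth` threads, each clamping its share of the rows in place.
void clamp_parallel(std::vector<float> & data, int n_rows, int n_cols,
                    float lo, float hi, int nth) {
    std::vector<std::thread> pool;
    for (int ith = 0; ith < nth; ++ith) {
        pool.emplace_back(clamp_rows, std::ref(data), n_rows, n_cols, lo, hi, ith, nth);
    }
    for (auto & t : pool) t.join();
}
```

If `clamp_rows` returned early for `ith != 0` while keeping the strided loop, thread 0 would still skip every row not at a multiple of `nth`, which matches the failure described for tensors with more than one row.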

cleanup: fix compile warnings associated with gnu_printf (llama/11811)

d893024

HIP: Switch to std::vector in rocblas version check (llama/11820)

cb120dc

sync : llama.cpp

93ceeb8

ggml-ci

ggerganov force-pushed the sync-llama.cpp-25-02-12 branch from de7ea5c to 93ceeb8 Compare February 12, 2025 19:51

ggerganov merged commit 9a4acb3 into master Feb 12, 2025
8 checks passed

ggerganov deleted the sync-llama.cpp-25-02-12 branch February 12, 2025 20:00
