sync : llama.cpp #1113

Merged
merged 28 commits into from
Feb 12, 2025

Conversation

ggerganov
Member

No description provided.

jhen0409 and others added 28 commits February 12, 2025 21:46
Some old or vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM.

Signed-off-by: fxzjshm <[email protected]>
* CUDA: non-contiguous (RMS) norm support

---------

Co-authored-by: Georgi Gerganov <[email protected]>
… (llama/11690)

Avoids breakage in nix flake build introduced by b0569130c5e9c671152c913d82803b7c2f014ff9
* vulkan: optimize coopmat2 iq2/iq3 callbacks

* build: trigger CI on GLSL compute shader changes
SYCL does not support non-contiguous tensors for norm operations
* ggml : optimize convert f32<->f16 for loongarch_asx

* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16

* ggml : Fix warnings when running CPU CI locally on LoongArch
After the barrier in the last iteration has executed, the loop termination
condition is still evaluated. By then the main thread may already have destroyed
the cgraph object and its nodes, so another thread ends up accessing memory that is already gone.
Trouble can also occur when n_nodes == 0 or when abort is called, though I'm not sure whether the
former situation is possible.

The last synchronization should be done after the loop to ensure the cgraph/cplan won't be
accessed after the main thread exits from the function.
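
The race described above can be illustrated with a minimal POSIX threads sketch. This is not ggml's actual code: `graph_t`, `worker_t`, `worker_main`, and `run_graph` are hypothetical names, and `pthread_barrier_t` stands in for the thread pool's own barrier. The point is only the placement of the last barrier after the loop, so that no worker evaluates the loop condition (which reads the graph) once the main thread is free to release it. Return codes of the pthread calls are ignored for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define N_THREADS 4

/* Hypothetical stand-in for the compute graph; not ggml's real struct. */
typedef struct {
    int  n_nodes;
    int *nodes;
} graph_t;

typedef struct {
    graph_t           *graph;
    pthread_barrier_t *barrier;
    long               sum;   /* per-thread result, read by main after join */
} worker_t;

static void *worker_main(void *arg) {
    worker_t *w = arg;
    /* The loop condition reads graph->n_nodes on every pass, including the
       final check that ends the loop. */
    for (int i = 0; i < w->graph->n_nodes; i++) {
        w->sum += w->graph->nodes[i];
        if (i + 1 < w->graph->n_nodes) {
            pthread_barrier_wait(w->barrier);   /* sync between nodes */
        }
    }
    /* Last synchronization, placed after the loop: once a thread gets past
       this point, every thread has left the loop and none reads the graph. */
    pthread_barrier_wait(w->barrier);
    return NULL;
}

long run_graph(int n_nodes) {
    assert(n_nodes >= 0);
    graph_t *g = malloc(sizeof *g);
    if (g == NULL) return -1;
    g->n_nodes = n_nodes;
    g->nodes = malloc((size_t) n_nodes * sizeof *g->nodes);
    if (n_nodes > 0 && g->nodes == NULL) { free(g); return -1; }
    for (int i = 0; i < n_nodes; i++) g->nodes[i] = i + 1;

    pthread_barrier_t barrier;
    pthread_barrier_init(&barrier, NULL, N_THREADS);

    worker_t  workers[N_THREADS];
    pthread_t threads[N_THREADS];
    for (int t = 0; t < N_THREADS; t++) {
        workers[t] = (worker_t){ .graph = g, .barrier = &barrier, .sum = 0 };
    }
    for (int t = 1; t < N_THREADS; t++) {
        pthread_create(&threads[t], NULL, worker_main, &workers[t]);
    }
    worker_main(&workers[0]);   /* the main thread acts as worker 0 */

    /* Main is past the final barrier, so the graph can be released even
       though the other threads have not been joined yet. */
    free(g->nodes);
    free(g);

    long total = 0;
    for (int t = 1; t < N_THREADS; t++) pthread_join(threads[t], NULL);
    for (int t = 0; t < N_THREADS; t++) total += workers[t].sum;
    pthread_barrier_destroy(&barrier);
    return total;
}
```

Because the final barrier runs unconditionally, the `n_nodes == 0` case is covered as well: each thread skips the loop and still synchronizes once before the main thread frees the graph.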
* Update ggml.c

* Update arg.cpp

* Update speculative.h
* CUDA: use arch list for feature availability check

---------

Co-authored-by: Diego Devesa <[email protected]>
… (llama/11803)

* Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx

* Fix #11802: PR #11803 - keep RegQueryValueExA, remove TEXT macro, description needs to be ANSI string
* Bug fix for clamp_f32

For tensors with more than one dimension, the clamp operation does not work correctly, because the function returns early whenever ith is not 0.

* Bug fix for clamp_f32

* Bug fix for clamp_f32
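
A minimal sketch of why an early return for `ith != 0` can leave part of a multi-row tensor unclamped, assuming rows are interleaved across threads (thread `ith` of `nth` handles rows `ith`, `ith + nth`, and so on). The function `clamp_rows_f32` and its signature are hypothetical and written only for illustration; this is not the ggml implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clamp one value into [lo, hi]. */
static float clamp_value(float x, float lo, float hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Clamp the rows owned by thread `ith` out of `nth` threads, in place.
   `data` is a contiguous nrows x ncols matrix in row-major order. */
void clamp_rows_f32(float *data, int nrows, int ncols,
                    float lo, float hi, int ith, int nth) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    for (int r = ith; r < nrows; r += nth) {
        float *row = data + (size_t) r * ncols;
        for (int c = 0; c < ncols; c++) {
            row[c] = clamp_value(row[c], lo, hi);
        }
    }
}
```

With `nth = 2` and four rows, a call with `ith = 0` touches only rows 0 and 2. If the thread with `ith = 1` returned without doing its share, rows 1 and 3 would keep their original values, which matches the symptom described for tensors with more than one row.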
@ggerganov force-pushed the sync-llama.cpp-25-02-12 branch from de7ea5c to 93ceeb8 on February 12, 2025 19:51
@ggerganov merged commit 9a4acb3 into master on Feb 12, 2025
8 checks passed
@ggerganov deleted the sync-llama.cpp-25-02-12 branch on February 12, 2025 20:00