
ggml-cpu: enable IBM NNPA Vector Intrinsics #14317


Open
wants to merge 91 commits into master

Conversation

@taronaeo (Contributor) commented Jun 21, 2025

This pull request enables the IBM NNPA instruction set on the s390x platform for IBM z16 and later mainframes. The change mainly targets FP16 -> FP32 and FP32 -> FP16 data conversions.

Note: This PR supersedes #14303 because that implementation was wrong.
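For context, the kind of conversion NNPA accelerates looks roughly like the sketch below. This is my illustration rather than the actual patch: the intrinsics (vec_convert_from_fp16, vec_extend_to_fp32_hi/lo) follow the GCC/Clang vecintrin.h NNPA (arch14) builtins, and the helper name fp16_to_fp32_x8 is hypothetical.

// Sketch: widen 8 IEEE FP16 values to FP32 on s390x with NNPA (z16/arch14).
// Assumes GCC or Clang with -march=z16 -mzvector and <vecintrin.h>.
#include <vecintrin.h>
#include <stdint.h>

static inline void fp16_to_fp32_x8(const uint16_t *src, float *dst) {
    vector unsigned short v16 = vec_xl(0, src);                // load 8 IEEE fp16 values
    vector unsigned short nnp = vec_convert_from_fp16(v16, 0); // IEEE fp16 -> NNP fp16 (VCNF)
    vector float hi = vec_extend_to_fp32_hi(nnp, 0);           // elements 0..3 -> fp32 (VCLFNH)
    vector float lo = vec_extend_to_fp32_lo(nnp, 0);           // elements 4..7 -> fp32 (VCLFNL)
    vec_xst(hi, 0, dst);                                       // store the fp32 results
    vec_xst(lo, 0, dst + 4);
}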

Verification

To ensure that this implementation did not break anything, the NNPA instruction set has been tested on the following models:

  • Tested IBM Granite 3.3 (F32, F16, Q4_0, Q4_1, Q3_K, Q4_K, Q5_K)
  • Kindly request additional models for testing in this PR

Performance Results

I used IBM Granite 3.3 for the performance tests. We see an improvement of roughly 0.7% for F16 prompt processing and about 34% for F16 token generation (1.46 -> 1.96 t/s), which is the expected outcome.

Before NNPA Instruction Set

| model              | size     | params | backend | threads | test  | t/s          |
| ------------------ | -------- | ------ | ------- | ------- | ----- | ------------ |
| granite 3B all F32 | 9.44 GiB | 2.53 B | BLAS    | 4       | pp512 | 31.56 ± 0.23 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | BLAS    | 4       | tg128 | 1.75 ± 0.01  |
| granite 3B F16     | 4.72 GiB | 2.53 B | BLAS    | 4       | pp512 | 30.94 ± 0.20 |
| granite 3B F16     | 4.72 GiB | 2.53 B | BLAS    | 4       | tg128 | 1.46 ± 0.01  |

After NNPA Instruction Set

| model              | size     | params | backend | threads | test  | t/s          |
| ------------------ | -------- | ------ | ------- | ------- | ----- | ------------ |
| granite 3B all F32 | 9.44 GiB | 2.53 B | BLAS    | 4       | pp512 | 31.85 ± 0.10 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | BLAS    | 4       | tg128 | 1.70 ± 0.01  |
| granite 3B F16     | 4.72 GiB | 2.53 B | BLAS    | 4       | pp512 | 31.16 ± 0.04 |
| granite 3B F16     | 4.72 GiB | 2.53 B | BLAS    | 4       | tg128 | 1.96 ± 0.02  |

Note

Tests were conducted on an IBM z16 Mainframe with 2 IFLs (4 vCores) and 64 GB Memory on z/VM (Type-2)
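The pp512/tg128 columns are llama-bench's default tests, so the numbers above were presumably collected with an invocation along these lines (the model filename here is a placeholder):

$ ./build/bin/llama-bench -m granite-3.3-f16.gguf -t 4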

SIMD activations for ggml_compute_fp16_to_fp32 and ggml_compute_fp32_to_fp16 are ready. However, I was unable to find a way to make the s390x platform-detection macros usable in ggml-impl.h, so the correct implementation is left in place for now until we can fix the detection.
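For illustration, the guard I would have liked to use in ggml-impl.h is sketched below. Treat it as hypothetical: GGML_NNPA as a build-system-defined macro is my assumption, since the compiler does not define an NNPA feature macro on its own.

// Hypothetical platform guard (sketch), following the existing
// per-architecture pattern in ggml-impl.h (e.g. __F16C__, __ARM_NEON):
#if defined(__s390x__) && defined(GGML_NNPA)
    #define GGML_COMPUTE_FP16_TO_FP32(x) ggml_compute_fp16_to_fp32(x)
    #define GGML_COMPUTE_FP32_TO_FP16(x) ggml_compute_fp32_to_fp16(x)
#endif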

Edit 1: This PR now also contains the ggml-base and ggml-cpu refactor for FP16<->FP32 SIMD, as requested in #14317 (comment).

Please review this pull request and consider merging it into the main repository. Thank you!

taronaeo added 30 commits June 21, 2025 14:46
Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 4a9f60c)
Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 8d4a798)
Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 0ff0d65)
Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 2f58bbc)
Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 01b9294)
for some reason, the function is not getting a hit when debugged with
    gdb. we will need to investigate further

Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
there are some conversion failures in nnpa that requires the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.

Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
@taronaeo requested a review from slaren June 24, 2025 16:07
taronaeo added 2 commits June 25, 2025 01:07
fallback logic was already implemented but i was too sleepy to realise

Signed-off-by: Aaron Teo <[email protected]>
@taronaeo (Contributor Author)

Refactored ggml-cpu from GGML_FP16_TO_FP32 to GGML_CPU_FP16_TO_FP32 and GGML_FP32_TO_FP16 to GGML_CPU_FP32_TO_FP16 in the latest push :)
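For readers following along, the shape of that rename is sketched below; the exact placement and fallback wiring are my assumptions, not a quote of the patch.

// Sketch: per-arch SIMD paths define GGML_CPU_FP16_TO_FP32 first;
// anything left undefined falls back to the shared scalar macros.
#if !defined(GGML_CPU_FP16_TO_FP32)
    #define GGML_CPU_FP16_TO_FP32(x) GGML_FP16_TO_FP32(x)
    #define GGML_CPU_FP32_TO_FP16(x) GGML_FP32_TO_FP16(x)
#endif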

@slaren PTAL again

@slaren (Member) commented Jun 24, 2025

I believe these includes can be removed now, or moved to the CPU backend if necessary:

#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif // __ARM_FEATURE_SVE
#if defined(__ARM_NEON) && !defined(__CUDACC__) && !defined(__MUSACC__)
// if YCM cannot find <arm_neon.h>, make a symbolic link to it, for example:
//
// $ ln -sfn /Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/include/arm_neon.h ./src/
//
#include <arm_neon.h>
#endif
#if defined(__F16C__)
#include <immintrin.h>
#endif

@slaren requested a review from Copilot June 24, 2025 19:43

Comment on lines +144 to +148
inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
    uint16_t s;
    memcpy(&s, &f, sizeof(uint16_t));
    return ggml_table_f32_f16[s];
}
Member

The lookup table ggml_table_f32_f16 is still in ggml-base; it should be moved to ggml-cpu as well, since it is now only used in the CPU backend. The initialization can be done in ggml_cpu_init.

@taronaeo (Contributor Author) Jun 25, 2025

The initialization can be done in ggml_cpu_init.

So move this line

ggml_table_f32_f16[i] = GGML_COMPUTE_FP16_TO_FP32(u.fp16);

into this code block?

for (int i = 0; i < (1 << 16); ++i) {
    union {
        uint16_t u16;
        ggml_fp16_t fp16;
    } u = {i};
    float f = GGML_FP16_TO_FP32(u.fp16);
    ggml_table_gelu_f16[i] = GGML_FP32_TO_FP16(ggml_gelu_f32(f));
    ggml_table_gelu_quick_f16[i] = GGML_FP32_TO_FP16(ggml_gelu_quick_f32(f));
}

I feel like I'm stepping into dangerous territory 😅

@taronaeo (Contributor Author) Jun 25, 2025

Fixed in the latest push, but I'm unsure whether any problems will arise from my code change.

Edit: As expected, the change had problems with the Windows, Vulkan, and Server builds. I will revert the patches until I can get a better direction on this 😔

The patch in question:
diff --git a/ggml/src/ggml-cpu/ggml-cpu.c b/ggml/src/ggml-cpu/ggml-cpu.c
index 70f32801..ce296898 100644
--- a/ggml/src/ggml-cpu/ggml-cpu.c
+++ b/ggml/src/ggml-cpu/ggml-cpu.c
@@ -3479,6 +3479,7 @@ void ggml_cpu_init(void) {
                     ggml_fp16_t fp16;
                 } u = {i};
                 float f = GGML_CPU_FP16_TO_FP32(u.fp16);
+                ggml_table_f32_f16[i] = GGML_COMPUTE_FP16_TO_FP32(u.fp16);
                 ggml_table_gelu_f16[i] = GGML_CPU_FP32_TO_FP16(ggml_gelu_f32(f));
                 ggml_table_gelu_quick_f16[i] = GGML_CPU_FP32_TO_FP16(ggml_gelu_quick_f32(f));
             }
diff --git a/ggml/src/ggml-cpu/simd-mappings.h b/ggml/src/ggml-cpu/simd-mappings.h
index 655ab3c6..2f65ccd1 100644
--- a/ggml/src/ggml-cpu/simd-mappings.h
+++ b/ggml/src/ggml-cpu/simd-mappings.h
@@ -137,6 +137,10 @@
     }
 #endif
 
+// precomputed f32 table for f16 (256 KB)
+// defined in ggml.c, initialized in ggml_init()
+GGML_API float ggml_table_f32_f16[1 << 16];
+
 // On ARM NEON, it's quicker to directly convert x -> x instead of calling into ggml_lookup_fp16_to_fp32,
 // so we define GGML_CPU_FP16_TO_FP32 and GGML_CPU_FP32_TO_FP16 elsewhere for NEON.
 // This is also true for POWER9.
diff --git a/ggml/src/ggml-impl.h b/ggml/src/ggml-impl.h
index 8d9bdc74..57761644 100644
--- a/ggml/src/ggml-impl.h
+++ b/ggml/src/ggml-impl.h
@@ -393,10 +393,6 @@ static inline ggml_fp16_t ggml_compute_fp32_to_fp16(float f) {
 #define GGML_FP16_TO_FP32(x) GGML_COMPUTE_FP16_TO_FP32(x)
 #define GGML_FP32_TO_FP16(x) GGML_COMPUTE_FP32_TO_FP16(x)
 
-// precomputed f32 table for f16 (256 KB)
-// defined in ggml.c, initialized in ggml_init()
-GGML_API float ggml_table_f32_f16[1 << 16];
-
 /**
  * Converts brain16 to float32.
  *
diff --git a/ggml/src/ggml.c b/ggml/src/ggml.c
index f8e7c595..e0e46288 100644
--- a/ggml/src/ggml.c
+++ b/ggml/src/ggml.c
@@ -1414,27 +1414,6 @@ static inline bool ggml_can_repeat_rows(const struct ggml_tensor * t0, const str
 ////////////////////////////////////////////////////////////////////////////////
 
 struct ggml_context * ggml_init(struct ggml_init_params params) {
-    static bool is_first_call = true;
-
-    ggml_critical_section_start();
-
-    if (is_first_call) {
-        // initialize time system (required on Windows)
-        ggml_time_init();
-
-        for (int i = 0; i < (1 << 16); ++i) {
-            union {
-                uint16_t u16;
-                ggml_fp16_t fp16;
-            } u = {i};
-            ggml_table_f32_f16[i] = GGML_COMPUTE_FP16_TO_FP32(u.fp16);
-        }
-
-        is_first_call = false;
-    }
-
-    ggml_critical_section_end();
-
     struct ggml_context * ctx = GGML_MALLOC(sizeof(struct ggml_context));
 
     // allow to call ggml_init with 0 size
-- 
2.39.5 (Apple Git-154)

Member

You would have to move the definition of ggml_table_f32_f16 from ggml.c to somewhere in the CPU backend as well.
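Concretely, that means moving the definition itself, not just the extern declaration; a sketch, with ggml-cpu.c as the assumed destination:

// ggml/src/ggml.c: remove the definition there ...
// ggml/src/ggml-cpu/ggml-cpu.c: ... and define the table in the CPU backend
float ggml_table_f32_f16[1 << 16];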

@taronaeo (Contributor Author) commented Jun 25, 2025

I believe these includes can be removed now, or moved to the CPU backend if necessary:

#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif // __ARM_FEATURE_SVE
#if defined(__ARM_NEON) && !defined(__CUDACC__) && !defined(__MUSACC__)
// if YCM cannot find <arm_neon.h>, make a symbolic link to it, for example:
//
// $ ln -sfn /Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/include/arm_neon.h ./src/
//
#include <arm_neon.h>
#endif
#if defined(__F16C__)
#include <immintrin.h>
#endif

It is already moved to the CPU backend, but I left the headers there because more SIMD code within ggml-base still depends on those headers. E.g.,

#elif defined(__ARM_NEON)
    // FP16 path: check 8 values at a time. A half-precision value is
    // Inf/NaN iff its exponent bits (0x7c00) are all set; on a hit, the
    // scalar validate_fp16 fallback reports the offending index.
    for (; i + 7 < nb; i += 8) {
        uint16x8_t v = vld1q_u16(f + i);
        uint16x8_t vexp = vandq_u16(v, vdupq_n_u16(0x7c00));
        uint16x8_t cmp = vceqq_u16(vexp, vdupq_n_u16(0x7c00));
        uint64_t mask = vget_lane_u64(vreinterpret_u64_u8(vshrn_n_u16(cmp, 4)), 0);
        if (mask) {
            for (size_t j = 0; j < 8; ++j) {
                if (!validate_fp16(f[i + j], i + j)) {
                    return false;
                }
            }
            GGML_UNREACHABLE();
        }
    }

#elif defined(__ARM_NEON)
    // FP32 path: same idea, 4 values at a time; a single-precision value
    // is Inf/NaN iff its exponent bits (0x7f800000) are all set.
    for (; i + 3 < nb; i += 4) {
        uint32x4_t v = vld1q_u32((const uint32_t *)f + i);
        uint32x4_t vexp = vandq_u32(v, vdupq_n_u32(0x7f800000));
        uint32x4_t cmp = vceqq_u32(vexp, vdupq_n_u32(0x7f800000));
        uint64_t mask = vget_lane_u64(vreinterpret_u64_u16(vshrn_n_u32(cmp, 8)), 0);
        if (mask) {
            for (size_t j = 0; j < 4; ++j) {
                if (!validate_float(f[i + j], i + j)) {
                    return false;
                }
            }
            GGML_UNREACHABLE();
        }
    }

Was wondering if you have a proper place in ggml-cpu that I could move these into?

@taronaeo (Contributor Author)

Okay, I've been trying to move the ggml_validate_row_data SIMD operations from ggml-base/ggml-quants.c into the respective ggml-cpu/arch/quants.c, but I'm not confident doing this refactor. Can we move this specific code change to another PR, or have someone else own the change instead?

@slaren (Member) commented Jun 25, 2025

Was wondering if you have a proper place in ggml-cpu that I could move these into?

It's not great, but this code is not important; you can ignore it.

@slaren (Member) commented Jun 25, 2025

This code can be removed now:

// needed to initialize f16 tables
{
    struct ggml_init_params params = { 0, NULL, false };
    struct ggml_context * ctx = ggml_init(params);
    ggml_free(ctx);
}

@taronaeo (Contributor Author) commented Jun 25, 2025

I am seeing consistent failures in the Windows and Server CIs.

Windows CI:

The following tests FAILED:
	 19 - test-thread-safety (NUMERICAL)                    main
	 21 - test-backend-ops (NUMERICAL)                      main

Server CI:

FAILED unit/test_rerank.py::test_rerank - assert 3 == 2

Or it simply fails to start on Windows.

I don't know the ggml codebase well enough to understand why the shift from ggml-base to ggml-cpu causes these problems, but feel free to edit my branch, as my work on the NNPA vector intrinsics is already complete. Otherwise, I'd be happy to keep following directions, but I'll be going in blind 😆

Edit: Alternatively, if you're okay with it, we revert all ggml_table_f32_f16 changes back to ggml-base and handle them in a separate PR, because (correct me if I'm wrong) that specific refactor affects quite a lot of things.

taronaeo added 4 commits June 26, 2025 00:09
we rely on the variable declaration in ggml-cpu.c instead

Signed-off-by: Aaron Teo <[email protected]>
This reverts commit 2dce119.

Signed-off-by: Aaron Teo <[email protected]>
@taronaeo (Contributor Author) commented Jun 25, 2025

Please let me know how we should proceed with this PR; I hope to close it soon :)

We can either:

  1. Continue refactoring ggml-base to ggml-cpu via this PR (I'll be hands-off for this option; full write access to you); or
  2. Continue refactoring ggml-base to ggml-cpu under your direction (I'll try, but I'll be following blindly due to my limited knowledge of the codebase); or
  3. Revert all ggml_table_f32_f16 changes back to ggml-base, where all CIs are green, and open another PR specifically for that refactor, which someone else will own. (I prefer this option, imo.)

I'll be logging off for the night now; feel free to share your thoughts and let's work this out.

Labels: documentation (Improvements or additions to documentation), ggml (changes relating to the ggml tensor library for machine learning)