Potentially buffer overflow in `make_block_q4_0x4` #1094

ilhamsyahids · 2025-02-02T18:11:23Z

I'm building ggml on debian and got warning and note:

~/ggml/build$ cmake --build . --config Release -j 8
[  1%] Building C object src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o
[  2%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[  3%] Building CXX object src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o
[  5%] Building C object src/CMakeFiles/ggml-base.dir/ggml.c.o
[  6%] Building CXX object src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o
[  6%] Building CXX object src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o
[  7%] Building C object src/CMakeFiles/ggml-base.dir/ggml-quants.c.o
[  8%] Building CXX object src/CMakeFiles/ggml-base.dir/gguf.cpp.o
[  9%] Linking CXX shared library libggml-base.so
[  9%] Built target ggml-base
[ 10%] Building CXX object src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o
[ 11%] Building C object src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
[ 13%] Building CXX object src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o
[ 15%] Building CXX object src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o
[ 15%] Building C object src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o
[ 15%] Building CXX object src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o
[ 16%] Building CXX object src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o
[ 17%] Building CXX object src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at /home/admin/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:3685:39:
/home/admin/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:3614:19: warning: writing 32 bytes into a region of size 0 [-Wstringop-overflow=]
 3614 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/admin/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
/home/admin/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:3685:20: note: at offset 72 into destination object ‘<anonymous>’ of size 72
 3685 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at /home/admin/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:3685:39:
/home/admin/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:3614:19: warning: writing 32 bytes into a region of size 0 [-Wstringop-overflow=]
 3614 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/admin/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
/home/admin/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:3685:20: note: at offset 104 into destination object ‘<anonymous>’ of size 72
 3685 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 18%] Linking CXX static library libcommon.a
[ 18%] Built target common
[ 19%] Linking CXX shared library libggml-cpu.so
[ 19%] Built target ggml-cpu
[ 20%] Building CXX object src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o
[ 21%] Linking CXX shared library libggml.so
[ 21%] Built target ggml
...

it referencing this method:

ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Lines 3594 to 3634 in 475e012

    
           static block_q4_0x4 make_block_q4_0x4(block_q4_0 * in, unsigned int blck_size_interleave) { 
        
               block_q4_0x4 out; 
        
               for (int i = 0; i < 4; i++) { 
        
                   out.d[i] = in[i].d; 
        
               } 
        
               const int end = QK4_0 * 2 / blck_size_interleave; 
        
               if (blck_size_interleave == 8) { 
        
                   const uint64_t xor_mask = 0x8888888888888888ULL; 
        
                   for (int i = 0; i < end; ++i) { 
        
                       int src_id = i % 4; 
        
                       int src_offset = (i / 4) * blck_size_interleave; 
        
                       int dst_offset = i * blck_size_interleave; 
        
                       uint64_t elems; 
        
                       // Using memcpy to avoid unaligned memory accesses 
        
                       memcpy(&elems, &in[src_id].qs[src_offset], sizeof(uint64_t)); 
        
                       elems ^= xor_mask; 
        
                       memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t)); 
        
                   } 
        
               } else if (blck_size_interleave == 4) { 
        
                   const uint32_t xor_mask = 0x88888888; 
        
                   for (int i = 0; i < end; ++i) { 
        
                       int src_id = i % 4; 
        
                       int src_offset = (i / 4) * blck_size_interleave; 
        
                       int dst_offset = i * blck_size_interleave; 
        
                       uint32_t elems; 
        
                       memcpy(&elems, &in[src_id].qs[src_offset], sizeof(uint32_t)); 
        
                       elems ^= xor_mask; 
        
                       memcpy(&out.qs[dst_offset], &elems, sizeof(uint32_t)); 
        
                   } 
        
               } else { 
        
                   GGML_ASSERT(false); 
        
               } 
        
               return out; 
        
           }

this might be issue in building app that requires ggml in debian such as ollama

slaren · 2025-02-02T19:00:19Z

What compiler or flags are you using? I do not see it with gcc 13.3.

ilhamsyahids · 2025-02-02T19:39:08Z

I am using the default one when installing the compiler:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)

i tried to reinstall but it looks like the max version gcc on debian is 12.2.0:

$ sudo apt install gcc
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
gcc is already the newest version (4:12.2.0-3).

slaren · 2025-02-03T23:12:47Z

The oldest gcc version that I can easily install in my system is 12.3, and it does not generate this warning. I suspect that this is a spurious warning, possibly fixed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106904.
This is performance sensitive code, so if removing this warning is absolutely necessary for you, it would be better to do so by disabling the warning in the compiler than by adding unnecessary and potentially expensive checks.

ilhamsyahids · 2025-02-04T03:48:37Z

what OS are you using with? @slaren
on debian the latest stable is 12.2.0. the latest version is 12.4 but its on testing, i think thats why i cant install it:

$ sudo apt list gcc
Listing... Done
gcc/stable,now 4:12.2.0-3 amd64 [installed]

$ sudo apt install gcc=12.4.0
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package gcc is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Version '12.4.0' for 'gcc' was not found

ilhamsyahids · 2025-02-04T04:01:33Z

As context, I try to install ollama that using ggml on debian, its stop on this step:

~/ollama$ make -j12
GOARCH=amd64 go build -buildmode=pie "-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=0.5.7-0-ga420a45\"  " -trimpath -tags "avx" -o llama/build/linux-amd64/runners/cpu_avx/ollama_llama_server ./cmd/runner
GOARCH=amd64 go build -buildmode=pie "-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=0.5.7-0-ga420a45\"  " -trimpath -tags "avx,avx2" -o llama/build/linux-amd64/runners/cpu_avx2/ollama_llama_server ./cmd/runner
# github.com/ollama/ollama/llama
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 32 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 72 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 32 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 104 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# github.com/ollama/ollama/llama
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 72 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 88 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 104 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 120 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GOARCH=amd64 go build -buildmode=pie "-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=0.5.7-0-ga420a45\"  " -trimpath  -o ollama .
# github.com/ollama/ollama/llama
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 72 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 88 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 104 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘block_q4_0x4 make_block_q4_0x4(block_q4_0*, unsigned int)’,
    inlined from ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’ at ggml-cpu-aarch64.cpp:3711:39:
ggml-cpu-aarch64.cpp:3640:19: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
 3640 |             memcpy(&out.qs[dst_offset], &elems, sizeof(uint64_t));
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-cpu-aarch64.cpp: In function ‘int repack_q4_0_to_q4_0_4_bl(ggml_tensor*, int, const void*, size_t)’:
ggml-cpu-aarch64.cpp:3711:20: note: at offset 120 into destination object ‘<anonymous>’ of size 72
 3711 |             *dst++ = make_block_q4_0x4(dst_tmp, interleave_block);
      |             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ilhamsyahids linked a pull request Feb 2, 2025 that will close this issue

ggml-cpu: Add bounds checking in make_block_q4_0x4 function #1095

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potentially buffer overflow in `make_block_q4_0x4` #1094

Potentially buffer overflow in `make_block_q4_0x4` #1094

ilhamsyahids commented Feb 2, 2025

slaren commented Feb 2, 2025

ilhamsyahids commented Feb 2, 2025 •

edited

Loading

slaren commented Feb 3, 2025

ilhamsyahids commented Feb 4, 2025

ilhamsyahids commented Feb 4, 2025

Potentially buffer overflow in make_block_q4_0x4 #1094

Potentially buffer overflow in make_block_q4_0x4 #1094

Comments

ilhamsyahids commented Feb 2, 2025

slaren commented Feb 2, 2025

ilhamsyahids commented Feb 2, 2025 • edited Loading

slaren commented Feb 3, 2025

ilhamsyahids commented Feb 4, 2025

ilhamsyahids commented Feb 4, 2025

Potentially buffer overflow in `make_block_q4_0x4` #1094

Potentially buffer overflow in `make_block_q4_0x4` #1094

ilhamsyahids commented Feb 2, 2025 •

edited

Loading