Fix compilation with FP16_QK_REDUCTION enabled. #962
As described in #806 and #936, setting the CMake build flag `FLASHINFER_GEN_USE_FP16_QK_REDUCTIONS` to "true" causes a build failure because `cuda_fp16.h` does not support a `constexpr` cast from the `__half` type to `float`. Note that this is not just a CMake/C++ configuration issue: it is also triggered in the flashinfer JIT compilation path, as reported in #915. A minimal reduction of the failure is sketched below.
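For context, a hypothetical minimal reduction of the failure (not taken from the PR; it only assumes that the `__half`-to-`float` conversion in `cuda_fp16.h` is not declared `constexpr`, which is the root cause described above):

```cpp
#include <cuda_fp16.h>  // __half, __float2half

// The __half -> float conversion path in cuda_fp16.h is a runtime
// intrinsic, not constexpr, so forcing it into a constant expression
// is rejected by the compiler:
constexpr float bad = static_cast<float>(__float2half(1.0f));
// error: expression must have a constant value / not a constant expression
```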
The PR fixes #806 and #936 by adding a modified version of the FP16 header from the FP16 library that supports `constexpr` versions of the conversion functions. To make the conversion functions `constexpr`, I am using `std::bit_cast`, which is the reason for bumping the required C++ standard to C++20; see the sketch after this paragraph.
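To illustrate the approach (a minimal sketch, not the PR's actual code: the real change is a modified copy of the FP16 library header, and `half_bits_to_float` is an illustrative name): `std::bit_cast` is the C++20, `constexpr`-friendly replacement for the `union`/`memcpy` type punning such conversions traditionally use, which lets the whole binary16-to-binary32 conversion run at compile time.

```cpp
#include <bit>      // std::bit_cast, constexpr since C++20
#include <cstdint>

// Convert raw IEEE binary16 bits to float entirely at compile time.
// Union-based punning is forbidden in constant expressions; std::bit_cast
// performs the same reinterpretation and is constexpr.
constexpr float half_bits_to_float(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp  = (h >> 10) & 0x1Fu;  // 5-bit exponent field
  const std::uint32_t man  = h & 0x3FFu;         // 10-bit mantissa field

  if (exp == 0x1Fu) {  // Inf / NaN: all-ones float exponent, widened mantissa
    return std::bit_cast<float>(sign | 0x7F800000u | (man << 13));
  }
  if (exp == 0u) {     // zero / subnormal: value is mantissa * 2^-24
    const float mag = static_cast<float>(man) * 0x1.0p-24f;
    return sign ? -mag : mag;
  }
  // Normal number: rebias exponent from 15 (binary16) to 127 (binary32)
  return std::bit_cast<float>(sign | ((exp + 112u) << 23) | (man << 13));
}

// Evaluated in a constant expression, which is exactly what the
// cuda_fp16.h conversion operator cannot do:
static_assert(half_bits_to_float(0x3C00u) == 1.0f);   // 1.0 in binary16
static_assert(half_bits_to_float(0xC000u) == -2.0f);  // -2.0 in binary16
```

In the actual header the input bits would come from the `__half` representation (presumably via `__half_raw::x`), so the kernels' compile-time reductions can use the conversion directly.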
With these changes I am able to build the C++ API with `FLASHINFER_GEN_USE_FP16_QK_REDUCTIONS` both ON and OFF.

Fixes #936
Fixes #806