Release v0.2.4 · vectorch-ai/ScaleLLM

What's Changed

ci: add option to skip nvbench build by @guocuimi in #390
ci: build devel image with cuda 12.8 for blackwell by @guocuimi in #391
kernel: added query packing support for attention by @guocuimi in #392
refactor: rename attention to mha to differentiate it from mla by @guocuimi in #393
kernel: added triton aot compiler by @guocuimi in #394
kernel: generate smaller kernel instantiations by @guocuimi in #395
kernel: fix register spilling issue for attention head_dim=256 by @guocuimi in #397
upgrade libtorch to 2.6.0 and cutlass to 3.8.0 by @guocuimi in #398
kernel: added simple MLA kernel by @guocuimi in #396
kernel: added pipeline support for mla by @guocuimi in #399
kernel: added ping-pong rmem support for MLA by @guocuimi in #400
kernel: revert experimental TiledMMA separation change by @guocuimi in #401
kernel: put query always in registers for mha by @guocuimi in #402
kernel: use 8 warps to avoid register spilling for mla with hdim=512 by @guocuimi in #403
kernel: revert mla ping-pong rmem change by @guocuimi in #404
kernel: refactor mask logic to avoid using hard-coded stride. by @guocuimi in #405
kernel: added causal mask for MLA kernel by @guocuimi in #406
kernel: added blk_n=16 for MLA to support sm_86/sm_89 with only 100kb smem by @guocuimi in #407
kernel: fix mask bugs for MLA by @guocuimi in #408
kernel: use different TiledMma for GEMM qk and pv by @guocuimi in #409
kernel: added stage support for MLA kernel by @guocuimi in #410
misc: upgrade cuda version and add devcontainer for manylinux by @guocuimi in #412
kernel: added q and kv oob handling for MLA kernel by @guocuimi in #413
kernel: optimize mask loop for MLA kernel by @guocuimi in #414
kernel: added paged kv support for MLA kernel by @guocuimi in #415
kernel: fix kv oob issue and added more unittests for paged MLA by @guocuimi in #416
kernel: use FastDivmod in attention kernels by @guocuimi in #417
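
For readers unfamiliar with the paged KV and out-of-bounds (oob) items above, here is a minimal Python sketch of the general idea, not ScaleLLM's actual kernel code. The function name `locate_kv_slot` and its parameters (`block_table`, `block_size`, `kv_len`) are hypothetical and chosen only for illustration. A logical KV position is split into a block index and an in-block offset with a divmod, the block index is translated through a per-sequence block table, and positions past the sequence length are rejected. On GPUs, this kind of divmod is commonly replaced by a precomputed multiply-and-shift, which is presumably what a FastDivmod-style helper provides.

```python
from typing import List, Optional, Tuple


def locate_kv_slot(
    token_idx: int,
    kv_len: int,
    block_table: List[int],
    block_size: int,
) -> Optional[Tuple[int, int]]:
    """Map a logical KV position to (physical_block, offset).

    Hypothetical helper for illustration only. Returns None when the
    position is out of bounds, so the caller can skip or mask it.
    """
    # Out-of-bounds handling: positions outside [0, kv_len) have no slot.
    if token_idx < 0 or token_idx >= kv_len:
        return None
    # Split the logical position into a logical block and an offset.
    logical_block, offset = divmod(token_idx, block_size)
    # The block table maps logical blocks to physical cache blocks.
    return block_table[logical_block], offset


# Usage: a sequence of 10 tokens stored in three physical blocks of size 4.
table = [7, 2, 5]
print(locate_kv_slot(5, kv_len=10, block_table=table, block_size=4))
print(locate_kv_slot(10, kv_len=10, block_table=table, block_size=4))
```

The bounds check comes before the table lookup so that a partially filled last block never yields a slot beyond the sequence length.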

Full Changelog: v0.2.3...v0.2.4
