
v0.2.4

github-actions released this on 02 Mar 02:34

What's Changed

  • ci: add option to skip nvbench build by @guocuimi in #390
  • ci: build devel image with cuda 12.8 for blackwell by @guocuimi in #391
  • kernel: added query packing support for attention by @guocuimi in #392
  • refactor: rename attention to mha to differentiate it from mla by @guocuimi in #393
  • kernel: added triton aot compiler by @guocuimi in #394
  • kernel: generate smaller kernel instantiations by @guocuimi in #395
  • kernel: fix register spilling issue for attention head_dim=256 by @guocuimi in #397
  • upgrade libtorch to 2.6.0 and cutlass to 3.8.0 by @guocuimi in #398
  • kernel: added simple MLA kernel by @guocuimi in #396
  • kernel: added pipeline support for mla by @guocuimi in #399
  • kernel: added ping-pong rmem support for MLA by @guocuimi in #400
  • kernel: revert experimental TiledMMA separation change. by @guocuimi in #401
  • kernel: always put query in registers for mha by @guocuimi in #402
  • kernel: use 8 warps to avoid register spilling for mla with hdim=512 by @guocuimi in #403
  • kernel: revert mla ping-pong rmem change by @guocuimi in #404
  • kernel: refactor mask logic to avoid using hard-coded stride. by @guocuimi in #405
  • kernel: added causal mask for MLA kernel by @guocuimi in #406
  • kernel: added blk_n=16 for MLA to support sm_86/sm_89, which have only 100 KB of smem, by @guocuimi in #407
  • kernel: fix mask bugs for MLA by @guocuimi in #408
  • kernel: use different TiledMma for GEMM qk and pv by @guocuimi in #409
  • kernel: added stage support for MLA kernel by @guocuimi in #410
  • misc: upgrade cuda version and add devcontainer for manylinux by @guocuimi in #412
  • kernel: added q and kv oob handling for MLA kernel by @guocuimi in #413
  • kernel: optimize mask loop for MLA kernel by @guocuimi in #414
  • kernel: added paged kv support for MLA kernel by @guocuimi in #415
  • kernel: fix kv oob issue and added more unit tests for paged MLA by @guocuimi in #416
  • kernel: use FastDivmod in attention kernels by @guocuimi in #417

Full Changelog: v0.2.3...v0.2.4