  • Past due by 9 months. Last updated 21 days ago.

    1. Implement and profile the CUDA kernel.
    2. Benchmark against different attention-kernel baselines, using the simplest memory management.

    33% complete
  • Past due by 8 months. Last updated 9 months ago.

    1. Achieve a significant end-to-end speedup: more efficient kernels and larger trees.
    2. Compare against concurrent work.
    3. Implement real SD and MR.

  • Past due by 9 months. Last updated 9 months ago.

    1. Select and implement the KV manager: ours, a radix tree, or a hashed sequence group (vLLM).
    2. Choose between paged and unpaged KV storage.
    3. Profile to confirm that attention is the bottleneck.