deft cuda kernel
Past due by 9 months, last updated 21 days ago
- implement and profile the CUDA kernel;
- benchmark with the simplest memory management against different attention-kernel baselines.
NeurIPS rebuttal
Past due by 8 months, last updated 9 months ago
- achieve a significant end-to-end speedup: more efficient kernels and larger trees;
- compare with concurrent work;
- implement real SD and MR;
deft kv manager with demo
Past due by 9 months, last updated 9 months ago
- KV manager selection and implementation: ours, radix tree, or hashed sequence group (vLLM);
- choose between paged and unpaged KV cache;
- profile to confirm that attention is the bottleneck.
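To make the radix-tree option for the KV manager concrete, here is a minimal sketch under loud assumptions: it uses a token-level (uncompressed) prefix tree rather than a true radix tree, and integer slot ids stand in for KV cache locations. `PrefixKVManager`, `insert` and `match_prefix` are hypothetical names, not the API of this project or of vLLM. Sequences that share a prefix reuse the same slots, which is the property that tree-structured decoding relies on.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical KV-cache slot id holding this token's keys/values
    slot: int = -1
    children: dict[int, "Node"] = field(default_factory=dict)


class PrefixKVManager:
    """Token-level prefix tree: sequences sharing a prefix share KV slots.

    Simplified sketch: no edge compression (a real radix tree merges
    single-child chains), no eviction, no paging.
    """

    def __init__(self) -> None:
        self.root = Node()
        self.next_slot = 0  # next unused slot id (stand-in for an allocator)

    def match_prefix(self, tokens: list[int]) -> int:
        """Return how many leading tokens already have cached KV."""
        node, n = self.root, 0
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                break
            node, n = child, n + 1
        return n

    def insert(self, tokens: list[int]) -> list[int]:
        """Return one slot per token, allocating slots only for uncached tokens."""
        node, slots = self.root, []
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                child = Node(slot=self.next_slot)
                self.next_slot += 1
                node.children[t] = child
            slots.append(child.slot)
            node = child
        return slots


if __name__ == "__main__":
    m = PrefixKVManager()
    print(m.insert([1, 2, 3]))      # [0, 1, 2]
    print(m.insert([1, 2, 4]))      # [0, 1, 3]: prefix [1, 2] is reused
    print(m.match_prefix([1, 2, 5]))  # 2
```

A hashed sequence-group manager (the vLLM option) would instead key cached blocks by a hash of the token prefix; the trie makes the shared-prefix structure explicit, which is closer to how a decoding tree branches.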