Milestones - LINs-lab/DeFT

3 Open 0 Closed

deft cuda kernel

Past due by 9 months Last updated 21 days ago
1. Implement and profile the CUDA kernel;
2. Benchmark against different attention kernel baselines, using the simplest memory management.
33% complete

4 open

2 closed
NIPs rebuttal

Past due by 8 months Last updated 9 months ago
1. Significant end-to-end speedup: more efficient kernels, more large trees;
2. Compare with concurrent work;
3. Implement real SD and MR.
0% complete

1 open

0 closed
deft kv manager with demo

Past due by 9 months Last updated 9 months ago
1. KV manager selection and implementation: ours, radix tree, or hashed sequence group (vLLM);
2. Paged or unpaged selection;
3. Profile to make sure the bottleneck is attention.
0% complete

1 open

0 closed