Issues: flashinfer-ai/flashinfer
Deprecation Notice: Python 3.8 Wheel Support to End in future...
#682, opened Dec 18, 2024 by yzh119
Issues list
flashinfer not found by importlib.metadata.PackageNotFoundError
#794, opened Feb 7, 2025 by Amaodemao
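For context on #794, a minimal standard-library sketch of the failing lookup: `importlib.metadata.version()` raises `PackageNotFoundError` when no dist-info metadata is installed for the queried distribution, even if the module itself still imports (e.g. from a source checkout on `sys.path`). Illustrative only, not taken from the issue thread.

```python
# Probe the installed flashinfer distribution without crashing at import time.
from importlib.metadata import version, PackageNotFoundError

try:
    print("flashinfer version:", version("flashinfer"))
except PackageNotFoundError:
    # The module may still be importable from a source checkout on sys.path
    # even though no dist-info metadata is installed for it.
    print("no installed distribution metadata found for flashinfer")
```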
[Feature] Llama3.1 RoPE on the fly
Label: enhancement (New feature or request)
#746, opened Jan 21, 2025 by turboderp
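For #746, a rough sketch of what "RoPE on the fly" means in general: computing the rotation angles from position ids at call time rather than indexing a precomputed cos/sin cache. This is plain interleaved RoPE in PyTorch, not flashinfer's kernel or API; Llama 3.1 additionally rescales the per-dimension frequencies, which is omitted here.

```python
import torch

def rope_on_the_fly(x: torch.Tensor, pos_ids: torch.Tensor,
                    base: float = 10000.0) -> torch.Tensor:
    # x: [..., seq, head_dim]; pos_ids: [seq]. Angles are derived from
    # pos_ids at call time instead of being read from a cos/sin cache.
    head_dim = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = pos_ids[:, None].float() * inv_freq[None, :]   # [seq, head_dim/2]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]                     # interleaved pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```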
AttributeError: module 'flashinfer._kernels' has no attribute 'apply_rope_pos_ids_cos_sin_cache'
#741, opened Jan 17, 2025 by fergusfinn
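A hedged diagnostic for #741-style failures: an AttributeError on a compiled extension module usually indicates the Python front end and the prebuilt kernels come from mismatched builds, so listing what the installed binary actually exports narrows that down. The module path `flashinfer._kernels` is taken from the error message; everything else is illustrative.

```python
# Requires flashinfer to be installed; _kernels is the compiled extension
# named in the traceback.
import flashinfer._kernels as _kernels

wanted = "apply_rope_pos_ids_cos_sin_cache"
print(wanted, "available:", hasattr(_kernels, wanted))
# List the RoPE-related symbols the installed binary actually exports.
print([name for name in dir(_kernels) if "rope" in name.lower()])
```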
Flashinfer==0.2.0 precision error when tested on vLLM unit tests
#736, opened Jan 13, 2025 by Dr-Left
C++ benchmarks CMake error caused by enable_fp16 option in generate.py
#734, opened Jan 13, 2025 by rtxxxpro
[RFC]: Introducing ReproSpec for Strong Reproducibility in LLM Inference
Label: RFC (Request for Comments)
#733, opened Jan 11, 2025 by yzh119
RuntimeError: Qwen2-VL does not support _Backend.FLASHINFER backend now
#720, opened Jan 7, 2025 by duzw9311
[RFC] Un-fused softmax for short-query(decode) attention
Label: RFC (Request for Comments)
#707, opened Dec 30, 2024 by yzh119
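For the un-fused softmax RFC (#707), a sketch of the baseline computation under discussion, assuming "un-fused" means materializing the QK^T logits and running softmax as a separate pass rather than inside one fused attention kernel. This is a PyTorch reference for a single decode step, not the RFC's proposed implementation.

```python
import torch

def unfused_decode_attention(q, k, v):
    # q: [num_heads, head_dim]; k, v: [seq_len, num_heads, head_dim].
    # For one query token, attention decomposes into three separate passes.
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("hd,shd->hs", q.float(), k.float()) * scale  # materialized QK^T
    probs = torch.softmax(logits, dim=-1)                              # standalone softmax pass
    return torch.einsum("hs,shd->hd", probs, v.float()).to(q.dtype)    # weighted sum over V
```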
[Question] How to support custom stride of paged_kv for hopper prefill attention
#702, opened Dec 27, 2024 by jianfei-wangg