- Hong Kong
Pinned
- kvcache-ai/ktransformers
  A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
- chitu (forked from thu-pacman/chitu)
  High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability. (Python)
- fastllm (forked from ztxz16/fastllm)
  fastllm is a high-performance large language model inference library implemented in C++, with a dependency-free backend (CUDA only, no PyTorch required). It can run the DeepSeek R1 671B INT4 model on a single 4090, reaching 20+ tps per request. (C++)
- sglang (forked from sgl-project/sglang)
  SGLang is a fast serving framework for large language models and vision language models. (Python)
- vllm (forked from vllm-project/vllm)
  A high-throughput and memory-efficient inference and serving engine for LLMs. (Python)
- vllm-ascend (forked from vllm-project/vllm-ascend)
  Community-maintained hardware plugin for vLLM on Ascend. (Python)