Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Updated Feb 15, 2025 - Python
☸️ Easy, advanced inference platform for large language models on Kubernetes.
AI-based search engine done right
A guide to structured generation using constrained decoding
SGLang vs vLLM comparison
Examples of serving LLMs on Modal.
llmd is an LLM daemonset: it provides model management and gets large language models up and running, using llama.cpp, vLLM, or SGLang as the serving backend.