- Guangzhou, China
- UTC +08:00
- https://github.com/DefTruth
- https://www.zhihu.com/people/qyjdef
Pinned
- xlite-dev/lite.ai.toolkit (Public): 🛠 A lite C++ toolkit: Deploy 100+ AI models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, etc) via MNN, ORT and TRT. 🎉🎉
- vllm-project/vllm (Public): A high-throughput and memory-efficient inference and serving engine for LLMs
- xlite-dev/Awesome-LLM-Inference (Public): 📖 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc. 🎉🎉
- PaddlePaddle/FastDeploy (Public): ⚡️ An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ main stream scenarios and 150+ SOTA models with end-to-end…
- xlite-dev/CUDA-Learn-Notes (Public): 📚 Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels, FA2, HGEMM via MMA and CuTe (~99% TFLOPS of cuBLAS/FA2 🎉).
- xlite-dev/ffpa-attn-mma (Public): 📚 FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for large headdim (D > 256), ~2x↑🎉 vs SDPA EA.