Repositories list
14 repositories
whl (Public)
ScaleLLM (Public): A high-performance inference system for large language models, designed for production environments.
3FS (Public)
flashinfer (Public)
FlashMLA (Public)
vcpkg (Public)
LLMBench (Public)
discussions (Public)
chatbot-ui (Public)
flash-attention (Public)
tokenizers (Public)
xformers (Public)
FasterTransformer (Public)
ByteTransformer (Public): Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052