Stars
Quantization
3 repositories
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
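A minimal sketch of how AutoAWQ is typically driven, following its documented quickstart pattern; the model path, output directory, and quantization settings below are illustrative assumptions, not values from this listing:

```python
# Minimal AutoAWQ 4-bit quantization sketch (model path and config are assumptions).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"   # hypothetical FP16 source model
quant_path = "mistral-7b-awq"              # hypothetical output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits.
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer for later inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```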
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
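A minimal sketch of the AutoGPTQ quantization flow, assuming the package's documented `AutoGPTQForCausalLM` and `BaseQuantizeConfig` interface; the model names, output directory, and calibration text are placeholders:

```python
# Minimal AutoGPTQ quantization sketch (model names and calibration text are placeholders).
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

pretrained_model_dir = "facebook/opt-125m"   # hypothetical source model
quantized_model_dir = "opt-125m-4bit-gptq"   # hypothetical output directory

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

# A small set of tokenized calibration examples drives the GPTQ weight updates.
examples = [tokenizer(
    "AutoGPTQ is an easy-to-use LLM quantization package based on the GPTQ algorithm."
)]

# 4-bit quantization with group size 128.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```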