Official repo for paper: Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
This paper was accepted at AAAI 2024.
- Replace the LLaMA model files in the installed `transformers` package with the files in `transformers/models/llama`
- Download the models: `sh download.sh`
- Use GPTQ to quantize the weights: `sh run-gptq-llama.sh`
- Quantize the activations with `gptq_fq_quant_llama.py`
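The activation-quantization step boils down to mapping floating-point activations onto a low-bit integer grid. Below is a minimal, self-contained sketch of per-tensor asymmetric fake quantization in plain Python; the function name and details are illustrative only and do not reproduce the repo's actual `gptq_fq_quant_llama.py` logic.

```python
import random


def fake_quantize_activations(xs, n_bits=8):
    """Fake-quantize a list of activations: round to an n_bits integer
    grid spanning [min(xs), max(xs)], then map back to floats.
    Illustrative sketch only, not the Agile-Quant implementation."""
    qmax = 2 ** n_bits - 1
    x_min, x_max = min(xs), max(xs)
    # Per-tensor scale and zero point (asymmetric quantization).
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    zero_point = round(-x_min / scale)
    out = []
    for x in xs:
        # Quantize, clamp to the valid integer range, then dequantize.
        q = min(max(round(x / scale) + zero_point, 0), qmax)
        out.append((q - zero_point) * scale)
    return out


random.seed(0)
acts = [random.gauss(0.0, 1.0) for _ in range(64)]  # toy activations
deq = fake_quantize_activations(acts, n_bits=8)
```

At 8 bits the round-trip error of each value is bounded by the quantization step, which is why fake quantization is a common way to estimate accuracy loss before deploying true integer kernels.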
@inproceedings{shen2024agile,
  title     = {Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge},
  author    = {Shen, Xuan and Dong, Peiyan and Lu, Lei and Kong, Zhenglun and Li, Zhengang and Lin, Ming and Wu, Chao and Wang, Yanzhi},
  booktitle = {AAAI},
  year      = {2024},
}
The code is mainly based on the quantization works GPTQ and FQ-ViT.