IntLLaMA: A fast and light quantization solution for LLaMA

Introduction

IntLLaMA is a fast and light quantization solution that reduces GPU-memory requirements and improves computational efficiency while preserving model intelligence. Specifically, IntLLaMA makes the distribution of hidden states quantization-friendly by using Random Centralization (RandC) to address their asymmetry and mitigate the impact of outliers. In addition, Hessian-weighted Singular Value Decomposition (HSVD) is proposed to compensate for the performance degradation caused by representing the model weights at low bit-width. Benefiting from RandC and HSVD, IntLLaMA quantizes the weights to 4 bits and the hidden states to 8 bits while staying close to full-precision performance in perplexity and MMLU accuracy.
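
This README does not say how Random Centralization picks its center or where the randomness comes in. The sketch below therefore shows only the general principle it builds on: shifting an asymmetric activation distribution toward zero before symmetric 8-bit quantization, so that fewer integer codes are wasted. The midpoint-of-range center and all function names are hypothetical. This is not the RandC algorithm itself.

```python
from typing import List, Tuple


def quantize_symmetric(xs: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(x) for x in xs) or 1.0
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return q, scale


def centered_quantize(xs: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Shift by a center, then quantize symmetrically (hypothetical center choice)."""
    # Assumption: the center is the midpoint of the observed range, so the
    # shifted values are symmetric around zero and no integer codes go unused.
    center = (max(xs) + min(xs)) / 2
    q, scale = quantize_symmetric([x - center for x in xs], bits)
    return q, scale, center


def dequantize(q: List[int], scale: float, center: float = 0.0) -> List[float]:
    return [v * scale + center for v in q]


def max_abs_error(xs: List[float], ys: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(xs, ys))


# All-positive (asymmetric) activations: plain symmetric quantization
# leaves the negative half of the int8 range unused.
hidden = [0.5, 1.0, 2.0, 3.5, 6.0]
q_plain, s_plain = quantize_symmetric(hidden)
q_cent, s_cent, c = centered_quantize(hidden)
err_plain = max_abs_error(hidden, dequantize(q_plain, s_plain))
err_cent = max_abs_error(hidden, dequantize(q_cent, s_cent, c))
print(f"plain:    scale={s_plain:.4f} max_err={err_plain:.4f}")
print(f"centered: scale={s_cent:.4f} max_err={err_cent:.4f}")
```

Centering roughly halves the quantization step here, because the same 255 codes now cover a range of width 5.5 instead of 12. The center has to be stored alongside the scale to dequantize.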
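
The exact HSVD procedure is not given in this README. What follows is a minimal sketch of the underlying idea, and it makes several assumptions. Weights are quantized per output row to symmetric 4-bit values. The Hessian of the layer-wise reconstruction loss is approximated by the diagonal of H = X^T X over calibration inputs X. The quantization error is scaled column-wise by sqrt(H_jj) and then compensated with a rank-1 term found by power iteration. The rank, the diagonal approximation and all names are illustrative choices, not IntLLaMA's.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def quantize_rows_int4(w: Matrix) -> Matrix:
    """Per-output-row symmetric 4-bit round-to-nearest; returns dequantized weights."""
    qmax = 7
    out = []
    for row in w:
        scale = (max(abs(v) for v in row) or 1.0) / qmax
        out.append([max(-qmax, min(qmax, round(v / scale))) * scale for v in row])
    return out


def hessian_diag(xs: Matrix) -> List[float]:
    """Diagonal of H = X^T X over calibration inputs (one row of xs per sample)."""
    n = len(xs[0])
    return [sum(x[j] ** 2 for x in xs) + 1e-8 for j in range(n)]  # eps avoids /0


def rank1_projection(e: Matrix, iters: int = 50, seed: int = 0) -> Matrix:
    """Return u u^T e, with u the power-iteration estimate of e's top left singular vector."""
    m, n = len(e), len(e[0])
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        v = [sum(e[i][j] * u[i] for i in range(m)) for j in range(n)]  # e^T u
        u = [sum(e[i][j] * v[j] for j in range(n)) for i in range(m)]  # e e^T u
        norm = math.sqrt(sum(a * a for a in u)) or 1.0
        u = [a / norm for a in u]
    v = [sum(e[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]


def hsvd_like_compensate(w: Matrix, xs: Matrix) -> Matrix:
    """Hypothetical HSVD-style step: int4 weights plus a Hessian-weighted rank-1 fix."""
    q = quantize_rows_int4(w)
    d = [math.sqrt(h) for h in hessian_diag(xs)]
    m, n = len(w), len(w[0])
    # Scale each input column of the quantization error by sqrt(H_jj).
    e_w = [[(w[i][j] - q[i][j]) * d[j] for j in range(n)] for i in range(m)]
    c_w = rank1_projection(e_w)
    # Undo the column scaling and add the correction to the quantized weights.
    return [[q[i][j] + c_w[i][j] / d[j] for j in range(n)] for i in range(m)]


def weighted_error(w: Matrix, w_hat: Matrix, xs: Matrix) -> float:
    """sum_ij (W - W_hat)_ij^2 * H_jj, a diagonal-Hessian proxy for output error."""
    h = hessian_diag(xs)
    return sum((w[i][j] - w_hat[i][j]) ** 2 * h[j]
               for i in range(len(w)) for j in range(len(w[0])))


rng = random.Random(0)
w = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
# Calibration inputs whose first feature is much larger (an outlier channel).
xs = [[rng.gauss(0.0, 1.0) * (10.0 if j == 0 else 1.0) for j in range(8)]
      for _ in range(32)]
base = weighted_error(w, quantize_rows_int4(w), xs)
comp = weighted_error(w, hsvd_like_compensate(w, xs), xs)
print(f"int4 only: {base:.4f}  int4 + rank-1: {comp:.4f}")
```

The correction is an orthogonal projection of the weighted error, so it can only lower the weighted error. For readability it is folded back into a dense float matrix. A practical scheme would presumably keep the low-rank factors separately, next to the 4-bit weights.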

Update News

2023-07-13: Released the code for LoRA instruction fine-tuning. More information can be found in
2023-07-13: Released a 4w8f (4-bit weights, 8-bit features) ChatGLMv2-6B, which achieves … on C-Eval and a … speedup. More details can be found in Table 1.
2023-07-12: Released the code for converting a full-precision model into a quantized model.
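
This README does not describe how its LoRA instruction fine-tuning code attaches adapters. The sketch below shows the standard LoRA forward pass, y = W x + (alpha / r) B A x, where the frozen W could be the dequantized 4-bit weight and only the small matrices A and B are trained. The names `lora_forward`, `lora_a` and `lora_b` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def lora_forward(w_frozen: Matrix, lora_a: Matrix, lora_b: Matrix,
                 x: Vector, alpha: float, r: int) -> Vector:
    """y = W x + (alpha / r) * B (A x).

    W (m x n) stays frozen (e.g. dequantized 4-bit weights); only A (r x n)
    and B (m x r) would receive gradients during fine-tuning.
    """
    base = matvec(w_frozen, x)
    delta = matvec(lora_b, matvec(lora_a, x))
    scaling = alpha / r
    return [b + scaling * d for b, d in zip(base, delta)]


w = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
a = [[1.0, 0.0]]            # rank r = 1
b_init = [[0.0], [0.0]]     # B starts at zero, so the adapter is initially a no-op
b_trained = [[1.0], [2.0]]
print(lora_forward(w, a, b_init, x, alpha=2.0, r=1))
print(lora_forward(w, a, b_trained, x, alpha=2.0, r=1))
```

Initializing B to zero means fine-tuning starts exactly from the quantized model's behaviour. The adapter adds only r * (m + n) trainable values per layer.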
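
The weight-memory side of a 4w8f model can be estimated with simple arithmetic. The sketch assumes a nominal 6e9 parameters (the exact count of ChatGLMv2-6B differs slightly) and a hypothetical group size of 128 weights sharing one fp16 scale. It covers weight storage only. Activations, the KV cache, and any layers kept in higher precision are ignored, and the 8-bit hidden states affect activation memory and matmul throughput rather than this figure.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 6e9   # assumption: nominal "6B" parameter count
GROUP_SIZE = 128   # hypothetical: one fp16 scale per 128 weights

fp16 = weight_memory_gib(NUM_PARAMS, 16)
int4 = weight_memory_gib(NUM_PARAMS, 4)
int4_grouped = weight_memory_gib(NUM_PARAMS, 4 + 16 / GROUP_SIZE)

print(f"fp16:          {fp16:.2f} GiB")
print(f"int4:          {int4:.2f} GiB ({fp16 / int4:.1f}x smaller)")
print(f"int4 + scales: {int4_grouped:.2f} GiB ({fp16 / int4_grouped:.2f}x smaller)")
```

Under these assumptions the weights shrink from about 11.2 GiB to about 2.8 GiB. Per-group scales cost an extra 16 / GROUP_SIZE bits per weight, so smaller groups trade memory for accuracy.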
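
This README does not describe the storage format produced by the conversion code. One common way to realize the 4x weight-storage saving is to pack two signed 4-bit codes into each byte. The sketch below shows that packing under this assumption; `pack_int4` and `unpack_int4` are hypothetical names.

```python
from typing import List


def pack_int4(q: List[int]) -> bytes:
    """Pack signed 4-bit integers in [-8, 7] two per byte, low nibble first."""
    if any(not -8 <= v <= 7 for v in q):
        raise ValueError("values must fit in signed 4 bits")
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))  # two's-complement nibbles
    return bytes(out)


def unpack_int4(data: bytes, count: int) -> List[int]:
    """Inverse of pack_int4; `count` drops the padding nibble, if any."""
    vals = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)  # sign-extend
    return vals[:count]


q = [-8, -1, 0, 7, 3]
packed = pack_int4(q)
print(len(packed), unpack_int4(packed, len(q)))
```

Five codes fit in three bytes. Real kernels often use wider words and interleaved layouts to suit their GPU load patterns, but the nibble arithmetic is the same.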

Acknowledgement

IntLLaMA was inspired by several open source projects. We are grateful for these excellent projects and list them as follows:

GPTQ
AWQ
Alpaca-LoRA
Standard-Alpaca

License

IntLLaMA is released under the Apache 2.0 license.

megvii-research/IntLLaMA