Web UI for quantizing LLMs from Hugging Face. Inspired by Oobabooga and the Colab AutoQuant notebook.
At the moment, only GGUF quantization works reliably. Windows support is currently unstable.
Go to https://docs.docker.com/desktop/setup/install/windows-install/, then download and install Docker Desktop.
Right-click `start_windows.ps1` and select `Run with PowerShell`.
./start_linux.sh
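If the script is not yet marked executable, you may need to run `chmod +x start_linux.sh` first.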
- Multi-method Quantization: Choose from GGUF, GPTQ, ExLlamaV2, AWQ, and HQQ.
- Unified Docker Setup: Separate Dockerfiles for CPU-only and GPU (CUDA) builds.
- Dynamic Runtime Detection: Launch the appropriate container based on the detected hardware (via startup scripts for Linux and Windows); see the sketch after this list.
- Easy-to-Use Web UI: Built with Gradio, enabling interactive model quantization.
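The actual logic lives in `start_linux.sh` / `start_windows.ps1`. As a rough illustration only (not the real script), Linux-side detection could look like the following, assuming that a working `nvidia-smi` signals a usable CUDA GPU; the image tags and port are placeholders:

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real start_linux.sh may differ.
PORT="${PORT:-7860}"

if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    # CUDA-capable GPU detected: build and run the GPU image.
    IMAGE_NAME="spongequant-gpu-cuda"   # hypothetical tag
    docker build -f Dockerfile.gpu-cuda -t "${IMAGE_NAME}" .
    GPU_FLAGS="--gpus all"
else
    # No GPU found: fall back to the CPU-only image.
    IMAGE_NAME="spongequant-cpu"        # hypothetical tag
    docker build -f Dockerfile.cpu -t "${IMAGE_NAME}" .
    GPU_FLAGS=""
fi

docker run ${GPU_FLAGS} -it -p "${PORT}:${PORT}" --rm "${IMAGE_NAME}"
```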
Method | CPU Quantization | CPU Inference | GPU Quantization | GPU Inference | Tradeoffs / Notes |
---|---|---|---|---|---|
GGUF | Yes | Yes | Yes (but not required) | Yes (but not required) | Designed for efficient CPU inference via llama.cpp; optimized for low-precision execution on CPUs. |
GPTQ | No | No | Yes | Yes | High compression & accuracy but built for CUDA; forcing CPU-only leads to very slow and unreliable processing. |
ExLlamaV2 | No | No | Yes | Yes | Optimized for GPU; CPU fallback is possible but performance is suboptimal. |
AWQ | No | No | Yes | Yes | Relies on CUDA kernels for fast quantization; CPU-only execution is generally impractical. |
HQQ | No | No | Yes | Yes | Designed primarily for GPU inference with specialized kernels; CPU usage is not widely validated and may be very slow. |
- GPTQ, ExLlamaV2, AWQ, and HQQ require a GPU for quantization (and inference). For now, only GGUF is reliably CPU-friendly for both quantization and inference.
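For context, GGUF quantization is usually performed with llama.cpp's own tooling. A typical manual workflow (outside the UI) looks roughly like this; the model paths, the build location of `llama-quantize`, and the `Q4_K_M` type are only examples, and the app may invoke these tools differently internally:

```bash
# Convert a Hugging Face checkpoint to GGUF (F16), then quantize it.
# Paths, file names, and the Q4_K_M quantization type are illustrative.
python llama.cpp/convert_hf_to_gguf.py ./models/my-model --outfile ./models/my-model-f16.gguf
llama.cpp/build/bin/llama-quantize ./models/my-model-f16.gguf ./quantized_models/my-model-Q4_K_M.gguf Q4_K_M
```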
SpongeQuant/
├── app/
│ ├── app.py # Main application code (Gradio UI)
│ ├── requirements.cpu.txt # CPU-only dependencies
│ ├── requirements.gpu-cuda.txt # GPU (CUDA) dependencies
│ └── ... # Other application files
├── Dockerfile.cpu # Dockerfile for CPU-only mode
├── Dockerfile.gpu-cuda # Dockerfile for GPU (CUDA) mode
├── Dockerfile.gpu-rocm # (Placeholder for future ROCm support)
├── start_linux.sh # Startup script for Linux
├── start_windows.ps1 # Startup script for Windows
├── README.md # This file
└── ... # Other files (models, quantized_models, etc.)
Contributions are welcome! Please feel free to open issues or submit pull requests on GitHub.
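# GPU (CUDA) mode: pass all GPUs through to the container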
docker run --gpus all -it -p "${PORT}:${PORT}" \
-v "$(pwd)/app/gguf:/app/gguf" \
-v "$(pwd)/models:/app/models" \
-v "$(pwd)/quantized_models:/app/quantized_models" \
--rm "${IMAGE_NAME}"
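# CPU-only mode: same mounts, no GPU passthrough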
docker run -it -p "${PORT}:${PORT}" \
-v "$(pwd)/app/gguf:/app/gguf" \
-v "$(pwd)/models:/app/models" \
-v "$(pwd)/quantized_models:/app/quantized_models" \
--rm "${IMAGE_NAME}"
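Both commands mount `app/gguf`, `models`, and `quantized_models` from the host, so downloaded checkpoints and quantized outputs persist after the container exits (`--rm` only removes the container itself).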
Most modern x86-64 CPUs support AVX2 and FMA, which llama.cpp uses to accelerate tensor operations; these instructions generally deliver faster quantized inference than ARM NEON/DOTPROD.
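If you are unsure whether your CPU exposes these instructions, a quick check on Linux is to inspect the advertised CPU flags:

```bash
# Prints "avx2" and/or "fma" if the processor supports them.
grep -woE 'avx2|fma' /proc/cpuinfo | sort -u
```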