Refining the Diffusers Quantization Documentation (#802)
Co-authored-by: Yao Chi <[email protected]>
lijunliangTG and doombeaker authored Apr 17, 2024
1 parent 87eb97f commit 7a7b718
Showing 4 changed files with 214 additions and 7 deletions.
3 changes: 3 additions & 0 deletions README_ENTERPRISE.md
@@ -35,6 +35,7 @@ OneDiff Enterprise offers a quantization method that reduces memory usage, incre
- [Accessing Diffusers Models](#accessing-diffusers-models-2)
- [Scripts](#scripts-2)
- [SVD + DeepCache](#svd--deepcache)
- [Quantized model](#quantized-model)
- [Contact](#contact)


@@ -382,6 +383,8 @@ python3 benchmarks/image_to_video.py \
--output-video path/to/output_image.mp4
```

## Quantized model
Due to space constraints, the quantization-specific documentation lives in the [quantization document](./src/onediff/quantization/README.md#how-to-use-onediff-quantization).

## Contact

141 changes: 134 additions & 7 deletions src/onediff/quantization/README.md
@@ -1,7 +1,134 @@
## <div align="center">OneDiff Quant 🚀 NEW</div>
## <div align="center">Documentation</div>
- [Installation Guide](https://github.com/siliconflow/onediff/blob/main/README_ENTERPRISE.md#install-onediff-enterprise)
- [How to use Online Quant](../../../onediff_diffusers_extensions/examples/text_to_image_online_quant.py)
- [How to use Offline Quant](./quantize_pipeline.py)
- [How to Quant a custom model](../../../tests/test_quantize_custom_model.py)
- [Community and Support](https://github.com/siliconflow/onediff?tab=readme-ov-file#community-and-support)

<p align="center">
<img src="../../../imgs/onediff_logo.png" height="100">
</p>

# <div align="center">OneDiff Quant 🚀 Documentation</div>
OneDiff Enterprise offers a quantization method that reduces memory usage and increases speed without any loss of quality.

Here are the optimized results: timings for 30 steps of Diffusers-SDXL at 1024x1024 (percentages are the latency reduction relative to the baseline).
| Accelerator             | Baseline (non-optimized) | OneDiff (optimized online) | OneDiff Quant (optimized) |
| ----------------------- | ------------------------ | -------------------------- | ------------------------- |
| NVIDIA GeForce RTX 3090 | 8.03 s                   | 4.44 s (~44.7%)            | **3.34 s (~58.4%)**       |

- torch {version: 2.2.1+cu121}
- oneflow {version: 0.9.1.dev20240406+cu121, enterprise: True}

Here are the optimized results: timings for 30 steps of Diffusers-SD-1.5 at 1024x1024 (percentages are the latency reduction relative to the baseline).
| Accelerator             | Baseline (non-optimized) | OneDiff (optimized online) | OneDiff Quant (optimized) |
| ----------------------- | ------------------------ | -------------------------- | ------------------------- |
| NVIDIA GeForce RTX 3090 | 6.87 s                   | 3.41 s (~50.3%)            | **3.13 s (~54.4%)**       |

- torch {version: 2.2.2+cu121}
- oneflow {version: 0.9.1.dev20240403+cu122, enterprise: True}

**Note**: Before proceeding with this document, please ensure you are familiar with the [OneDiff Community](../../../README.md) features and OneDiff ENTERPRISE by referring to the [ENTERPRISE Guide](../../../README_ENTERPRISE.md#install-onediff-enterprise).

- [Prepare environment](#prepare-environment)
- [Baseline (non-optimized)](#baseline-non-optimized)
- [How to use OneDiff quantization](#how-to-use-onediff-quantization)
- [Online quantization](#online-quantization)
- [Online quantization (optimized)](#online-quantization-optimized)
- [Offline quantization](#offline-quantization)
- [Quantize a custom model](#quantize-a-custom-model)
- [Community and Support](#community-and-support)

## Prepare environment

You need to install the following dependencies (a quick version check follows the list).

1. [OneDiff Installation Guide](https://github.com/siliconflow/onediff/blob/main/README_ENTERPRISE.md#install-onediff-enterprise)
2. [OneDiffx Installation Guide](https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions#install-and-setup)
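
A quick sanity check (illustrative, not part of the repo) is to print the framework versions and compare them with the torch/oneflow pairs listed in the tables above:

```python
# Illustrative only: confirm both frameworks import cleanly and print
# their versions for comparison with the pairs reported above.
import torch
import oneflow

print(f"torch {torch.__version__}")
print(f"oneflow {oneflow.__version__}")
```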

## Baseline (non-optimized)

You can obtain the baseline by running the following command.

```bash
python onediff_diffusers_extensions/examples/text_to_image_online_quant.py \
--model_id /PATH/TO/YOUR/MODEL \
--seed 1 \
--backend torch --height 1024 --width 1024 --output_file sdxl_torch.png
```

## How to use OneDiff quantization

OneDiff quantization supports acceleration of all diffusion models; this document uses the SDXL model as its running example. First, download the [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model.

### Online quantization

**Note**: The first quantization run has to analyze the data and determine the parameters required for quantization, such as the minimum and maximum values of the data, which costs extra compute time. Once determined, these parameters are cached as `*.pt` files, and subsequent quantization runs reuse them directly, which speeds up processing. Quantization result information can be found in `cache_dir/quantization_stats.json`, as sketched below.
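
The cached results can be inspected directly. Here is a minimal sketch, assuming the `--cache_dir ./run_sdxl_quant` used in the command below and treating the JSON schema as implementation-defined:

```python
# Illustrative only: pretty-print whatever the quantization pass recorded.
import json
from pathlib import Path

stats_path = Path("./run_sdxl_quant/quantization_stats.json")
if stats_path.exists():
    print(json.dumps(json.loads(stats_path.read_text()), indent=2)[:2000])
else:
    print("No stats yet - run an online quantization pass first.")
```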

#### Online quantization (optimized)

You can run it using the following command.

```bash
python onediff_diffusers_extensions/examples/text_to_image_online_quant.py \
--model_id /PATH/TO/YOUR/MODEL \
--seed 1 \
--backend onediff \
--cache_dir ./run_sdxl_quant \
--height 1024 \
--width 1024 \
--output_file sdxl_quant.png \
--quantize \
--conv_mae_threshold 0.1 \
--linear_mae_threshold 0.2 \
--conv_compute_density_threshold 900 \
--linear_compute_density_threshold 300
```

The parameters of the preceding command are shown in the following table.
| Option                               | Range  | Default | Description                                                            |
| ------------------------------------ | ------ | ------- | ---------------------------------------------------------------------- |
| --conv_mae_threshold                  | [0, 1] | 0.1     | MAE threshold for quantizing convolutional modules.                    |
| --linear_mae_threshold                | [0, 1] | 0.2     | MAE threshold for quantizing linear modules.                           |
| --conv_compute_density_threshold      | [0, ∞) | 900     | Computational density threshold for quantizing convolutional modules.  |
| --linear_compute_density_threshold    | [0, ∞) | 300     | Computational density threshold for quantizing linear modules.         |
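
After tuning these thresholds, one rough way to check the quality impact (an illustrative sketch, not part of OneDiff; assumes scikit-image and Pillow are installed) is to compare the quantized output with the baseline image generated from the same seed:

```python
# Compare the two PNGs produced by the baseline and quantized runs above.
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

baseline = np.asarray(Image.open("sdxl_torch.png").convert("RGB"))
quantized = np.asarray(Image.open("sdxl_quant.png").convert("RGB"))

score = structural_similarity(baseline, quantized, channel_axis=-1, data_range=255)
print(f"SSIM vs. baseline: {score:.4f}")  # closer to 1.0 means less quality loss
```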

### Offline quantization

To quantize a model to int8 offline, run the following script.

```bash
python ./src/onediff/quantization/quant_pipeline_test.py \
--floatting_model_path "stabilityai/stable-diffusion-xl-base-1.0" \
--prompt "a photo of an astronaut riding a horse on mars" \
--height 1024 \
--width 1024 \
--num_inference_steps 30 \
--conv_compute_density_threshold 900 \
--linear_compute_density_threshold 300 \
--conv_ssim_threshold 0.985 \
--linear_ssim_threshold 0.991 \
--save_as_float False \
--cache_dir "./run_sd-v1-5" \
--quantized_model ./quantized_model
```

To load an already-quantized model, set the `--quantized_model` parameter to the path of that model, such as [sd-1.5-onediff-enterprise](https://huggingface.co/siliconflow/stable-diffusion-v1-5-onediff-comfy-enterprise-v1) or [sd-1.5-onediff-deepcache](https://huggingface.co/siliconflow/stable-diffusion-v1-5-onediff-deepcache-int8). [Stable-diffusion-v2-1-onediff-enterprise](https://huggingface.co/siliconflow/stable-diffusion-v2-1-onediff-enterprise) has not been quantized yet, so it must be quantized first.

```bash
python ./src/onediff/quantization/load_quantized_model.py \
--prompt "a photo of an astronaut riding a horse on mars" \
--height 1024 \
--width 1024 \
--num_inference_steps 30 \
--quantized_model ./quantized_model
```

## Quantize a custom model

To quantize a custom model, refer to the following test script (an inline sketch follows the command).

```bash
python tests/test_quantize_custom_model.py
```
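
For reference, here is a minimal inline sketch of the same flow, mirroring the `QuantPipeline` calls used by `quant_pipeline_test.py` in this commit; the model path, prompt, cache directory, and output directory are placeholders:

```python
import torch
from diffusers import AutoPipelineForText2Image

from onediff.quantization.quantize_pipeline import QuantPipeline

# Load your own float16 Diffusers checkpoint (placeholder path).
pipe = QuantPipeline.from_pretrained(
    AutoPipelineForText2Image,
    "/PATH/TO/YOUR/MODEL",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

# Calibrate on one representative prompt, then save the int8 weights.
pipe.quantize(
    prompt="a photo of an astronaut riding a horse on mars",
    height=1024,
    width=1024,
    num_inference_steps=30,
    conv_compute_density_threshold=900,
    linear_compute_density_threshold=300,
    conv_ssim_threshold=0.985,
    linear_ssim_threshold=0.991,
    save_as_float=False,
    plot_calibrate_info=False,
    cache_dir="./run_custom_model",
)
pipe.save_quantized("./quantized_custom_model", safe_serialization=True)
```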

## Community and Support

[Here is the introduction of OneDiff Community.](https://github.com/siliconflow/onediff/wiki#onediff-community)
- [Create an issue](https://github.com/siliconflow/onediff/issues)
- Chat in Discord: [![](https://dcbadge.vercel.app/api/server/RKJTjZMcPQ?style=plastic)](https://discord.gg/RKJTjZMcPQ)
- Email for Enterprise Edition or other business inquiries: [email protected]
31 changes: 31 additions & 0 deletions src/onediff/quantization/load_quantized_model.py
@@ -0,0 +1,31 @@
import argparse

import torch
from diffusers import AutoPipelineForText2Image

from onediff.infer_compiler import oneflow_compile
from onediff.quantization.quantize_pipeline import QuantPipeline


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--prompt", default="a photo of an astronaut riding a horse on mars"
    )
    parser.add_argument("--height", type=int, default=1024)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--num_inference_steps", type=int, default=30)
    parser.add_argument("--quantized_model", type=str, required=True)
    return parser.parse_args()


args = parse_args()

# Load the previously saved int8 weights into a text-to-image pipeline.
pipe = QuantPipeline.from_quantized(
    AutoPipelineForText2Image,
    args.quantized_model,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")

pipe_kwargs = dict(
    prompt=args.prompt,
    height=args.height,
    width=args.width,
    num_inference_steps=args.num_inference_steps,
)

# Compile the UNet with OneFlow for additional speedup, then generate.
pipe.unet = oneflow_compile(pipe.unet)
pipe(**pipe_kwargs).images[0].save("test.png")
46 changes: 46 additions & 0 deletions src/onediff/quantization/quant_pipeline_test.py
@@ -0,0 +1,46 @@
import argparse

import torch
from diffusers import AutoPipelineForText2Image

from onediff.quantization.quantize_pipeline import QuantPipeline


def str2bool(value):
    # argparse's type=bool treats any non-empty string (including "False")
    # as True, so parse the flag explicitly.
    return str(value).lower() in ("1", "true", "yes")


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--floatting_model_path", default="runwayml/stable-diffusion-v1-5"
    )
    parser.add_argument(
        "--prompt", default="a photo of an astronaut riding a horse on mars"
    )
    parser.add_argument("--height", type=int, default=1024)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--num_inference_steps", type=int, default=30)
    parser.add_argument("--conv_compute_density_threshold", type=int, default=900)
    parser.add_argument("--linear_compute_density_threshold", type=int, default=300)
    parser.add_argument("--conv_ssim_threshold", type=float, default=0.985)
    parser.add_argument("--linear_ssim_threshold", type=float, default=0.991)
    parser.add_argument("--save_as_float", type=str2bool, default=False)
    parser.add_argument("--cache_dir", default="./run_sd-v1-5")
    parser.add_argument("--quantized_model", default="./quantized_model")
    return parser.parse_args()


args = parse_args()

# Load the float16 pipeline that will be calibrated and quantized.
pipe = QuantPipeline.from_pretrained(
    AutoPipelineForText2Image,
    args.floatting_model_path,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

pipe_kwargs = dict(
    prompt=args.prompt,
    height=args.height,
    width=args.width,
    num_inference_steps=args.num_inference_steps,
)

# Run calibration and decide, per module, whether to quantize based on the
# compute-density and SSIM thresholds.
pipe.quantize(
    **pipe_kwargs,
    conv_compute_density_threshold=args.conv_compute_density_threshold,
    linear_compute_density_threshold=args.linear_compute_density_threshold,
    conv_ssim_threshold=args.conv_ssim_threshold,
    linear_ssim_threshold=args.linear_ssim_threshold,
    save_as_float=args.save_as_float,
    plot_calibrate_info=False,
    cache_dir=args.cache_dir,
)

pipe.save_quantized(args.quantized_model, safe_serialization=True)
