Refining the Diffusers Quantization Documentation (#802)
Co-authored-by: Yao Chi <[email protected]>
lijunliangTG and doombeaker authored Apr 17, 2024
1 parent 87eb97f commit 7a7b718
Showing 4 changed files with 214 additions and 7 deletions.
3 changes: 3 additions & 0 deletions README_ENTERPRISE.md
@@ -35,6 +35,7 @@ OneDiff Enterprise offers a quantization method that reduces memory usage, incre
- [Accessing Diffusers Models](#accessing-diffusers-models-2)
- [Scripts](#scripts-2)
- [SVD + DeepCache](#svd--deepcache)
- [Quantized model](#quantized-model)
- [Contact](#contact)


@@ -382,6 +383,8 @@ python3 benchmarks/image_to_video.py \
--output-video path/to/output_image.mp4
```

## Quantized model
Due to space constraints, the quantization-specific documentation lives in the [quantization document](./src/onediff/quantization/README.md#how-to-use-onediff-quantization).

## Contact

141 changes: 134 additions & 7 deletions src/onediff/quantization/README.md
@@ -1,7 +1,134 @@
## <div align="center">OneDiff Quant 🚀 NEW</div>
## <div align="center">Documentation</div>
- [Installation Guide](https://github.com/siliconflow/onediff/blob/main/README_ENTERPRISE.md#install-onediff-enterprise)
- [How to use Online Quant](../../../onediff_diffusers_extensions/examples/text_to_image_online_quant.py)
- [How to use Offline Quant](./quantize_pipeline.py)
- [How to Quant a custom model](../../../tests/test_quantize_custom_model.py)
- [Community and Support](https://github.com/siliconflow/onediff?tab=readme-ov-file#community-and-support)

<p align="center">
<img src="../../../imgs/onediff_logo.png" height="100">
</p>

# <div align="center">OneDiff Quant 🚀 Documentation</div>
OneDiff Enterprise offers a quantization method that reduces memory usage and increases speed without any loss of quality.

Here are the optimized results: timings for 30 steps of Diffusers-SDXL at 1024x1024 (percentages are the latency reduction relative to the baseline).
| Accelerator             | Baseline (non-optimized) | OneDiff (optimized online) | OneDiff Quant (optimized) |
| ----------------------- | ------------------------ | -------------------------- | ------------------------- |
| NVIDIA GeForce RTX 3090 | 8.03 s                   | 4.44 s (~44.7%)            | **3.34 s (~58.4%)**       |

- torch {version: 2.2.1+cu121}
- oneflow {version: 0.9.1.dev20240406+cu121, enterprise: True}

Here are the optimized results: timings for 30 steps of Diffusers-SD-1.5 at 1024x1024 (percentages are the latency reduction relative to the baseline).
| Accelerator             | Baseline (non-optimized) | OneDiff (optimized online) | OneDiff Quant (optimized) |
| ----------------------- | ------------------------ | -------------------------- | ------------------------- |
| NVIDIA GeForce RTX 3090 | 6.87 s                   | 3.41 s (~50.3%)            | **3.13 s (~54.4%)**       |

- torch {version: 2.2.2+cu121}
- oneflow {version: 0.9.1.dev20240403+cu122, enterprise: True}

**Note**: Before proceeding with this document, please ensure you are familiar with the [OneDiff Community](../../../README.md) features and OneDiff ENTERPRISE by referring to the [ENTERPRISE Guide](../../../README_ENTERPRISE.md#install-onediff-enterprise).

- [Prepare environment](#prepare-environment)
- [Baseline (non-optimized)](#baseline-non-optimized)
- [How to use OneDiff quantization](#how-to-use-onediff-quantization)
- [Online quantization](#online-quantization)
- [Online quantization (optimized)](#online-quantization-optimized)
- [Offline quantization](#offline-quantization)
- [Quantize a custom model](#quantize-a-custom-model)
- [Community and Support](#community-and-support)

## Prepare environment

You need to install the following dependencies (a quick version check follows the list).

1. [OneDiff Installation Guide](https://github.com/siliconflow/onediff/blob/main/README_ENTERPRISE.md#install-onediff-enterprise)
2. [OneDiffx Installation Guide](https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions#install-and-setup)
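
A quick sanity check (illustrative, not part of the repo) is to print the framework versions and compare them with the torch/oneflow pairs listed in the tables above:

```python
# Illustrative only: confirm both frameworks import cleanly and print
# their versions for comparison with the pairs reported above.
import torch
import oneflow

print(f"torch {torch.__version__}")
print(f"oneflow {oneflow.__version__}")
```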

## Baseline (non-optimized)

You can obtain the baseline by running the following command.

```bash
python onediff_diffusers_extensions/examples/text_to_image_online_quant.py \
--model_id /PATH/TO/YOUR/MODEL \
--seed 1 \
--backend torch --height 1024 --width 1024 --output_file sdxl_torch.png
```

## How to use OneDiff quantization

OneDiff quantization supports acceleration of all diffusion models; this document uses the SDXL model as its running example. First, download the [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model.

### Online quantization

**Note**: The first quantization run has to analyze the data and determine the parameters required for quantization, such as the minimum and maximum values of the data, which costs extra compute time. Once determined, these parameters are cached as `*.pt` files, and subsequent quantization runs reuse them directly, which speeds up processing. Quantization result information can be found in `cache_dir/quantization_stats.json`, as sketched below.
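
The cached results can be inspected directly. Here is a minimal sketch, assuming the `--cache_dir ./run_sdxl_quant` used in the command below and treating the JSON schema as implementation-defined:

```python
# Illustrative only: pretty-print whatever the quantization pass recorded.
import json
from pathlib import Path

stats_path = Path("./run_sdxl_quant/quantization_stats.json")
if stats_path.exists():
    print(json.dumps(json.loads(stats_path.read_text()), indent=2)[:2000])
else:
    print("No stats yet - run an online quantization pass first.")
```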

#### Online quantization (optimized)

You can run it using the following command.

```bash
python onediff_diffusers_extensions/examples/text_to_image_online_quant.py \
--model_id /PATH/TO/YOUR/MODEL \
--seed 1 \
--backend onediff \
--cache_dir ./run_sdxl_quant \
--height 1024 \
--width 1024 \
--output_file sdxl_quant.png \
--quantize \
--conv_mae_threshold 0.1 \
--linear_mae_threshold 0.2 \
--conv_compute_density_threshold 900 \
--linear_compute_density_threshold 300
```

The parameters of the preceding command are shown in the following table.
| Option                               | Range  | Default | Description                                                            |
| ------------------------------------ | ------ | ------- | ---------------------------------------------------------------------- |
| --conv_mae_threshold                  | [0, 1] | 0.1     | MAE threshold for quantizing convolutional modules.                    |
| --linear_mae_threshold                | [0, 1] | 0.2     | MAE threshold for quantizing linear modules.                           |
| --conv_compute_density_threshold      | [0, ∞) | 900     | Computational density threshold for quantizing convolutional modules.  |
| --linear_compute_density_threshold    | [0, ∞) | 300     | Computational density threshold for quantizing linear modules.         |
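
After tuning these thresholds, one rough way to check the quality impact (an illustrative sketch, not part of OneDiff; assumes scikit-image and Pillow are installed) is to compare the quantized output with the baseline image generated from the same seed:

```python
# Compare the two PNGs produced by the baseline and quantized runs above.
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

baseline = np.asarray(Image.open("sdxl_torch.png").convert("RGB"))
quantized = np.asarray(Image.open("sdxl_quant.png").convert("RGB"))

score = structural_similarity(baseline, quantized, channel_axis=-1, data_range=255)
print(f"SSIM vs. baseline: {score:.4f}")  # closer to 1.0 means less quality loss
```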

### Offline quantization

To quantize a model to int8 offline, run the following script.

```bash
python ./src/onediff/quantization/quant_pipeline_test.py \
--floatting_model_path "stabilityai/stable-diffusion-xl-base-1.0" \
--prompt "a photo of an astronaut riding a horse on mars" \
--height 1024 \
--width 1024 \
--num_inference_steps 30 \
--conv_compute_density_threshold 900 \
--linear_compute_density_threshold 300 \
--conv_ssim_threshold 0.985 \
--linear_ssim_threshold 0.991 \
--save_as_float False \
--cache_dir "./run_sd-v1-5" \
--quantized_model ./quantized_model
```

To load an already-quantized model, set the `--quantized_model` parameter to the path of that model, such as [sd-1.5-onediff-enterprise](https://huggingface.co/siliconflow/stable-diffusion-v1-5-onediff-comfy-enterprise-v1) or [sd-1.5-onediff-deepcache](https://huggingface.co/siliconflow/stable-diffusion-v1-5-onediff-deepcache-int8). [Stable-diffusion-v2-1-onediff-enterprise](https://huggingface.co/siliconflow/stable-diffusion-v2-1-onediff-enterprise) has not been quantized yet, so it must be quantized first.

```bash
python ./src/onediff/quantization/load_quantized_model.py \
--prompt "a photo of an astronaut riding a horse on mars" \
--height 1024 \
--width 1024 \
--num_inference_steps 30 \
--quantized_model ./quantized_model
```

## Quantize a custom model

To quantize a custom model, refer to the following test script (an inline sketch follows the command).

```bash
python tests/test_quantize_custom_model.py
```
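
For reference, here is a minimal inline sketch of the same flow, mirroring the `QuantPipeline` calls used by `quant_pipeline_test.py` in this commit; the model path, prompt, cache directory, and output directory are placeholders:

```python
import torch
from diffusers import AutoPipelineForText2Image

from onediff.quantization.quantize_pipeline import QuantPipeline

# Load your own float16 Diffusers checkpoint (placeholder path).
pipe = QuantPipeline.from_pretrained(
    AutoPipelineForText2Image,
    "/PATH/TO/YOUR/MODEL",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

# Calibrate on one representative prompt, then save the int8 weights.
pipe.quantize(
    prompt="a photo of an astronaut riding a horse on mars",
    height=1024,
    width=1024,
    num_inference_steps=30,
    conv_compute_density_threshold=900,
    linear_compute_density_threshold=300,
    conv_ssim_threshold=0.985,
    linear_ssim_threshold=0.991,
    save_as_float=False,
    plot_calibrate_info=False,
    cache_dir="./run_custom_model",
)
pipe.save_quantized("./quantized_custom_model", safe_serialization=True)
```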

## Community and Support

[Here is the introduction of OneDiff Community.](https://github.com/siliconflow/onediff/wiki#onediff-community)
- [Create an issue](https://github.com/siliconflow/onediff/issues)
- Chat in Discord: [![](https://dcbadge.vercel.app/api/server/RKJTjZMcPQ?style=plastic)](https://discord.gg/RKJTjZMcPQ)
- Email for Enterprise Edition or other business inquiries: [email protected]
31 changes: 31 additions & 0 deletions src/onediff/quantization/load_quantized_model.py
@@ -0,0 +1,31 @@
import argparse

import torch
from diffusers import AutoPipelineForText2Image

from onediff.infer_compiler import oneflow_compile
from onediff.quantization.quantize_pipeline import QuantPipeline


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--prompt", default="a photo of an astronaut riding a horse on mars"
    )
    parser.add_argument("--height", type=int, default=1024)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--num_inference_steps", type=int, default=30)
    parser.add_argument("--quantized_model", type=str, required=True)
    return parser.parse_args()


args = parse_args()

# Load the previously saved int8 weights into a text-to-image pipeline.
pipe = QuantPipeline.from_quantized(
    AutoPipelineForText2Image,
    args.quantized_model,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")

pipe_kwargs = dict(
    prompt=args.prompt,
    height=args.height,
    width=args.width,
    num_inference_steps=args.num_inference_steps,
)

# Compile the UNet with OneFlow for additional speedup, then generate.
pipe.unet = oneflow_compile(pipe.unet)
pipe(**pipe_kwargs).images[0].save("test.png")
46 changes: 46 additions & 0 deletions src/onediff/quantization/quant_pipeline_test.py
@@ -0,0 +1,46 @@
import argparse

import torch
from diffusers import AutoPipelineForText2Image

from onediff.quantization.quantize_pipeline import QuantPipeline


def str2bool(value):
    # argparse's type=bool treats any non-empty string (including "False")
    # as True, so parse the flag explicitly.
    return str(value).lower() in ("1", "true", "yes")


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--floatting_model_path", default="runwayml/stable-diffusion-v1-5"
    )
    parser.add_argument(
        "--prompt", default="a photo of an astronaut riding a horse on mars"
    )
    parser.add_argument("--height", type=int, default=1024)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--num_inference_steps", type=int, default=30)
    parser.add_argument("--conv_compute_density_threshold", type=int, default=900)
    parser.add_argument("--linear_compute_density_threshold", type=int, default=300)
    parser.add_argument("--conv_ssim_threshold", type=float, default=0.985)
    parser.add_argument("--linear_ssim_threshold", type=float, default=0.991)
    parser.add_argument("--save_as_float", type=str2bool, default=False)
    parser.add_argument("--cache_dir", default="./run_sd-v1-5")
    parser.add_argument("--quantized_model", default="./quantized_model")
    return parser.parse_args()


args = parse_args()

# Load the float16 pipeline that will be calibrated and quantized.
pipe = QuantPipeline.from_pretrained(
    AutoPipelineForText2Image,
    args.floatting_model_path,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

pipe_kwargs = dict(
    prompt=args.prompt,
    height=args.height,
    width=args.width,
    num_inference_steps=args.num_inference_steps,
)

# Run calibration and decide, per module, whether to quantize based on the
# compute-density and SSIM thresholds.
pipe.quantize(
    **pipe_kwargs,
    conv_compute_density_threshold=args.conv_compute_density_threshold,
    linear_compute_density_threshold=args.linear_compute_density_threshold,
    conv_ssim_threshold=args.conv_ssim_threshold,
    linear_ssim_threshold=args.linear_ssim_threshold,
    save_as_float=args.save_as_float,
    plot_calibrate_info=False,
    cache_dir=args.cache_dir,
)

pipe.save_quantized(args.quantized_model, safe_serialization=True)
