add diffusers nexfort example (#998)
- [x] sd1.5
- [x] sdxl
- [x] sd2

---------

Co-authored-by: Li Junliang <[email protected]>
1 parent f498be2, commit 5aeb01f
Showing 7 changed files with 331 additions and 0 deletions.
@@ -0,0 +1,108 @@
# Run SD1.5 with nexfort backend (Beta Release)

1. [Environment Setup](#environment-setup)
   - [Set Up OneDiff](#set-up-onediff)
   - [Set Up NexFort Backend](#set-up-nexfort-backend)
   - [Set Up Diffusers Library](#set-up-diffusers)
   - [Set Up SD1.5](#set-up-sd15)
2. [Execution Instructions](#run)
   - [Run Without Compilation (Baseline)](#run-without-compilation-baseline)
   - [Run With Compilation](#run-with-compilation)
3. [Performance Comparison](#performance-comparison)
4. [Dynamic Shape for SD1.5](#dynamic-shape-for-sd15)
5. [Quality](#quality)

## Environment setup
### Set up onediff
Follow the OneDiff installation guide: https://github.com/siliconflow/onediff?tab=readme-ov-file#installation

### Set up nexfort backend
Follow the nexfort backend instructions: https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort

### Set up diffusers

```shell
pip3 install --upgrade "diffusers[torch]"
```
### Set up SD1.5
Model version for diffusers: https://huggingface.co/runwayml/stable-diffusion-v1-5

HF pipeline: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/stable_diffusion/overview.md

## Run

### Run without compilation (Baseline)
```shell
python3 benchmarks/text_to_image.py \
    --model runwayml/stable-diffusion-v1-5 \
    --height 512 --width 512 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-v1-5.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler none \
    --seed 1 \
    --print-output
```

### Run with compilation

```shell
python3 benchmarks/text_to_image.py \
    --model runwayml/stable-diffusion-v1-5 \
    --height 512 --width 512 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-v1-5-compile.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler nexfort \
    --compiler-config '{"mode": "cudagraphs:benchmark:max-autotune:low-precision:cache-all", "memory_format": "channels_last", "options": {"inductor.optimize_linear_epilogue": false, "overrides.conv_benchmark": true, "overrides.matmul_allow_tf32": true}}' \
    --seed 1 \
    --print-output
```
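
The `--compiler-config` value is a JSON object passed as a single shell argument, so a stray quote breaks it easily. A minimal Python sketch, assuming only the keys and values shown in the command above, builds the same string with `json.dumps` and quotes it for the shell with `shlex.quote`. The variable names are illustrative and not part of OneDiff.

```python
import json
import shlex

# Settings copied from the command above; this sketch does not validate them against nexfort.
compiler_config = {
    "mode": "cudagraphs:benchmark:max-autotune:low-precision:cache-all",
    "memory_format": "channels_last",
    "options": {
        "inductor.optimize_linear_epilogue": False,
        "overrides.conv_benchmark": True,
        "overrides.matmul_allow_tf32": True,
    },
}

# json.dumps emits lowercase true/false, as JSON requires.
config_json = json.dumps(compiler_config)

# shlex.quote wraps the JSON in single quotes so the shell passes it as one argument.
print("--compiler-config " + shlex.quote(config_json))
```

The printed line can be pasted into the command in place of the hand-written `--compiler-config` argument.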

## Performance comparison

Tested on NVIDIA GeForce RTX 3090 and RTX 4090 with an image size of 512*512 and 20 steps:

| Metric                               | RTX3090, 512*512      | RTX4090, 512*512      |
| ------------------------------------ | --------------------- | --------------------- |
| Data update date (yyyy-mm-dd)        | 2024-07-10            | 2024-07-10            |
| PyTorch iteration speed              | 21.20 it/s            | 34.46 it/s            |
| OneDiff iteration speed              | 48.00 it/s (+126.4%)  | 81.81 it/s (+137.4%)  |
| PyTorch E2E time                     | 1.07 s                | 0.67 s                |
| OneDiff E2E time                     | 0.48 s (-55.1%)       | 0.28 s (-58.2%)       |
| PyTorch Max Mem Used                 | 2.627 GiB             | 2.616 GiB             |
| OneDiff Max Mem Used                 | 2.587 GiB             | 2.709 GiB             |
| PyTorch Warmup with Run time         |                       |                       |
| OneDiff Warmup with Compilation time | 233.61 s <sup>1</sup> | 177.321 s <sup>2</sup> |
| OneDiff Warmup with Cache time       | 41.120 s              | 30.019 s              |

<sup>1</sup> OneDiff Warmup with Compilation time was measured on an Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz. It is for reference only and varies considerably across CPUs.

<sup>2</sup> Measured on an AMD EPYC 7543 32-Core Processor.
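
The percentages in the table are relative changes against the PyTorch baseline. A short Python sketch of that arithmetic, using the figures from the table (the helper name `relative_change` is illustrative):

```python
def relative_change(baseline: float, optimized: float) -> float:
    """Return the change from baseline to optimized, in percent."""
    return (optimized - baseline) / baseline * 100.0


# Iteration speed in it/s: higher is better, so the change is positive.
print(f"{relative_change(21.20, 48.00):+.1f}%")  # RTX3090: +126.4%
print(f"{relative_change(34.46, 81.81):+.1f}%")  # RTX4090: +137.4%

# End-to-end time in seconds: lower is better, so the change is negative.
print(f"{relative_change(1.07, 0.48):+.1f}%")  # RTX3090: -55.1%
print(f"{relative_change(0.67, 0.28):+.1f}%")  # RTX4090: -58.2%
```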

## Dynamic shape for SD1.5

<!-- TODO -->

Run:

```shell
python3 benchmarks/text_to_image.py \
    --model runwayml/stable-diffusion-v1-5 \
    --height 512 --width 512 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-v1-5-compile.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler nexfort \
    --compiler-config '{"mode": "cudagraphs:max-autotune:low-precision:cache-all", "memory_format": "channels_last", "options": {"inductor.optimize_linear_epilogue": false, "overrides.conv_benchmark": true, "overrides.matmul_allow_tf32": true}, "dynamic": true}' \
    --run_multiple_resolutions 1
```
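
Compared with the fixed-shape command, the dynamic-shape configuration drops the `benchmark` token from `mode` and adds `"dynamic": true`. A minimal Python sketch of that transformation follows. It assumes the two commands in this document are the only reference and that `mode` can be treated as a colon-separated token list. The function name `to_dynamic_config` is hypothetical and not a OneDiff API.

```python
import copy
import json


def to_dynamic_config(static_config: dict) -> dict:
    """Derive the dynamic-shape config shown above from the fixed-shape one."""
    config = copy.deepcopy(static_config)
    # Assumption: "mode" is a colon-separated token list; drop "benchmark", keep the rest in order.
    tokens = [token for token in config["mode"].split(":") if token != "benchmark"]
    config["mode"] = ":".join(tokens)
    config["dynamic"] = True
    return config


# Fixed-shape settings copied from the "Run with compilation" command.
static_config = {
    "mode": "cudagraphs:benchmark:max-autotune:low-precision:cache-all",
    "memory_format": "channels_last",
    "options": {
        "inductor.optimize_linear_epilogue": False,
        "overrides.conv_benchmark": True,
        "overrides.matmul_allow_tf32": True,
    },
}

dynamic_config = to_dynamic_config(static_config)
print(json.dumps(dynamic_config))
```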

## Quality
When using nexfort as the backend for onediff compilation acceleration, the generated images are lossless.

<p align="center">
<img src="../../../imgs/nexfort_sd1-5_demo.png">
</p>
@@ -0,0 +1,105 @@
# Run SD2 with nexfort backend (Beta Release)

1. [Environment Setup](#environment-setup)
   - [Set Up OneDiff](#set-up-onediff)
   - [Set Up NexFort Backend](#set-up-nexfort-backend)
   - [Set Up Diffusers Library](#set-up-diffusers)
   - [Set Up SD2](#set-up-sd2)
2. [Execution Instructions](#run)
   - [Run Without Compilation (Baseline)](#run-without-compilation-baseline)
   - [Run With Compilation](#run-with-compilation)
3. [Performance Comparison](#performance-comparison)
4. [Dynamic Shape for SD2](#dynamic-shape-for-sd2)
5. [Quality](#quality)

## Environment setup
### Set up onediff
Follow the OneDiff installation guide: https://github.com/siliconflow/onediff?tab=readme-ov-file#installation

### Set up nexfort backend
Follow the nexfort backend instructions: https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort

### Set up diffusers

```shell
pip3 install --upgrade "diffusers[torch]"
```
### Set up SD2
Model version for diffusers: https://huggingface.co/stabilityai/stable-diffusion-2 (the commands below pass `stabilityai/stable-diffusion-2-1` as `--model`)

HF pipeline: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_2.md

## Run

### Run without compilation (Baseline)
```shell
python3 benchmarks/text_to_image.py \
    --model stabilityai/stable-diffusion-2-1 \
    --height 768 --width 768 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-2-1.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler none \
    --print-output
```

### Run with compilation

```shell
python3 benchmarks/text_to_image.py \
    --model stabilityai/stable-diffusion-2-1 \
    --height 768 --width 768 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-2-1-compile.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler nexfort \
    --compiler-config '{"mode": "cudagraphs:benchmark:max-autotune:low-precision:cache-all", "memory_format": "channels_last", "options": {"triton.fuse_attention_allow_fp16_reduction": false, "inductor.optimize_linear_epilogue": false, "overrides.conv_benchmark": true, "overrides.matmul_allow_tf32": true}}' \
    --print-output
```

## Performance comparison

Tested on NVIDIA GeForce RTX 3090 and RTX 4090 with image sizes of 768\*768 and 512\*512 and 20 steps:

| Metric                               | RTX3090, 768*768      | RTX3090, 512*512      | RTX4090, 768*768      | RTX4090, 512*512      |
| ------------------------------------ | --------------------- | --------------------- | --------------------- | --------------------- |
| Data update date (yyyy-mm-dd)        | 2024-07-10            | 2024-07-10            | 2024-07-10            | 2024-07-10            |
| PyTorch iteration speed              | 10.45 it/s            | 22.84 it/s            | 12.34 it/s            | 39.06 it/s            |
| OneDiff iteration speed              | 15.93 it/s (+52.4%)   | 44.84 it/s (+96.3%)   | 31.63 it/s (+156.3%)  | 83.63 it/s (+114.1%)  |
| PyTorch E2E time                     | 2.10 s                | 0.97 s                | 1.78 s                | 0.58 s                |
| OneDiff E2E time                     | 1.35 s (-35.7%)       | 0.49 s (-49.5%)       | 0.68 s (-61.8%)       | 0.26 s (-55.2%)       |
| PyTorch Max Mem Used                 | 3.767 GiB             | 3.025 GiB             | 3.767 GiB             | 3.024 GiB             |
| OneDiff Max Mem Used                 | 3.558 GiB             | 3.018 GiB             | 3.567 GiB             | 3.016 GiB             |
| PyTorch Warmup with Run time         |                       |                       |                       |                       |
| OneDiff Warmup with Compilation time | 301.54 s <sup>1</sup> | 222.18 s <sup>1</sup> | 195.34 s <sup>2</sup> | 165.29 s <sup>1</sup> |
| OneDiff Warmup with Cache time       | 113.04 s              | 44.94 s               | 32.41 s               | 30.10 s               |

<sup>1</sup> OneDiff Warmup with Compilation time was measured on an Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz. It is for reference only and varies considerably across CPUs.

<sup>2</sup> Measured on an AMD EPYC 7543 32-Core Processor.
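
Iteration speed and E2E time can be cross-checked against each other. The sketch below assumes, without confirmation from the benchmark script, that iteration speed covers only the 20 denoising steps while E2E time covers the whole pipeline call, so the difference approximates the time spent outside the denoising loop. The function name is illustrative.

```python
def time_outside_loop(e2e_seconds: float, steps: int, its_per_second: float) -> float:
    """Estimate seconds spent outside the denoising loop (assumed: everything but the steps)."""
    loop_seconds = steps / its_per_second
    return e2e_seconds - loop_seconds


# RTX3090, 768*768 figures from the table above.
pytorch_rest = time_outside_loop(2.10, 20, 10.45)
onediff_rest = time_outside_loop(1.35, 20, 15.93)
print(f"PyTorch: {pytorch_rest:.2f} s, OneDiff: {onediff_rest:.2f} s")
```

If the assumption holds, a small positive remainder in both rows suggests the two metrics are consistent with each other.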

## Dynamic shape for SD2

Run:

```shell
python3 benchmarks/text_to_image.py \
    --model stabilityai/stable-diffusion-2-1 \
    --height 768 --width 768 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-2-1-compile.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler nexfort \
    --compiler-config '{"mode": "cudagraphs:max-autotune:low-precision:cache-all", "memory_format": "channels_last", "options": {"inductor.optimize_linear_epilogue": false, "overrides.conv_benchmark": true, "overrides.matmul_allow_tf32": true}, "dynamic": true}' \
    --run_multiple_resolutions 1
```

## Quality
When using nexfort as the backend for onediff compilation acceleration, the generated images are lossless.

<p align="center">
<img src="../../../imgs/nexfort_sd2_demo.png">
</p>
@@ -0,0 +1,109 @@
# Run SDXL with nexfort backend (Beta Release)

1. [Environment Setup](#environment-setup)
   - [Set Up OneDiff](#set-up-onediff)
   - [Set Up NexFort Backend](#set-up-nexfort-backend)
   - [Set Up Diffusers Library](#set-up-diffusers)
   - [Set Up SDXL](#set-up-sdxl)
2. [Execution Instructions](#run)
   - [Run Without Compilation (Baseline)](#run-without-compilation-baseline)
   - [Run With Compilation](#run-with-compilation)
3. [Performance Comparison](#performance-comparison)
4. [Dynamic Shape for SDXL](#dynamic-shape-for-sdxl)
5. [Quality](#quality)

## Environment setup
### Set up onediff
Follow the OneDiff installation guide: https://github.com/siliconflow/onediff?tab=readme-ov-file#installation

### Set up nexfort backend
Follow the nexfort backend instructions: https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort

### Set up diffusers

```shell
pip3 install --upgrade "diffusers[torch]"
```
### Set up SDXL
Model version for diffusers: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

HF pipeline: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.md

## Run

### Run without compilation (Baseline)
```shell
python3 benchmarks/text_to_image.py \
    --model stabilityai/stable-diffusion-xl-base-1.0 \
    --height 1024 --width 1024 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-xl.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler none \
    --variant fp16 \
    --seed 1 \
    --print-output
```

### Run with compilation

```shell
python3 benchmarks/text_to_image.py \
    --model stabilityai/stable-diffusion-xl-base-1.0 \
    --height 1024 --width 1024 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-xl-compile.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler nexfort \
    --compiler-config '{"mode": "benchmark:cudagraphs:max-autotune:low-precision:cache-all", "memory_format": "channels_last", "options": {"inductor.optimize_linear_epilogue": false, "overrides.conv_benchmark": true, "overrides.matmul_allow_tf32": true}}' \
    --variant fp16 \
    --seed 1 \
    --print-output
```

## Performance comparison

Tested on NVIDIA GeForce RTX 3090 and RTX 4090 with an image size of 1024*1024 and 20 steps:

| Metric                               | RTX3090, 1024*1024    | RTX4090, 1024*1024    |
| ------------------------------------ | --------------------- | --------------------- |
| Data update date (yyyy-mm-dd)        | 2024-07-10            | 2024-07-10            |
| PyTorch iteration speed              | 4.08 it/s             | 6.93 it/s             |
| OneDiff iteration speed              | 7.21 it/s (+76.7%)    | 13.92 it/s (+100.9%)  |
| PyTorch E2E time                     | 5.60 s                | 3.23 s                |
| OneDiff E2E time                     | 3.41 s (-39.1%)       | 1.67 s (-48.3%)       |
| PyTorch Max Mem Used                 | 10.467 GiB            | 10.467 GiB            |
| OneDiff Max Mem Used                 | 12.004 GiB            | 12.021 GiB            |
| PyTorch Warmup with Run time         |                       |                       |
| OneDiff Warmup with Compilation time | 474.36 s <sup>1</sup> | 236.54 s <sup>2</sup> |
| OneDiff Warmup with Cache time       | 306.84 s              | 104.57 s              |

<sup>1</sup> OneDiff Warmup with Compilation time was measured on an Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz. It is for reference only and varies considerably across CPUs.

<sup>2</sup> Measured on an AMD EPYC 7543 32-Core Processor.
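
The table gives percentages for iteration speed and E2E time but not for peak memory. A small Python sketch that derives the memory change from the two "Max Mem Used" rows (the helper name is illustrative):

```python
def memory_overhead_percent(pytorch_gib: float, onediff_gib: float) -> float:
    """Return the peak-memory change of OneDiff relative to PyTorch, in percent."""
    return (onediff_gib - pytorch_gib) / pytorch_gib * 100.0


# (GPU, PyTorch Max Mem Used, OneDiff Max Mem Used) in GiB, from the table above.
rows = [("RTX3090", 10.467, 12.004), ("RTX4090", 10.467, 12.021)]
for gpu, pytorch_gib, onediff_gib in rows:
    print(f"{gpu}: {memory_overhead_percent(pytorch_gib, onediff_gib):+.1f}%")
```

On both GPUs the compiled SDXL pipeline peaks roughly 1.5 GiB above the PyTorch baseline, which is worth checking against available VRAM before enabling compilation.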

## Dynamic shape for SDXL

Run:

```shell
python3 benchmarks/text_to_image.py \
    --model stabilityai/stable-diffusion-xl-base-1.0 \
    --height 1024 --width 1024 \
    --scheduler none \
    --steps 20 \
    --output-image ./stable-diffusion-xl-compile.png \
    --prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \
    --compiler nexfort \
    --compiler-config '{"mode": "cudagraphs:max-autotune:low-precision:cache-all", "memory_format": "channels_last", "options": {"inductor.optimize_linear_epilogue": false, "overrides.conv_benchmark": true, "overrides.matmul_allow_tf32": true}, "dynamic": true}' \
    --run_multiple_resolutions 1
```

## Quality
When using nexfort as the backend for onediff compilation acceleration, the generated images are lossless.

<p align="center">
<img src="../../../imgs/nexfort_sdxl_demo.png">
</p>