Replies: 2 comments
-
Another question: how do I save the resulting plot?
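For what it's worth, triton's benchmark runner handles saving directly; a minimal sketch, assuming the `benchmark` object defined in the reply below:

```python
# Passing `save_path` makes run() write the plot (named after `plot_name`)
# and a CSV of the measurements into the given directory.
benchmark.run(show_plots=True, print_data=True, save_path=".")
```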
-
Okay, so with the following code, I think I now have a better handle on it:

```python
from diffusers import DiffusionPipeline
import torch
import triton

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipeline = pipeline.to("cuda")
pipeline.set_progress_bar_config(disable=True)


def run_inference(num_inference_steps=25):
    _ = pipeline("a picture of a cat", num_inference_steps=num_inference_steps)


# Note: these assignments swap the compiled modules into the shared `pipeline`,
# so from here on `run_inference` also exercises the compiled UNet/VAE.
pipeline.unet = torch.compile(pipeline.unet, mode="max-autotune", fullgraph=True)
pipeline.vae.decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fullgraph=True)


def run_inference_with_compile(num_inference_steps=25):
    _ = pipeline("a picture of a cat", num_inference_steps=num_inference_steps)


@triton.testing.perf_report(
    triton.testing.Benchmark(
        x_names=['Steps'],  # argument names to use as an x-axis for the plot
        x_vals=list(range(10, 60, 10)),  # different possible values for `x_names`
        line_arg='do_compile',  # argument name whose value corresponds to a different line in the plot
        line_vals=[
            'no-compile',
            'compiled',
        ],  # possible values for `line_arg`
        line_names=[
            "Not Compiled",
            "Compiled",
        ],  # label name for the lines
        styles=[('blue', '-'), ('green', '-')],  # line styles
        ylabel="Total Time (s)",  # label name for the y-axis
        plot_name="torch.compile_performance",  # name for the plot; also used as the file name when saving it
        args={},
    )
)
def benchmark(Steps, do_compile):
    if do_compile == 'no-compile':
        ms = triton.testing.do_bench(lambda: run_inference(num_inference_steps=Steps))
    elif do_compile == 'compiled':
        ms = triton.testing.do_bench(lambda: run_inference_with_compile(num_inference_steps=Steps))
    return ms / 1e3  # do_bench reports milliseconds; convert to seconds
benchmark.run(show_plots=True, print_data=True, save_path=".")
```

With `save_path="."`, the plot (and a CSV of the measured data) also gets written to the current directory.
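One caveat about the script above: the `torch.compile` assignments replace the modules on the shared `pipeline`, so by the time `benchmark` runs, the 'no-compile' line is timing the compiled UNet/VAE too. A minimal sketch of one way to keep the baseline separate, with illustrative variable names (`eager_unet`, etc.) that are not in the original script:

```python
import torch

# Keep handles to the eager modules before compiling, so the
# "no-compile" path really measures the uncompiled pipeline.
eager_unet = pipeline.unet
eager_vae_decode = pipeline.vae.decode

compiled_unet = torch.compile(pipeline.unet, mode="max-autotune", fullgraph=True)
compiled_vae_decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fullgraph=True)


def run_inference(num_inference_steps=25):
    # Swap the eager modules back in for the baseline measurement.
    pipeline.unet = eager_unet
    pipeline.vae.decode = eager_vae_decode
    _ = pipeline("a picture of a cat", num_inference_steps=num_inference_steps)


def run_inference_with_compile(num_inference_steps=25):
    pipeline.unet = compiled_unet
    pipeline.vae.decode = compiled_vae_decode
    _ = pipeline("a picture of a cat", num_inference_steps=num_inference_steps)
```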
-
I love the benchmarking process as shown here: https://triton-lang.org/main/getting-started/tutorials/02-fused-softmax.html.
I was trying to put together a script to benchmark a diffusion pipeline with the utilities shown in that tutorial. My script is like so:
This is the result I am getting:
This seems odd, as `torch.compile()` should improve the performance (which I have tested in isolation).
For the `diffusers` dependency, run `pip install diffusers accelerate transformers`.
Is this the right way to do it?
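For the isolation check mentioned above, here is a sketch of timing just the UNet with `triton.testing.do_bench`; the checkpoint layout and input shapes (4x64x64 latents, 77x768 text embeddings) are assumptions matching SD v1.5's defaults:

```python
import torch
import triton.testing
from diffusers import UNet2DConditionModel

# Load only the UNet from the checkpoint used in the pipeline above.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
).to("cuda")

# Dummy inputs shaped for SD v1.5: 64x64 latents and CLIP text embeddings.
latents = torch.randn(1, 4, 64, 64, dtype=torch.float16, device="cuda")
timestep = torch.tensor(10, device="cuda")
text_emb = torch.randn(1, 77, 768, dtype=torch.float16, device="cuda")

ms_eager = triton.testing.do_bench(
    lambda: unet(latents, timestep, encoder_hidden_states=text_emb)
)

compiled = torch.compile(unet, mode="max-autotune", fullgraph=True)
compiled(latents, timestep, encoder_hidden_states=text_emb)  # warm-up: triggers compilation
ms_compiled = triton.testing.do_bench(
    lambda: compiled(latents, timestep, encoder_hidden_states=text_emb)
)

print(f"eager: {ms_eager:.2f} ms, compiled: {ms_compiled:.2f} ms")
```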