fix: prelu perf gap on Unet #3717

Open: wants to merge 3 commits into main
py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py (2 changes: 1 addition & 1 deletion)

@@ -1093,7 +1093,7 @@ def aten_ops_clone_copy_dtype(
         name,
         args[0],
         kwargs.get("dtype", args[0].dtype),
-        force_layer=True,
+        force_layer=False,
     )


py/torch_tensorrt/dynamo/conversion/impl/prelu.py (9 changes: 8 additions & 1 deletion)

@@ -2,6 +2,7 @@
 
 from torch.fx.node import Target
 from torch_tensorrt.dynamo._SourceIR import SourceIR
+from torch_tensorrt.dynamo.conversion import impl
 from torch_tensorrt.dynamo.conversion._ConversionContext import ConversionContext
 from torch_tensorrt.dynamo.conversion.converter_utils import set_layer_name
 from torch_tensorrt.dynamo.types import TRTTensor
@@ -15,6 +16,12 @@ def prelu(
     input: TRTTensor,
     weight: TRTTensor,
 ) -> TRTTensor:
-    layer = ctx.net.add_parametric_relu(input, weight)
+    # TRT requires that the slopes tensor must be unidirectional broadcastable to the input tensor:
+    # the rank of the two tensors must be the same, and all dimensions of the slopes tensor must
+    # either equal the input tensor or be 1. The output tensor has the same shape as the input tensor.
+    input, weight = impl.elementwise.broadcast(
+        ctx, input, weight, f"{name}_broadcast_input", f"{name}_broadcast_weight"
+    )
Contributor commented:

This is probably redundant: PyTorch already reshapes the weight before conversion, so the converter receives a weight of the required shape. For example:

import torch
import torch_tensorrt


class MyModule(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.prelu(x, weight)


with torch.inference_mode():
    model = MyModule().eval().cuda().half()

    inputs = (
        torch.randn(1, 3, 224, 224, dtype=torch.half, device="cuda"),
        torch.randn(3, dtype=torch.half, device="cuda"),
    )

    exported_program = torch.export.export(model, inputs)

    with torch_tensorrt.dynamo.Debugger():
        trt_model = torch_tensorrt.dynamo.compile(
            exported_program, inputs, enabled_precisions={torch.half}, min_block_size=1
        )

    torch.testing.assert_close(trt_model(*inputs), model(*inputs), rtol=5e-3, atol=5e-3)
10:39:52 - DEBUG - Invoking DynamoPassManager and applying lowering passes: [<function remove_detach at 0x000001A0385E4EA0>]
10:39:52 - DEBUG - Removed 0 detach nodes:
graph():
    %x : [num_users=1] = placeholder[target=x]
    %weight : [num_users=1] = placeholder[target=weight]
    %prelu : [num_users=1] = call_function[target=torch.ops.aten.prelu.default](args = (%x, %weight), kwargs = {})
    return (prelu,)
10:39:52 - DEBUG - Input graph: graph():
    %x : [num_users=1] = placeholder[target=x]
    %weight : [num_users=1] = placeholder[target=weight]
    %_reshape_copy : [num_users=1] = call_function[target=torch.ops.aten._reshape_copy.default](args = (%weight, [1, 3, 1, 1]), kwargs = {})
    %_prelu_kernel : [num_users=1] = call_function[target=torch.ops.aten._prelu_kernel.default](args = (%x, %_reshape_copy), kwargs = {})
    return (_prelu_kernel,)
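The Input graph in the log shows aten.prelu lowered to aten._reshape_copy(weight, [1, 3, 1, 1]) followed by aten._prelu_kernel, i.e. the 1-D weight is given the input's rank with its length placed on dimension 1. A minimal pure-Python sketch of that shape rule, assuming an NCHW-style input whose channel axis is dimension 1 and a weight with either 1 or C elements. prelu_weight_shape is a hypothetical helper, not a PyTorch or Torch-TensorRT API, and ATen's own code may handle edge cases (0-d inputs or weights) differently:

```python
from typing import List, Sequence


def prelu_weight_shape(input_shape: Sequence[int], weight_numel: int) -> List[int]:
    """Hypothetical helper: rank-matched shape for a per-channel PReLU weight.

    Assumes the channel axis is dimension 1 (NCHW-style), matching the
    [1, 3, 1, 1] reshape that appears in the lowered graph.
    """
    rank = len(input_shape)
    if rank == 0:
        raise ValueError("this sketch assumes an input of rank >= 1")
    # A rank-1 input has no channel axis, so only a single slope is valid.
    channels = input_shape[1] if rank >= 2 else 1
    if weight_numel not in (1, channels):
        raise ValueError(
            f"weight has {weight_numel} elements; expected 1 or {channels}"
        )
    if rank == 1:
        return [weight_numel]
    shape = [1] * rank
    shape[1] = weight_numel  # put the channel count on dimension 1
    return shape


print(prelu_weight_shape([1, 3, 224, 224], 3))  # [1, 3, 1, 1]
print(prelu_weight_shape([1, 3, 224, 224], 1))  # [1, 1, 1, 1]
```

A weight of this shape already has the input's rank and only 1s or matching sizes in each dimension, which is why an extra broadcast adds nothing for graphs that go through this decomposition.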

Author (collaborator) replied:

What about the case where _prelu_kernel is called directly?

Contributor replied:

Then you would still get the wrong weight shape: broadcast prepends 1s, so with the example above you get (1, 1, 1, 3) rather than (1, 3, 1, 1).

+    layer = ctx.net.add_parametric_relu(input, slopes=weight)
     set_layer_name(layer, target, name, source_ir)
     return layer.get_output(0)
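The slopes rule quoted in the comment added to prelu.py (same rank as the input, and each slopes dimension equal to the matching input dimension or 1) and the reviewer's point about prepended 1s can both be checked with plain shape arithmetic. A sketch, assuming, as the reviewer describes, that the broadcast step right-aligns shapes by prepending 1s. left_pad_to_rank and is_unidirectionally_broadcastable are hypothetical helpers, not Torch-TensorRT or TensorRT APIs:

```python
from typing import List, Sequence


def left_pad_to_rank(shape: Sequence[int], rank: int) -> List[int]:
    """Hypothetical helper: prepend 1s until `shape` has `rank` dimensions.

    Models trailing (right-aligned) broadcasting, which is what the reviewer
    says the broadcast step does to the weight.
    """
    if len(shape) > rank:
        raise ValueError("shape already has more dimensions than the target rank")
    return [1] * (rank - len(shape)) + list(shape)


def is_unidirectionally_broadcastable(slopes: Sequence[int], inp: Sequence[int]) -> bool:
    """Hypothetical check of the rule quoted in the prelu.py comment:
    same rank, and every slopes dimension equals the input dimension or is 1."""
    if len(slopes) != len(inp):
        return False
    return all(s == d or s == 1 for s, d in zip(slopes, inp))


input_shape = [1, 3, 224, 224]
weight_shape = [3]  # per-channel weight, as in the repro

padded = left_pad_to_rank(weight_shape, len(input_shape))
print(padded)  # [1, 1, 1, 3]
# The trailing 3 lines up with width 224, so the rule is violated.
print(is_unidirectionally_broadcastable(padded, input_shape))  # False

# The channel-aligned shape produced by aten._reshape_copy in the log passes.
print(is_unidirectionally_broadcastable([1, 3, 1, 1], input_shape))  # True

# Subtle case: if the last input dimension equals the channel count, the
# left-padded shape passes the check but applies slopes along the wrong axis.
print(is_unidirectionally_broadcastable(padded, [1, 3, 3, 3]))  # True
```

The last check is the risky case: when the input's last dimension happens to equal the channel count, a left-padded weight satisfies the rank and size rule, yet the slopes vary along the last axis instead of the channel axis, so the result is silently wrong rather than rejected.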