
Question on compilation time of boolean pipeline #1273

Open
winnylyc opened this issue Jan 17, 2025 · 6 comments

Comments

@winnylyc

winnylyc commented Jan 17, 2025

Hello, sorry to bother you.

I am testing a simple matrix multiplication case written in standard MLIR, following the setup in #1175.

With int32 inputs, the compilation time is 155m4.853s.

Here is the test case:

func.func @main(%arg0: tensor<3x4xi32>, %arg1: tensor<4x5xi32>) -> tensor<3x5xi32> {
    %0 = tensor.empty() : tensor<3x5xi32>
    %cst = arith.constant 0 : i32
    %1 = linalg.fill ins(%cst : i32) outs(%0 : tensor<3x5xi32>) -> tensor<3x5xi32>
    %2 = linalg.matmul ins(%arg0, %arg1 : tensor<3x4xi32>, tensor<4x5xi32>) outs(%1 : tensor<3x5xi32>) -> tensor<3x5xi32>
    return %2 : tensor<3x5xi32>
}

The command for compilation is:
time bazel run //tools:heir-opt -- --tosa-to-boolean-tfhe $PWD/new_test_i32.mlir &> output_i32.mlir

I also tested on different data types:

  • int8: 1m17.213s
  • int16: 14m40.698s
  • int32: 155m4.853s

My Questions:

  1. Is this compilation speed expected for int32 inputs?
  2. If so, are there any trade-offs available, such as disabling certain optimization passes to reduce compilation time significantly, even if that results in a longer runtime?

Thank you for your help!

@asraa
Collaborator

asraa commented Jan 17, 2025

Hey! The int32 time seems fairly long, I'll repro on my device too and let you know how it goes.

If so, are there any trade-offs available, such as disabling certain optimization passes to reduce compilation time significantly, even if that results in a longer runtime?

Yes, if you pass --tosa-to-boolean-tfhe=abc-fast=true, then abc (the logic optimizer that runs within the yosys process) will run in -fast mode, which will likely speed it up a lot. Running explicitly with bazel run -c opt may also help, since by default we build in debug mode, which can enable some slower debug-only code paths.

I'm also wondering whether reordering some of the canonicalizations after linalg is lowered, or fusing loops, would help rely more on MLIR's simplifications (CSE, etc.) rather than on yosys.
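For reference, combining the two suggestions would look like this (a sketch; file names follow the original report, and the abc-fast flag syntax is as quoted above, not re-verified against a checkout):

```shell
# Build in optimized mode (-c opt) so debug-only code paths are disabled,
# and pass abc-fast=true so abc runs with -fast inside the yosys step.
time bazel run -c opt //tools:heir-opt -- \
  --tosa-to-boolean-tfhe=abc-fast=true \
  "$PWD/new_test_i32.mlir" &> output_i32.mlir
```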

@winnylyc
Author

Thank you for your advice! I will test them out on my device!

@winnylyc
Author

Running explicitly with bazel run -c opt may also help, since by default we build in debug mode, which can enable some slower debug-only code paths.

This advice works! Explicitly running with bazel run -c opt reduces the time cost dramatically. Sorry for making such a naive mistake. 😂
time bazel run -c opt //tools:heir-opt -- --tosa-to-boolean-tfhe $PWD/new_test_i32.mlir &> output_i32.mlir

The current time cost is: 13m28.084s

I will further test the other suggestions.

@winnylyc
Author

if you pass --tosa-to-boolean-tfhe=abc-fast=true, then abc (the logic optimizer that runs within the yosys process) will run in -fast mode, which will likely speed it up a lot.

I further tested the abc-fast option, but it only seems to help at short integer widths.
I ran each test 5 times to get the average time cost.

  • int8:
    original = 7.620 seconds
    abc-fast = 6.391 seconds

  • int16:
    original = 67.062 seconds
    abc-fast = 65.238 seconds

  • int32:
    original = 838.661 seconds
    abc-fast = 854.410 seconds

For int32, abc-fast even seems to slow down compilation a little.
Do these results meet expectations?
Additionally, I would like to confirm whether yosys is the most time-consuming component.

@winnylyc
Author

Do these results meet expectations?

I further measured the time spent in yosys with and without abc-fast for int32:
original = 83.812 seconds
abc-fast = 61.697 seconds
This shows that abc-fast does speed yosys up a lot; however, the rest of the lowering slows down when abc-fast is enabled.

I would like to confirm whether yosys is the most time-consuming component.

It seems that yosys dominates the compilation process for int8 but plays a much smaller role for int32.

  • int8:
    yosys = 4.859 seconds
    all = 7.620 seconds

  • int32:
    yosys = 83.812 seconds
    all = 838.661 seconds

I am still wondering what in the boolean pipeline makes it so slow, and I will explore it further. If you have any suggestions, I'd really appreciate you sharing them with me! Thanks so much for your help!
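One way to see where the time goes: mlir-opt-based tools built on MlirOptMain generally accept MLIR's pass-timing flags, which print per-pass wall time after the run. Assuming heir-opt inherits these standard flags (a sketch, not verified against a checkout):

```shell
# --mlir-timing prints a per-pass timing report to stderr after compilation;
# --mlir-timing-display=tree nests timings by the pass-pipeline structure
# (use =list for a flat ranking). File names follow the earlier commands.
bazel run -c opt //tools:heir-opt -- \
  --tosa-to-boolean-tfhe \
  --mlir-timing --mlir-timing-display=tree \
  "$PWD/new_test_i32.mlir" > output_i32.mlir
```

This should show directly whether the yosys step or the surrounding MLIR passes dominate for int32.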

@winnylyc
Author

In addition, I think int32 may not be necessary in most cases, so focusing on the int8 pipeline might be the better choice. Do you agree?
