
Question on compilation time of boolean pipeline #1273

Open
winnylyc opened this issue Jan 17, 2025 · 6 comments

Comments

@winnylyc

winnylyc commented Jan 17, 2025

Hello, sorry to bother you.

I am testing a simple matrix multiplication case written in standard MLIR, following the setup in #1175.

With int32 inputs, the compilation time is 155m4.853s.

Here is the test case:

func.func @main(%arg0: tensor<3x4xi32>, %arg1: tensor<4x5xi32>) -> tensor<3x5xi32> {
    %0 = tensor.empty() : tensor<3x5xi32>
    %cst = arith.constant 0 : i32
    %1 = linalg.fill ins(%cst : i32) outs(%0 : tensor<3x5xi32>) -> tensor<3x5xi32>
    %2 = linalg.matmul ins(%arg0, %arg1 : tensor<3x4xi32>, tensor<4x5xi32>) outs(%1 : tensor<3x5xi32>) -> tensor<3x5xi32>
    return %2 : tensor<3x5xi32>
}

The command for compilation is:
time bazel run //tools:heir-opt -- --tosa-to-boolean-tfhe $PWD/new_test_i32.mlir &> output_i32.mlir

I also tested on different data types:

  • int8: 1m17.213s
  • int16: 14m40.698s
  • int32: 155m4.853s

My Questions:

  1. Is this compilation speed expected for int32 inputs?
  2. If so, are there any trade-offs available, such as disabling certain optimization passes to reduce compilation time significantly, even if that results in a longer runtime?

Thank you for your help!

@asraa
Collaborator

asraa commented Jan 17, 2025

Hey! The int32 time seems fairly long, I'll repro on my device too and let you know how it goes.

If so, are there any trade-offs available, such as disabling certain optimization passes to reduce compilation time significantly, even if that results in a longer runtime?

Yes, if you pass --tosa-to-boolean-tfhe=abc-fast=true, then abc (the logic optimizer that runs within the yosys process) will run in -fast mode, which will likely speed it up a lot. Running explicitly with bazel run -c opt may also help, since by default we build in debug mode, which can enable some slower debug-only code paths.

I'm also wondering whether reordering some of the canonicalizations after linalg is lowered, or fusing loops, would help rely more on MLIR's simplifications (CSE, etc.) rather than on yosys.
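For reference, combining the two suggestions would look like this (a sketch; file names follow the original report, and the abc-fast flag syntax is as quoted above, not re-verified against a checkout):

```shell
# Build in optimized mode (-c opt) so debug-only code paths are disabled,
# and pass abc-fast=true so abc runs with -fast inside the yosys step.
time bazel run -c opt //tools:heir-opt -- \
  --tosa-to-boolean-tfhe=abc-fast=true \
  "$PWD/new_test_i32.mlir" &> output_i32.mlir
```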

@winnylyc
Author

Thank you for your advice! I will test them out on my device!

@winnylyc
Author

Running explicitly with bazel run -c opt may also help, since by default we build in debug mode, which can enable some slower debug-only code paths.

This advice works! Explicitly running with bazel run -c opt reduces the time cost dramatically. Sorry for making such a naive mistake. 😂
time bazel run -c opt //tools:heir-opt -- --tosa-to-boolean-tfhe $PWD/new_test_i32.mlir &> output_i32.mlir

The current time cost is: 13m28.084s

I will further test the other suggestions.

@winnylyc
Author

if you pass --tosa-to-boolean-tfhe=abc-fast=true, then abc (the logic optimizer that runs within the yosys process) will run in -fast mode, which will likely speed it up a lot.

I further tested the abc-fast option, but it only seems to help at short integer widths.
I ran each test 5 times to get the average time cost.

  • int8:
    original = 7.620 seconds
    abc-fast = 6.391 seconds

  • int16:
    original = 67.062 seconds
    abc-fast = 65.238 seconds

  • int32:
    original = 838.661 seconds
    abc-fast = 854.410 seconds

For int32, abc-fast even seems to slow down compilation a little.
Do these results meet expectations?
Additionally, I would like to confirm whether yosys is the most time-consuming component.

@winnylyc
Author

Do these results meet expectations?

I further measured the time spent in yosys with and without abc-fast for int32:
original = 83.812 seconds
abc-fast = 61.697 seconds
This shows that abc-fast does speed yosys up a lot; however, the rest of the lowering slows down when abc-fast is enabled.

I would like to confirm whether yosys is the most time-consuming component.

It seems that yosys dominates the compilation process for int8 but plays a much smaller role for int32.

  • int8:
    yosys = 4.859 seconds
    all = 7.620 seconds

  • int32:
    yosys = 83.812 seconds
    all = 838.661 seconds

I am still wondering what in the boolean pipeline makes it so slow, and I will explore it further. If you have any suggestions, I'd really appreciate you sharing them with me! Thanks so much for your help!
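One way to see where the time goes: mlir-opt-based tools built on MlirOptMain generally accept MLIR's pass-timing flags, which print per-pass wall time after the run. Assuming heir-opt inherits these standard flags (a sketch, not verified against a checkout):

```shell
# --mlir-timing prints a per-pass timing report to stderr after compilation;
# --mlir-timing-display=tree nests timings by the pass-pipeline structure
# (use =list for a flat ranking). File names follow the earlier commands.
bazel run -c opt //tools:heir-opt -- \
  --tosa-to-boolean-tfhe \
  --mlir-timing --mlir-timing-display=tree \
  "$PWD/new_test_i32.mlir" > output_i32.mlir
```

This should show directly whether the yosys step or the surrounding MLIR passes dominate for int32.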

@winnylyc
Author

In addition, I think int32 may not be necessary in most cases, so focusing on the int8 pipeline might be the better choice. Do you agree?
