Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM vs C backend performance #3192

Closed
1 task done
hargoniX opened this issue Jan 17, 2024 · 2 comments
Closed
1 task done

LLVM vs C backend performance #3192

hargoniX opened this issue Jan 17, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@hargoniX
Copy link
Contributor

Prerequisites

  • Put an X between the brackets on this line if you have done all of the following:
    • Check that your issue is not already filed.
    • Reduce the issue to a minimal, self-contained, reproducible test case. Avoid dependencies to mathlib4 or std4.

Description

Currently the LLVM backend generates IR that has worse performance than the C backend after optimization. It appears that this issue can be deduced to a failure of inlining Lean runtime
functions for manipulating lean_objects.

Steps to Reproduce

  1. Compile Lean with the C and the LLVM backend enabled
  2. Inspect the IR of the const_fold.lean bench test generated by LLVM vs generated by C before and after -O3
  3. Some lean.h functions will be inlined in the C variant but not in the LLVM variant

Expected behavior: The same functions should be inlined

Actual behavior: They are not

Impact

As long as this is not fixed the LLVM backend will most likely not be able to be competetive with the C one in terms of performance.

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.

@hargoniX hargoniX added the bug Something isn't working label Jan 17, 2024
github-merge-queue bot pushed a commit that referenced this issue Feb 13, 2024
Debugged and authored in collaboration with @bollu.

This PR fixes several performance regressions of the LLVM backend
compared to the C backend
as described in #3192. We are now at the point where some benchmarks
from `tests/bench` achieve consistently equal and sometimes ever so
slightly better performance when using LLVM instead of C. However there
are still a few testcases where we are lacking behind ever so slightly.

The PR contains two changes:
1. Using the same types for `lean.h` runtime functions in the LLVM
backend as in `lean.h` it turns out that:
a) LLVM does not throw an error if we declare a function with a
different type than it actually has. This happened on multiple occasions
here, in particular when the function used `unsigned`, as it was
wrongfully assumed to be `size_t` sized.
b) Refuses to inline a function to the call site if such a type mismatch
occurs. This means that we did not inline important functionality such
as `lean_ctor_set` and were thus slowed down compared to the C backend
which did this correctly.
2. While developing this change we noticed that LLVM does treat the
following as invalid: Having a function declared with a certain type but
called with integers of a different type. However this will manifest in
completely nonsensical errors upon optimizing the bitcode file through
`leanc` such as:
```
error: Invalid record (Producer: 'LLVM15.0.7' Reader: 'LLVM 15.0.7')
```
Presumably because the generate .bc file is invalid in the first place.
Thus we added a call to `LLVMVerifyModule` before serializing the module
into a bitcode file. This ended producing the expected type errors from
LLVM an aborting the bitcode file generation as expected.

We manually checked each function in `lean.h` that is mentioned in
`EmitLLVM.lean` to make sure that all of their types align correctly
now.

Quick overview of the fast benchmarks as measured on my machine, 2 runs
of LLVM and 2 runs of C to get a feeling for how far the averages move:
- binarytrees: basically equal performance
- binarytrees.st: basically equal performance
- const_fold: equal if not slightly better for LLVM
- deriv: LLVM has 8% more instructions than C but same wall clock time
- liasolver: basically equal performance
- qsort: LLVM is slower by 7% instructions, 4% time. We have identified
why the generated code is slower (there is a store/load in a hot loop in
LLVM that is not in C) but not figured out why that happens/how to
address it.
- rbmap: LLVM has 3% less instructions and 13% less wall-clock time than
C (woop woop)
- rbmap_1 and rbmap_10 show similar behavior
- rbmap_fbip: LLVM has 2% more instructions but 2% better wall time
- rbmap_library: equal if not slightly better for LLVM
- unionfind: LLVM has 5% more instructions but 4% better wall time

Leaving out benchmarks related to the compiler itself as I was too lazy
to keep recompiling it from scratch until we are on a level with C.

Summing things up, it appears that LLVM has now caught up or surpassed
the C backend in the microbenchmarks for the most part. Next steps from
our side are:
- trying to win the qsort benchmark
- figuring out why/how LLVM runs more instructions for less wall-clock
time. My current guesses would be measurement noise and/or better use of
micro architecture?
- measuring the larger benchmarks as well
@Kha
Copy link
Member

Kha commented Sep 20, 2024

@hargoniX In #3225 it sounds like this specific issue is likely resolved, yes?

@hargoniX
Copy link
Contributor Author

Yes, in general the performance diff between the LLVM and C backend is very miniscule but still measurable. We investigated the IR for a while and didn't really see any obvious differences in the logic it's implementing so the remaining diff is most likely due to different basic blocks, binary layouts etc.

@Kha Kha closed this as completed Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants