-
Notifications
You must be signed in to change notification settings - Fork 450
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix: type mismatches in the LLVM backend (#3225)
Debugged and authored in collaboration with @bollu. This PR fixes several performance regressions of the LLVM backend compared to the C backend as described in #3192. We are now at the point where some benchmarks from `tests/bench` achieve consistently equal and sometimes ever so slightly better performance when using LLVM instead of C. However there are still a few testcases where we are lacking behind ever so slightly. The PR contains two changes: 1. Using the same types for `lean.h` runtime functions in the LLVM backend as in `lean.h` it turns out that: a) LLVM does not throw an error if we declare a function with a different type than it actually has. This happened on multiple occasions here, in particular when the function used `unsigned`, as it was wrongfully assumed to be `size_t` sized. b) Refuses to inline a function to the call site if such a type mismatch occurs. This means that we did not inline important functionality such as `lean_ctor_set` and were thus slowed down compared to the C backend which did this correctly. 2. While developing this change we noticed that LLVM does treat the following as invalid: Having a function declared with a certain type but called with integers of a different type. However this will manifest in completely nonsensical errors upon optimizing the bitcode file through `leanc` such as: ``` error: Invalid record (Producer: 'LLVM15.0.7' Reader: 'LLVM 15.0.7') ``` Presumably because the generate .bc file is invalid in the first place. Thus we added a call to `LLVMVerifyModule` before serializing the module into a bitcode file. This ended producing the expected type errors from LLVM an aborting the bitcode file generation as expected. We manually checked each function in `lean.h` that is mentioned in `EmitLLVM.lean` to make sure that all of their types align correctly now. Quick overview of the fast benchmarks as measured on my machine, 2 runs of LLVM and 2 runs of C to get a feeling for how far the averages move: - binarytrees: basically equal performance - binarytrees.st: basically equal performance - const_fold: equal if not slightly better for LLVM - deriv: LLVM has 8% more instructions than C but same wall clock time - liasolver: basically equal performance - qsort: LLVM is slower by 7% instructions, 4% time. We have identified why the generated code is slower (there is a store/load in a hot loop in LLVM that is not in C) but not figured out why that happens/how to address it. - rbmap: LLVM has 3% less instructions and 13% less wall-clock time than C (woop woop) - rbmap_1 and rbmap_10 show similar behavior - rbmap_fbip: LLVM has 2% more instructions but 2% better wall time - rbmap_library: equal if not slightly better for LLVM - unionfind: LLVM has 5% more instructions but 4% better wall time Leaving out benchmarks related to the compiler itself as I was too lazy to keep recompiling it from scratch until we are on a level with C. Summing things up, it appears that LLVM has now caught up or surpassed the C backend in the microbenchmarks for the most part. Next steps from our side are: - trying to win the qsort benchmark - figuring out why/how LLVM runs more instructions for less wall-clock time. My current guesses would be measurement noise and/or better use of micro architecture? - measuring the larger benchmarks as well
- Loading branch information
Showing
3 changed files
with
79 additions
and
38 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.