
fix: type mismatches in the LLVM backend #3225

Merged · 3 commits · Feb 13, 2024
Conversation

hargoniX
Contributor

@hargoniX hargoniX commented Jan 27, 2024

Debugged and authored in collaboration with @bollu.

This PR fixes several performance regressions of the LLVM backend compared to the C backend,
as described in #3192. We are now at the point where some benchmarks from tests/bench consistently achieve equal, and sometimes ever so slightly better, performance when using LLVM instead of C. However, there are still a few test cases where we are lagging behind ever so slightly.

The PR contains two changes:

  1. Using the same types for the runtime functions in the LLVM backend as declared in lean.h. It turns out that LLVM:
    a) does not throw an error if we declare a function with a different type than it actually has. This happened on multiple occasions here, in particular when a function used unsigned, which was wrongly assumed to be size_t-sized.
    b) refuses to inline a function into the call site if such a type mismatch occurs. This means that we did not inline important functionality such as lean_ctor_set and were thus slower than the C backend, which declared these types correctly.
  2. While developing this change we noticed that LLVM does treat it as invalid to declare a function with a certain type but call it with integers of a different type. However, this manifests in completely nonsensical errors when optimizing the bitcode file through leanc, such as:
error: Invalid record (Producer: 'LLVM15.0.7' Reader: 'LLVM 15.0.7')

Presumably this is because the generated .bc file is invalid in the first place. Thus we added a call to LLVMVerifyModule before serializing the module into a bitcode file. This ended up producing the proper type errors from LLVM and aborting the bitcode file generation, as expected.

We manually checked each function in lean.h that is mentioned in EmitLLVM.lean to make sure that all of their types align correctly now.
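As an illustration of the first fix, here is a minimal Lean sketch of emitting a declaration with the exact types from lean.h. All binding and helper names here (`LLVM.i32Type`, `LLVM.voidPtrType`, `getOrCreateFunctionPrototype`, ...) are assumptions for illustration, not the PR's exact code; the point is only that the offset parameter of `lean_ctor_set(lean_object*, unsigned, lean_object*)` is a C `unsigned`, i.e. 32 bits on all supported targets, and must therefore be declared as i32 rather than i64:

```lean
-- Hypothetical sketch; binding and helper names are assumed, not the PR's code.
-- lean.h declares: void lean_ctor_set(lean_object* o, unsigned i, lean_object* v)
def declareLeanCtorSet (mod : LLVM.Module ctx) : M ctx (LLVM.Value ctx) := do
  let voidTy ← LLVM.voidType ctx
  let ptrTy  ← LLVM.voidPtrType ctx  -- lean_object*
  let i32Ty  ← LLVM.i32Type ctx      -- unsigned (previously, wrongly, i64)
  let fnTy   ← LLVM.functionType voidTy #[ptrTy, i32Ty, ptrTy]
  getOrCreateFunctionPrototype mod fnTy "lean_ctor_set"
```

With the i64 version, LLVM silently accepts the mismatched declaration but then refuses to inline `lean_ctor_set` at call sites, which is exactly the slowdown described above.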

Quick overview of the fast benchmarks as measured on my machine, 2 runs of LLVM and 2 runs of C to get a feeling for how far the averages move:

  • binarytrees: basically equal performance
  • binarytrees.st: basically equal performance
  • const_fold: equal if not slightly better for LLVM
  • deriv: LLVM has 8% more instructions than C but the same wall-clock time
  • liasolver: basically equal performance
  • qsort: LLVM executes 7% more instructions and takes 4% more time. We have identified why the generated code is slower (there is a store/load in a hot loop in LLVM that is not in C) but have not figured out why that happens or how to address it.
  • rbmap: LLVM has 3% fewer instructions and 13% less wall-clock time than C (woop woop)
  • rbmap_1 and rbmap_10 show similar behavior
  • rbmap_fbip: LLVM has 2% more instructions but 2% better wall time
  • rbmap_library: equal if not slightly better for LLVM
  • unionfind: LLVM has 5% more instructions but 4% better wall time

I am leaving out benchmarks related to the compiler itself, as I was too lazy to keep recompiling it from scratch until we are on a level with C.

Summing things up, it appears that LLVM has now caught up with or surpassed the C backend in the microbenchmarks for the most part. Next steps from our side are:

  • trying to win the qsort benchmark
  • figuring out why/how LLVM runs more instructions in less wall-clock time. My current guesses would be measurement noise and/or better use of the microarchitecture.
  • measuring the larger benchmarks as well

@hargoniX hargoniX changed the title Fixing type mismatches in the LLVM backend fix: type mismatches in the LLVM backend Jan 27, 2024
@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Jan 27, 2024
@leanprover-community-mathlib4-bot
Collaborator

Mathlib CI status (docs):

  • ❗ Std CI can not be attempted yet, as the nightly-testing-2024-01-27 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-mathlib, Std CI should run now. (2024-01-27 23:48:19)

@digama0
Collaborator

digama0 commented Jan 28, 2024

Does LLVM have a typechecker, and could we use it to validate the output in the tests?

@hargoniX
Contributor Author

The strongest thing that I know of is the mentioned validator that we now run per default on the .bc file to avoid producing a corrupted file. Maybe @bollu or @tobiasgrosser know of some additional tool that we could run?

@tobiasgrosser
Contributor

@hargoniX, what did your flow look like before you started running the validator? LLVM always calls the validator when loading a .bc or .ll file into any tool. I am surprised you did not run the validator yourself as part of your flow, as invalid IR is very likely to cause issues.

Because allowing invalid IR at intermediate stages makes writing transformations easier, and running the validator after each pass has a performance cost, LLVM does not run the validator after each pass. However, when debugging, it's really useful to run the validator.

@hargoniX
Contributor Author

hargoniX commented Jan 28, 2024

LLVM always calls the validator when loading a .bc or .ll file into any tool.

This statement confuses me a little bit. Once we noticed the type mismatches, we changed the types in the function declarations, but we forgot to also change the types of the numeric literals that are passed to these functions. This ended up producing errors like the above:

error: Invalid record (Producer: 'LLVM15.0.7' Reader: 'LLVM 15.0.7')

This error appeared when trying to run the optimization step on the produced .bc file with leanc. But upon adding the call to LLVMVerifyModule, we got proper type errors while emitting the .bc file in the first place, detailing which functions expected which types.
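In Lean terms, the emission step now looks roughly like this. This is a minimal sketch; the binding names `LLVM.verifyModule` and `LLVM.writeBitcodeToFile` are assumed for illustration, not the exact API:

```lean
-- Hypothetical sketch; binding names are assumed.
def emitBitcodeFile (mod : LLVM.Module ctx) (path : String) : IO Unit := do
  -- Running the verifier here surfaces LLVM's real type errors at emission
  -- time, instead of an opaque "Invalid record" failure when leanc later
  -- tries to read the (invalid) .bc file.
  LLVM.verifyModule mod
  LLVM.writeBitcodeToFile mod path
```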

@tobiasgrosser
Contributor

Did you try to run opt on this file?

@tobiasgrosser
Contributor

Who created this file? Did you export it into a .ll file and see if it validates?

```lean
LLVM.i64Type llvmctx
-- TODO(bollu): instantiate target triple and find out what unsigned is.
```
Member

We can probably replace this with a static assert that unsigned is two bytes because we're definitely not expecting it to ever be something else :)

Collaborator

two bytes?

Member

Uh, that's the minimum size, but we'd rather expect four, yes

Collaborator

Relatedly: Can we please remove the usage of wobbly types from lean.h? I see no advantage to using unsigned over uint32_t and a risk of miscompilations if they ever differ.

Contributor Author

Strong 👍 on that one from my side.

Contributor

Strong +1 from me as well, this will eliminate all of the wobbliness from C in the backend code.

Contributor

I suggest performing the refactor (either changing it to 4 bytes, or the more aggressive refactor suggested by @digama0) as a subsequent PR.

@tobiasgrosser
Contributor

I am happy to go ahead with this. I suggested further improvements, but those can be submitted in subsequent PRs as they are not directly related to this PR.

@tobiasgrosser
Contributor

Can we merge this PR? We are currently polishing the LLVM backend further, and it would help to not have to stack PRs.

@Kha Kha added this pull request to the merge queue Feb 13, 2024
@Kha
Member

Kha commented Feb 13, 2024

Sorry, didn't have it on my radar anymore!

Merged via the queue into leanprover:master with commit 06f73d6 Feb 13, 2024
10 checks passed
@tobiasgrosser tobiasgrosser deleted the llvm-uint32 branch February 13, 2024 11:56
github-merge-queue bot pushed a commit that referenced this pull request Feb 13, 2024
…#3244)

Again co-developed with @bollu.

Based on top of: #3225 

While hunting down the performance discrepancy on qsort.lean between C
and LLVM, we noticed there was a single, trivially optimizable alloca
(LLVM's stack memory allocation instruction) that had load/stores in the
hot code path. We then found:
https://groups.google.com/g/llvm-dev/c/e90HiFcFF7Y.

TLDR: `mem2reg`, the pass responsible for getting rid of allocas where
possible, only fires on an alloca if it is in the first BB. The
allocas of the current implementation are emitted right at the location
where they are needed, so they are ignored by mem2reg.

Thus we decided to add functionality that allows us to push all allocas
up into the first BB.
We initially wanted to write `buildPrologueAlloca` in a `withReader`
style so:
1. get the current position of the builder
2. jump to first BB and do the thing
3. revert position to the original

However, the LLVM C API does not expose a way to obtain the current
position of an IR builder. Thus we ended up with the current
implementation, which resets the builder position to the end of the BB
that the function was called from. This is valid because we never
operate anywhere but at the end of the current BB in the LLVM emitter.
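
The approach above can be sketched as follows. This is a hedged sketch: all binding and helper names are assumed for illustration (not the PR's exact code), and real code would additionally need to take care not to insert after the entry block's terminator:

```lean
-- Hypothetical sketch; binding and helper names are assumed.
def buildPrologueAlloca (builder : LLVM.Builder ctx) (fn : LLVM.Value ctx)
    (ty : LLVM.LLVMType ctx) (name : String) : M ctx (LLVM.Value ctx) := do
  let curBB ← LLVM.getInsertBlock builder    -- the BB we were emitting into
  let entryBB ← LLVM.getEntryBasicBlock fn
  -- Emit the alloca in the entry BB so mem2reg will consider promoting it.
  LLVM.positionBuilderAtEnd builder entryBB
  let alloca ← LLVM.buildAlloca builder ty name
  -- Restore the builder; positioning at the end of the original BB suffices
  -- because the emitter only ever appends to the end of the current BB.
  LLVM.positionBuilderAtEnd builder curBB
  return alloca
```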

The numbers on the qsort benchmark improved as expected; however, we
are not fully there yet:
```
C:
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):      2.005 s ±  0.013 s    [User: 1.996 s, System: 0.003 s]
  Range (min … max):    1.993 s …  2.036 s    10 runs

LLVM before aligning the types
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):      2.151 s ±  0.007 s    [User: 2.146 s, System: 0.001 s]
  Range (min … max):    2.142 s …  2.161 s    10 runs

LLVM after aligning the types
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):      2.073 s ±  0.011 s    [User: 2.067 s, System: 0.002 s]
  Range (min … max):    2.060 s …  2.097 s    10 runs

LLVM after this
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):      2.038 s ±  0.009 s    [User: 2.032 s, System: 0.001 s]
  Range (min … max):    2.027 s …  2.052 s    10 runs
```

Note: If you wish to merge this PR independently from its predecessor,
there is no technical dependency between the two, I'm merely stacking
them so we can see the performance impacts of each more clearly.
@Kha Kha mentioned this pull request Sep 20, 2024