fix: type mismatches in the LLVM backend #3225
Conversation
Mathlib CI status (docs):
Does LLVM have a typechecker, and could we use it to validate the output in the tests?
The strongest thing that I know of is the mentioned validator that we now run by default on the .bc file to avoid producing a corrupted file. Maybe @bollu or @tobiasgrosser know of some additional tool that we could run?
@hargoniX, how did your flow look before you started running the validator? LLVM always calls the validator when loading a `.bc` file. As allowing invalid IR at an intermediate stage makes writing transformations easier, and running the validator after each pass has a performance cost, LLVM does not run the validator after each pass. However, when debugging it's really useful to run the validator.
This statement confuses me a little bit. Once we noticed the type mismatches, we changed the types in the function declarations, but we forgot to also change the types of the numeric literals that are passed to those functions. This ended up producing things like the above:
When trying to run the optimization step on the produced …
Did you try to run …?
Who created this file? Did you export it into a .ll file and see if it validates?
`LLVM.i64Type llvmctx`
`-- TODO(bollu): instantiate target triple and find out what unsigned is.`
We can probably replace this with a static assert that `unsigned` is two bytes, because we're definitely not expecting it to ever be something else :)
two bytes?
Uh, that's the minimum size, but we'd rather expect four, yes
Relatedly: can we please remove the usage of wobbly types from `lean.h`? I see no advantage to using `unsigned` over `uint32_t`, and a risk of miscompilations if they ever differ.
Strong 👍 on that one from my side.
Strong +1 from me as well; this will eliminate all of the wobbliness from C in the backend code.
I suggest performing the refactor (either changing it to 4 bytes, or the more aggressive refactor suggested by @digama0) as a subsequent PR.
I am happy to go ahead with this. I suggested further improvements, but those can be submitted in subsequent PRs as they are not directly related to this PR.
Can we merge this PR? We are currently polishing the LLVM backend further, and it would help to not have to stack PRs.
Sorry, didn't have it on my radar anymore!
…#3244) Again co-developed with @bollu. Based on top of: #3225

While hunting down the performance discrepancy on qsort.lean between C and LLVM, we noticed there was a single, trivially optimizable alloca (LLVM's stack memory allocation instruction) that had loads/stores in the hot code path. We then found https://groups.google.com/g/llvm-dev/c/e90HiFcFF7Y. TL;DR: `mem2reg`, the pass responsible for getting rid of allocas where possible, only triggers on an alloca if it is in the first BB. The allocas of the current implementation get placed right at the location where they are needed, so they are ignored by `mem2reg`. Thus we decided to add functionality that pushes all allocas up into the first BB.

We initially wanted to write `buildPrologueAlloca` in a `withReader` style:
1. get the current position of the builder
2. jump to the first BB and do the thing
3. revert the position to the original

However, the LLVM C API does not expose an option to obtain the current position of an IR builder. Thus we ended up at the current implementation, which resets the builder position to the end of the BB that the function was called from. This is valid because we never operate anywhere but the end of the current BB in the LLVM emitter.
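The `mem2reg` limitation described above can be sketched in IR (hypothetical function, opaque-pointer syntax; not taken from actual emitter output):

```llvm
define i64 @before(i64 %x) {
entry:
  br label %body
body:
  ; alloca outside the entry block: mem2reg leaves the
  ; load/store pair in the hot path
  %slot = alloca i64
  store i64 %x, ptr %slot
  %v = load i64, ptr %slot
  ret i64 %v
}

define i64 @after(i64 %x) {
entry:
  ; hoisted into the entry block: mem2reg can promote %slot to an
  ; SSA value, deleting the alloca, the store, and the load
  %slot = alloca i64
  br label %body
body:
  store i64 %x, ptr %slot
  %v = load i64, ptr %slot
  ret i64 %v
}
```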
The numbers on the qsort benchmark got improved by the change as expected, however we are not fully there yet:

```
C:
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):   2.005 s ± 0.013 s  [User: 1.996 s, System: 0.003 s]
  Range (min … max): 1.993 s … 2.036 s  10 runs

LLVM before aligning the types
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):   2.151 s ± 0.007 s  [User: 2.146 s, System: 0.001 s]
  Range (min … max): 2.142 s … 2.161 s  10 runs

LLVM after aligning the types
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):   2.073 s ± 0.011 s  [User: 2.067 s, System: 0.002 s]
  Range (min … max): 2.060 s … 2.097 s  10 runs

LLVM after this
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):   2.038 s ± 0.009 s  [User: 2.032 s, System: 0.001 s]
  Range (min … max): 2.027 s … 2.052 s  10 runs
```

Note: If you wish to merge this PR independently from its predecessor, there is no technical dependency between the two; I'm merely stacking them so we can see the performance impacts of each more clearly.
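A rough sketch of the `buildPrologueAlloca` helper described above, in the style of the backend's Lean bindings. The binding names, monad, and signature are assumed for illustration, not copied from the actual `EmitLLVM.lean`:

```lean
-- Hypothetical sketch: hoist an alloca into the function's entry block,
-- then reset the builder to the block we were emitting into.
def buildPrologueAlloca (builder : LLVM.Builder llvmctx)
    (fn : LLVM.Value llvmctx) (ty : LLVM.LLVMType llvmctx)
    (name : String) : IO (LLVM.Value llvmctx) := do
  -- remember the block we came from (the C API exposes the current
  -- block, but not the exact instruction position within it)
  let curBB ← LLVM.getInsertBlock builder
  let entryBB ← LLVM.getEntryBasicBlock fn
  -- simplification: assumes the entry block has no terminator yet;
  -- a real implementation would insert before the first instruction
  LLVM.positionBuilderAtEnd builder entryBB
  let alloca ← LLVM.buildAlloca builder ty name
  -- resetting to the *end* of curBB is valid because the emitter only
  -- ever appends at the end of the current BB
  LLVM.positionBuilderAtEnd builder curBB
  return alloca
```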
Debugged and authored in collaboration with @bollu.

This PR fixes several performance regressions of the LLVM backend compared to the C backend as described in #3192. We are now at the point where some benchmarks from `tests/bench` achieve consistently equal and sometimes ever so slightly better performance when using LLVM instead of C. However, there are still a few test cases where we are lagging behind ever so slightly. The PR contains two changes:

1. While declaring the types of the `lean.h` runtime functions in the LLVM backend as in `lean.h`, it turns out that: a) LLVM does not throw an error if we declare a function with a different type than it actually has. This happened on multiple occasions here, in particular when the function used `unsigned`, as it was wrongfully assumed to be `size_t`-sized. b) LLVM refuses to inline a function at the call site if such a type mismatch occurs. This means that we did not inline important functionality such as `lean_ctor_set` and were thus slowed down compared to the C backend, which did this correctly.
2. When trying to run the produced file through `leanc` we got errors, presumably because the generated .bc file is invalid in the first place. Thus we added a call to `LLVMVerifyModule` before serializing the module into a bitcode file. This ended up producing the expected type errors from LLVM and aborting the bitcode file generation as expected.

We manually checked each function in `lean.h` that is mentioned in `EmitLLVM.lean` to make sure that all of their types align correctly now.

Quick overview of the fast benchmarks as measured on my machine, 2 runs of LLVM and 2 runs of C to get a feeling for how far the averages move:
Leaving out benchmarks related to the compiler itself as I was too lazy to keep recompiling it from scratch until we are on a level with C.
Summing things up, it appears that LLVM has now caught up with or surpassed the C backend in the microbenchmarks for the most part. Next steps from our side are:
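The kind of declaration mismatch described in change 1 above can be illustrated with a minimal IR sketch (the concrete widths are illustrative, not taken from the actual emitter output):

```llvm
; lean_ctor_set's offset parameter is `unsigned`, i.e. i32 on common
; targets, so a faithful declaration looks like this:
declare void @lean_ctor_set(ptr, i32, ptr)

define void @mk(ptr %o, ptr %v) {
  ; A call site built as if the offset were i64-wide. LLVM accepts
  ; this while building the IR, but it refuses to inline such a
  ; mismatched call, which is exactly the slowdown observed here.
  call void @lean_ctor_set(ptr %o, i64 0, ptr %v)
  ret void
}
```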