-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UB when sdiv overflows #655
Comments
Here's where we implement it for aarch64: https://github.com/lifting-bits/remill/blob/master/lib/Arch/AArch64/Semantics/BINARY.cpp#L140-L151 For x86, I think we explicitly detect signed underflow, though it adds a bunch of complexity to the semantics: Also the auto-formatting with @katrinafyi I see your llvm-translator project using Remill. What are you trying to accomplish where (not?) flagging this UB is significant? Are later optimization passes replacing the Perhaps one thing to consider is that we try do the equivalent of a freezing operation, to mirror LLVM's |
Thanks for the reply, I'm not aware of any major problems this causes. We're just working towards validating lifters using the ARM specifications as a trustworthy source of truth. So there's no concrete application asides from the goal of being as accurate as possible. In the ARM ISA, it is specified implicitly. Here, the integer type is an arbitrary precision two's complement integer. |
Ah interesting! @regehr had previously done a bit of translation validation with remill (through anvill) and alive2 with opt-fuzz. I believe they eventually switched to their own custom AArch64 binary lifter/translator. @lkorenc does a bit of translation validation-type stuff internally with a separate but related system. It might be worth a longer discussion on the |
hi this is interesting! as Peter says, we have a lifter from AArch64 to LLVM, but we're not trying to find bugs in the lifter, but rather using the lifter + Alive2 to find bugs in LLVM, you can see our current list here, these are all silent miscompiles: https://github.com/regehr/alive2/blob/arm-tv/BugList.md#miscompilation-bugs-found-in-arm64-backend I don't think we've done anything particularly interesting while lifting sdiv, but we definitely had to be careful when lifting things like shift exponents, and that Remill had to be similarly careful. the thing I'm currently a bit stuck on with this project is the assembly-level memory model is sufficiently different from the LLVM memory model that we're going to need to rip out the Alive2 memory model and make it pluggable, so that it can use one set of rules for LLVM IR and another for lifted ARM code, which of course freely mixes integers and pointers in ways that make it pretty difficult to reason about. @katrinafyi that's interesting to hear that you're using the ARM semantics. we're currently not, but I've spent a bunch of time getting executable code out of Sail and Isla (tools from Peter Sewell's group that make it easier to work with ARM's semantics). currently that's a bit on the back burner since my postdoc who was supporting this project had to leave his position, but I want to keep working on it. |
Very cool! Our project is to derive a trustworthy lifter for ARMv8, based on Arm's machine-readable specifications. This is coming along nicely and it lifts a large subset of AArch64 to LLVM IR. We also use Alive2 for bidirectional translation validation to compare its results to other lifters, which is how we found these edge cases. Overall, we found Remill's semantics were actually very accurate; this and #656 being the only discrepancies of note. @regehr That's an interesting project and an impressive list of compilation bugs. This is definitely a case where you'd need to trust the lifter you're using! We had looked at Sail for our work too but had trouble along the way; its definitions can be cumbersome and its effect type is difficult to translate and reason about. Working with the ARM semantics directly is actually not too bad. The language is relatively simple and Alastair Reid has published tools for parsing and manipulating the ASL specification language, which we make use of in our lifter. |
The semantics provided for the SDIV instruction trigger undefined behaviour on INT_MIN / -1.
For example, 010cc19a sdiv x1, x0, x1 gives us something like:
Since this overflows a 64-bit int, LLVM semantics specify it is UB.
However, the ARM reference does not raise an exception in such a case; it simply evaluates to INT_MIN.
The text was updated successfully, but these errors were encountered: