bug: `bv_decide` regression; proof that worked before now times out, and LRAT file sizes are much larger #5664

alexkeizer · 2024-10-09T22:41:58Z

Prerequisites

Please put an X between the brackets as you perform the following steps:

Check that your issue is not already filed:
https://github.com/leanprover/lean4/issues
Reduce the issue to a minimal, self-contained, reproducible test case.
Avoid dependencies to Mathlib or Batteries.
Test your test case against the latest nightly release, for example on
https://live.lean-lang.org/#project=lean-nightly
(You can also use the settings there to switch to “Lean nightly”)

Description

Consider the following MWE:

import Std.Tactic.BVDecide

example
  (a k n : BitVec 64) :
  n < 18446744073709551615#64 - k →
    ((¬a + k + 1#64 - a ≤ a + k - a ∧ ¬a + k + 1#64 + n - a ≤ a + k - a) ∧
        ¬a - (a + k + 1#64) ≤ a + k + 1#64 + n - (a + k + 1#64)) ∧
      ¬a + k - (a + k + 1#64) ≤ a + k + 1#64 + n - (a + k + 1#64) := by
  bv_decide

On nightly-2024-09-10 this proof worked. On nightly-2024-10-08 it times out instead, even if we double the time limit to 120 seconds.

Context

Example extracted from LNSym, specifically, from Arm/Memory/SeparateProofs.lean.

Steps to Reproduce

1. Open the MWE in nightly-2024-09-10 and observe that the proof goes through.
2. Change the toolchain to nightly-2024-10-08 and observe that it times out.

Expected behavior: I expect the proof to go through

Actual behavior: It times out

Versions

nightly-2024-10-08 is the version with the bad behaviour.

Additional Information

If we use bv_decide? to generate an LRAT in the working version, we get a roughly 12MB file.
Using bv_check with that LRAT in the new version gives a type mismatch. More surprisingly, if we then use bv_decide? in the new Lean version, it manages to spit out 300+ MB of LRAT before it gets killed (with the 120 second timeout).

We see a similar thing happen in other examples from that same PR. These do go through in the new toolchain, but the LRAT had to be regenerated, and the new file is 10x larger than the old one (in particular, it now exceeds GitHub's file size limit).
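For readers unfamiliar with the LRAT caching workflow mentioned above, here is a minimal sketch on a toy goal. The goal and the file name are illustrative assumptions, not taken from LNSym; the only tactics used are `bv_decide?` and `bv_check` from `Std.Tactic.BVDecide`.

```lean
import Std.Tactic.BVDecide

-- Step 1: `bv_decide?` runs the SAT solver, stores the LRAT certificate in a
-- file in the same directory as this Lean file, and suggests replacing itself
-- with a `bv_check "<name>.lrat"` call.
-- (Toy goal chosen for illustration; if the preprocessing alone closes a goal,
-- the suggestion may differ.)
example (x y : BitVec 8) (h : x < y) : x + 1#8 ≤ y := by
  bv_decide?

-- Step 2: after accepting the suggestion, the proof replays the stored
-- certificate instead of calling the solver again.
-- "example.lrat" is a placeholder name; use the one from the suggestion.
-- example (x y : BitVec 8) (h : x < y) : x + 1#8 ≤ y := by
--   bv_check "example.lrat"
```

Because the certificate is tied to the exact formula that `bv_decide` produces after preprocessing, a change in that preprocessing between toolchains can plausibly invalidate a stored certificate, which would fit the mismatch reported above.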

Impact

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.


hargoniX · 2024-10-10T09:07:50Z

The hard subproblem here is:

import Std.Tactic.BVDecide

set_option bv.ac_nf false in
set_option trace.Meta.Tactic.bv true in
set_option trace.profiler true in
example (a k n : BitVec 64) (h : n < 18446744073709551615#64 - k) :
    ¬a - (a + k + 1#64) ≤ a + k + 1#64 + n - (a + k + 1#64) := by
  bv_decide

This regression is caused by the introduction of the ac_nf pass, which is rather unfortunate as it helps a lot in other situations :( You can disable it with set_option bv.ac_nf false, but I hope I'll be able to modify it such that this passes.
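As a sketch of that workaround applied to the original MWE: the option can be scoped to the single affected proof with `in`, so other `bv_decide` calls in the same file keep the normalization pass. This assumes a toolchain that provides the `bv.ac_nf` option (as nightly-2024-10-08 does), and it has not been re-checked here against the full MWE.

```lean
import Std.Tactic.BVDecide

-- Disable the AC normalization pass for this one declaration only.
-- Assumption: the `bv.ac_nf` option exists on the toolchain in use.
set_option bv.ac_nf false in
example
  (a k n : BitVec 64) :
  n < 18446744073709551615#64 - k →
    ((¬a + k + 1#64 - a ≤ a + k - a ∧ ¬a + k + 1#64 + n - a ≤ a + k - a) ∧
        ¬a - (a + k + 1#64) ≤ a + k + 1#64 + n - (a + k + 1#64)) ∧
      ¬a + k - (a + k + 1#64) ≤ a + k + 1#64 + n - (a + k + 1#64) := by
  bv_decide
```

Scoping with `in` rather than setting the option file-wide keeps the benefit of the pass for the goals where it helps.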

hargoniX · 2024-10-10T15:58:36Z

I tried running this with Bitwuzla, and it takes a total of 48 seconds on this problem. It appears that with normalization up to commutativity enabled, this is just really hard unless you happen to hit the form in which you have already found the problem :( So the best solution here is most likely to just accept that we have to disable ac_nf here.
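One possible reading of that remark, worked by hand rather than taken from the thread: the hard subproblem collapses to its own hypothesis once both sides are simplified modulo $2^{64}$. The derivation uses $18446744073709551615 = 2^{64} - 1$ and the fact that, for unsigned comparison, `¬ x ≤ y` is equivalent to `y < x`.

```math
\begin{aligned}
a - (a + k + 1) &\equiv -(k + 1) \equiv (2^{64} - 1) - k \pmod{2^{64}} \\
(a + k + 1 + n) - (a + k + 1) &\equiv n \pmod{2^{64}} \\
\neg\bigl((2^{64} - 1) - k \le n\bigr) &\iff n < (2^{64} - 1) - k
\end{aligned}
```

The last line is exactly the hypothesis `h`. This only shows that the goal is semantically simple; it does not by itself establish which syntactic form a SAT solver finds easy.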

alexkeizer · 2024-10-10T16:05:26Z

It's a shame we'll have to get rid of the ac_nf pass, but thanks for investigating!

This is needed to work around leanprover/lean4#5664. TL;DR: the normalization pass actively made it harder for the SAT solver to prove our goals, causing much larger LRAT files and even timeouts for some pathological goals.

This takes a few standalone bitvector problems about inequalities from LNSym and adds them as a benchmark to prevent further regressions with bv_decide. These problems are particularly interesting because they previously had a bad interaction with bv_decide's normalization pass; see #5664. Co-authored-by: Henrik Böving

alexkeizer added the bug Something isn't working label Oct 9, 2024

alexkeizer changed the title ~~bug: bv_decide regression~~ bug: bv_decide regression; proof that worked before now times out, and LRAT file sizes are much larger Oct 9, 2024

alexkeizer closed this as completed Oct 10, 2024

alexkeizer mentioned this issue Oct 14, 2024

feat: bv_decide inequality regression tests #5714

Merged
