Change lldb breakpoint and stepping algorithm #10026

jasonmolenda · 2025-02-13T23:34:12Z

No description provided.

xusheng added support for swbreak/hwbreak a month ago, and no special support was needed in ProcessGDBRemote when they're received because lldb already marks a thread as having hit a breakpoint when it stops at a breakpoint site. However, with changes I am working on, we need to know the real reason a thread stopped, or the breakpoint hit will not be recognized. This is similar to how lldb processes the "watch/rwatch/awatch" keys in a thread stop packet -- for those we set the `reason` to `watchpoint`, and for swbreak/hwbreak we set it to `breakpoint`, so that the stop reason is set correctly later in these methods. (cherry picked from commit 65a4d11)
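As a rough illustration of the key-to-reason mapping described above, here is a minimal Python sketch. It is not ProcessGDBRemote's code: the function name `stop_reason_from_packet` and the simplified packet handling are assumptions made for this example, and only the key names (`watch`, `rwatch`, `awatch`, `swbreak`, `hwbreak`) come from the text.

```python
from typing import Optional

# Key names as they appear in a gdb-remote 'T' stop reply packet.
WATCH_KEYS = {"watch", "rwatch", "awatch"}
BREAK_KEYS = {"swbreak", "hwbreak"}


def stop_reason_from_packet(packet: str) -> Optional[str]:
    """Hypothetical helper: map stop-reply keys to a coarse stop reason.

    Returns "watchpoint", "breakpoint", or None when neither kind of key
    is present in the packet.
    """
    if not packet.startswith("T") or len(packet) < 3:
        return None
    reason = None
    # Skip 'T' and the two-hex-digit signal number, then walk key:value pairs.
    for field in packet[3:].split(";"):
        if not field:
            continue
        key, _, _value = field.partition(":")
        if key in WATCH_KEYS:
            reason = "watchpoint"
        elif key in BREAK_KEYS:
            reason = "breakpoint"
    return reason
```

With this sketch, a packet such as `T05thread:1f;swbreak:;` maps to `breakpoint`, and one carrying a `watch` key maps to `watchpoint`.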

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace". I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of "if a thread stops at a breakpoint site, we set the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" + "when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming" to a behavior of "when a thread executes a breakpoint, set the stop reason to breakpoint-hit" + "when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread". For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of at $next-pc, where it was meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to "trace". The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again, forever. To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace", hiding the implementation detail that the step was done by breakpoints. (cherry picked from commit 213c59d)
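The two-part check described above can be sketched in a few lines of Python. This is a toy model, not the lldb-server code: `StepByBreakpoint` and `reported_stop_reason` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepByBreakpoint:
    """Hypothetical record of an in-flight step implemented with a breakpoint."""
    tid: int        # thread being instruction-stepped
    target_pc: int  # address of the temporary breakpoint ($next-pc)


def reported_stop_reason(stopped_tid: int, stopped_pc: int,
                         pending: Optional[StepByBreakpoint]) -> str:
    """Rewrite a breakpoint stop to "trace" only when both conditions hold:
    the stopping thread is the one being stepped AND it stopped at the
    address the step was aiming for."""
    if (pending is not None
            and stopped_tid == pending.tid
            and stopped_pc == pending.target_pc):
        return "trace"
    return "breakpoint"
```

Under this model, a thread that executes a user breakpoint at its current pc (rather than reaching the step target) keeps its "breakpoint" stop reason, so the higher-level logic does not retry the step indefinitely.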

…lvm#109643) Apparently a typo is causing compile error, added by llvm#108504. (cherry picked from commit 85220a0)

lldb will change how it reports stop reasons around breakpoints in the near future. I landed an earlier version of this change and noticed debuginfo test failures on the CI bots due to the changes. I'm addressing the issues found by CI at llvm#105594 and will re-land once I've addressed all of them.

Currently, when lldb stops at a breakpoint instruction -- but has not yet executed the instruction -- it will overwrite the thread's Stop Reason with "breakpoint-hit". This caused bugs when the original stop reason was important to the user - for instance, a watchpoint on an AArch64 system where we have to instruction-step past the watchpoint to find the new value. Normally we would instruction step, fetch the new value, then report to the user that a watchpoint has been hit with the old and new values. But if the instruction after this access is a breakpoint site, we overwrite the "watchpoint hit" stop reason (and related actions) with "breakpoint hit".

dexter sets breakpoints on all source lines, then steps line-to-line, hitting the breakpoints. But with this new behavior, we see two steps per source line: The first step gets us to the start of the next line, with a "step completed" stop reason. Then we step again and we execute the breakpoint instruction, stop with the pc the same, and report "breakpoint hit". Now we can step a second time and move past the breakpoint. I've changed the `step` method in LLDB.py to check if we step to a breakpoint site but have a "step completed" stop reason -- in which case we have this new breakpoint behavior, and we need to step a second time to actually hit the breakpoint like the debuginfo tests expect. (cherry picked from commit 93e45a6)
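To make the "step a second time" workaround concrete, here is a small self-contained simulation. It does not use the lldb Python API and is not the code in dexter's LLDB.py: `FakeThread` and the free function `step` are hypothetical stand-ins that only model the behavior described in the commit message.

```python
class FakeThread:
    """Toy model of the new behavior: a breakpoint is reported only once
    its instruction has actually been executed."""

    def __init__(self, line_addrs, breakpoint_addrs):
        self.line_addrs = list(line_addrs)         # start address of each source line
        self.breakpoint_addrs = set(breakpoint_addrs)
        self.index = 0
        self.stop_reason = "breakpoint"            # assume we start stopped at a hit breakpoint
        self._executed_bp_here = True

    @property
    def pc(self):
        return self.line_addrs[self.index]

    def step_over(self):
        at_unexecuted_bp = (self.pc in self.breakpoint_addrs
                            and not self._executed_bp_here)
        if at_unexecuted_bp:
            # Stepping executes the breakpoint instruction; pc does not move.
            self._executed_bp_here = True
            self.stop_reason = "breakpoint"
        else:
            self.index += 1
            self._executed_bp_here = False
            self.stop_reason = "step completed"


def step(thread):
    """Step once; if we landed on a breakpoint site with a "step completed"
    stop reason, step again so the breakpoint is actually hit."""
    thread.step_over()
    if (thread.stop_reason == "step completed"
            and thread.pc in thread.breakpoint_addrs):
        thread.step_over()
```

In this model, one call to `step` on a thread with breakpoints on every line advances exactly one line and ends with a "breakpoint" stop reason, which is what the debuginfo tests expect.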

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless of whether we've actually hit the breakpoint, or we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call; I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming.

One non-intuitive UX from this change, but it is necessary: If you're stopped at a BreakpointSite that has not yet executed and you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the ThreadPlanStepInstruction is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 where we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again.
2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target, so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values and handle them without specializing based on the target arch.

I first landed this patch in July 2024 via llvm#96260, but the CI bots and wider testing found a number of test case failures that needed to be updated, so I reverted it. I've fixed all of those issues in separate PRs and this change should run cleanly on all the CI bots now.

rdar://123942164 (cherry picked from commit b666ac3)
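The resume-time decision described in this commit message can be modeled with a small Python sketch. It is only an illustration of the rules as stated above, not lldb's implementation: `ThreadState`, `record_stop`, and `should_step_over_breakpoint_on_resume` are hypothetical names, and the two fields loosely mirror what Thread::SetThreadStoppedAtUnexecutedBP(pc) and Thread::SetThreadHitBreakpointSite() are said to record.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ThreadState:
    pc: int
    stopped_at_unexecuted_bp: Optional[int] = None  # site address recorded at stop time
    hit_breakpoint_site: bool = False


def record_stop(thread: ThreadState, breakpoint_sites: Set[int],
                executed_breakpoint: bool) -> None:
    """Record, at stop time, whether the breakpoint at pc was actually executed."""
    thread.stopped_at_unexecuted_bp = None
    thread.hit_breakpoint_site = False
    if executed_breakpoint:
        thread.hit_breakpoint_site = True
    elif thread.pc in breakpoint_sites:
        thread.stopped_at_unexecuted_bp = thread.pc


def should_step_over_breakpoint_on_resume(thread: ThreadState,
                                          breakpoint_sites: Set[int]) -> bool:
    """Decide whether to silently step past the site at pc before resuming."""
    if thread.pc not in breakpoint_sites:
        return False
    if thread.hit_breakpoint_site:
        return True   # already reported as hit; do not hit it again
    if thread.stopped_at_unexecuted_bp == thread.pc:
        return False  # still sitting on an unexecuted site: run into it
    # Breakpoint added at $pc while stopped, or $pc moved onto a site.
    return True
```

The last branch captures the UX choice from the message: a site that appeared under the thread while it was stopped is stepped past silently rather than hit immediately on resume.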

(cherry picked from commit fa71238)

jasonmolenda · 2025-02-13T23:34:29Z

@swift-ci test

jasonmolenda · 2025-02-14T19:20:19Z

Windows CI had one failure on TestConsecutiveBreakpoints,

======================================================================

FAIL: test_single_step_thread_specific (TestConsecutiveBreakpoints.ConsecutiveBreakpointsTestCase)
   Test that single step stops, even though the second breakpoint is not valid.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\swift-ci\jenkins\workspace\apple-llvm-project-pull-request-windows\llvm-project\lldb\packages\Python\lldbsuite\test\decorators.py", line 452, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\swift-ci\jenkins\workspace\apple-llvm-project-pull-request-windows\llvm-project\lldb\test\API\functionalities\breakpoint\consecutive_breakpoints\TestConsecutiveBreakpoints.py", line 121, in test_single_step_thread_specific
    self.finish_test()
  File "C:\Users\swift-ci\jenkins\workspace\apple-llvm-project-pull-request-windows\llvm-project\lldb\test\API\functionalities\breakpoint\consecutive_breakpoints\TestConsecutiveBreakpoints.py", line 42, in finish_test
    self.assertState(self.process.GetState(), lldb.eStateExited)
  File "C:\Users\swift-ci\jenkins\workspace\apple-llvm-project-pull-request-windows\llvm-project\lldb\packages\Python\lldbsuite\test\lldbtest.py", line 2590, in assertState
    self.fail(self._formatMessage(msg, error))
AssertionError: stopped (5) != exited (10)
Config=x86_64-T:\5\bin\clang.exe

FAILED (failures=1)

This is the kind of test that would detect an actual regression. I will try re-running once more, but I might have to stare at ProcessWindowsNative and this test to work out what is happening.

jasonmolenda · 2025-02-14T19:20:24Z

@swift-ci test windows

jasonmolenda · 2025-02-25T00:39:19Z

@swift-ci test windows

jasonmolenda and others added 6 commits February 13, 2025 13:41

[lldb][FreeBSD] Fix a typo in NativeProcessFreeBSD::MonitorSIGTRAP() (l…

6c8a736

…lvm#109643) Apparently a typo is causing compile error, added by llvm#108504. (cherry picked from commit 85220a0)

[lldb] inserted a typeo when checking in a suggested fix

6872caf

(cherry picked from commit fa71238)

jasonmolenda requested a review from a team as a code owner February 13, 2025 23:34

add debug prints to TestConsecutiveBreakpoints

3fdd8fd

Revert "add debug prints to TestConsecutiveBreakpoints"

cdc1fa9

This reverts commit 3fdd8fd.
