
Crash reports sent only after second launch, not immediately after app restart #1114

Open · 1 of 3 tasks

dalnoki opened this issue Jan 14, 2025 · 8 comments



dalnoki commented Jan 14, 2025

Description

A customer is encountering an issue with the timing of crash report submissions in a Qt-based application using Sentry Native with Breakpad as the backend. When the application crashes, they use the on_crash hook to restart the app immediately. However, crash reports are only appearing in Sentry after the second launch of the application, not after the app is restarted due to the crash. This delay in sending crash reports is problematic, as the app runs as a background service and relying on users to manually restart it would result in unacceptable delays in receiving crash reports.

When does the problem happen

  • During build
  • During run-time
  • When capturing a hard crash

Environment

  • OS: macOS Sequoia 15.2 (24C101)
  • Compiler: Clang (C++, arm64)
  • CMake version and config: 3.28.1 Debug config

Steps To Reproduce

Please find a video recording of the issue in the shadow Jira ticket. The issue is reproducible on the latest SDK version as well.

@supervacuus
Collaborator

Hi @dalnoki. Thanks for the report!

> When the application crashes, they use the on_crash hook to restart the app immediately.

No crash report will be written to disk if the user terminates/restarts the program from the on_crash hook. This means the Native SDK's transport won't have a report to send on the next start. The on_crash callback must return so the crash handler can finish processing. At the point where the handler invokes the on_crash hook, no minidump has been generated yet, and only a preliminary crash event exists in memory. I am surprised the user can see any crash report at all.
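For illustration, here is a minimal sketch of an on_crash hook against the sentry-native API (the DSN is a placeholder, and the hook signature is as documented for recent SDK versions). The key point is that the callback returns the event instead of terminating the process:

```c
#include <sentry.h>

/* Sketch: an on_crash hook must NOT exec/exit; it should return the event
 * so the Breakpad backend can write the minidump and the envelope. */
static sentry_value_t
on_crash_cb(const sentry_ucontext_t *uctx, sentry_value_t event, void *closure)
{
    (void)uctx;
    (void)closure;
    /* Do only minimal, async-signal-safe work here (e.g. set a flag or
     * touch a marker file). Let an external supervisor such as launchd
     * handle the restart instead of restarting from inside the hook. */
    return event; /* returning lets the crash handler finish processing */
}

int main(void)
{
    sentry_options_t *options = sentry_options_new();
    sentry_options_set_dsn(options, "https://examplePublicKey@o0.ingest.sentry.io/0");
    sentry_options_set_on_crash(options, on_crash_cb, NULL);
    sentry_init(options);
    /* ... application code ... */
    sentry_close();
    return 0;
}
```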

I recommend using macOS' service manager, launchd, to restart the application whenever it crashes, and decoupling that mechanism from the on_crash hook, which cannot be used to terminate the application. In particular, the KeepAlive job property will restart the application whenever it terminates.
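For reference, a minimal launchd job sketch using KeepAlive (the label and program path are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.myservice</string> <!-- placeholder label -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/myservice</string> <!-- placeholder path -->
    </array>
    <!-- Restart the job whenever it exits, including after a crash -->
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```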

@dalnoki
Author

dalnoki commented Jan 15, 2025

Hey @supervacuus, the customer shared the following: "Also worth mentioning that we can reproduce the issue on Ubuntu as well. I want to highlight that disabling the reboot does not solve the issue. We tried this at an early stage and it did not work. And we are spawning a completely new instance of our app, so it should not interfere with the Sentry SDK's work."

@oa-mega

oa-mega commented Jan 15, 2025

Hi @supervacuus ,

I want to highlight that I can see a ".envelope" file created in the database path after the crash. However, for some reason it does not get picked up and sent on the run immediately after the crash (regardless of whether our reboot hook is enabled or disabled), but rather on the one after it.

@supervacuus
Collaborator

> I want to highlight that I can see a ".envelope" file created in the database path after the crash. However, for some reason it does not get picked up and sent on the run immediately after the crash (regardless of whether our reboot hook is enabled or disabled), but rather on the one after it.

Thanks. If an envelope file was created, the crash handler was able to finish processing. Otherwise, you'd probably only see a .dmp file in the run directory.

> And we are spawning a completely new instance of our app, so it should not interfere with the Sentry SDK's work.

One reason for the described behavior could be that a previous run's lock on the crashed run is still held. We introduced file locks so that multiple processes can share a database path without interfering with each other's runs (i.e., each run directory stays locked as long as a process using the Native SDK keeps its file descriptor alive).

Is it possible that the crashed process is still running while you initialize the Native SDK in the newly started process? This would explain why the crashed run doesn't get picked up on the start "following" the crash but does get picked up on the next one. Keep in mind that the file locks depend on the file descriptors being released. This means that any forked/spawned process that inherits file descriptors from the main process could hold on to them longer than you'd expect.

@oa-mega

oa-mega commented Jan 23, 2025

Thank you for the reply!
After some investigation, we found that we do have a process that outlives the crashed process, and it is what's causing the issue.

Does it mean there's no way to reboot our application from the crashed process? Can we force the Sentry SDK to process these envelopes after the initialization?

@supervacuus
Collaborator

> After some investigation, we do have a process that outlives the crashed process, and it is what's causing the issue.

I am glad you found the cause.

> Does it mean there's no way to reboot our application from the crashed process?

The problem is not rebooting your application, but keeping the crashed process running while you initialize the Native SDK in another process that shares the same database path, and expecting that process to handle the crashed run while the crashed process is still alive. That is a race condition. You can either delay initialization of the Native SDK until the crashed process has fully terminated, or accept that the crash will be sent on the next start.

> Can we force the Sentry SDK to process these envelopes after the initialization?

That is currently not possible, and exposing it could also change the semantics of last_crash, so I am unsure whether it will ever be exposed: sentry_init() acts as a transactional boundary for all previous runs and for last_crash.

@oa-mega

oa-mega commented Feb 4, 2025

It was indeed the case, and as a workaround we managed to identify the file descriptor causing the issue and change its flags to add FD_CLOEXEC using fcntl. This worked perfectly on macOS. However, our application targets all major desktop platforms, and the same solution did not work on Ubuntu. Are there other descriptors/locks that Sentry uses on Linux?
Any idea what might cause this behavior?

@supervacuus
Collaborator

> Are there other descriptors/locks that Sentry uses on Linux?

No, the entire underlying file-locking implementation is shared across all supported UNIXes. It is also the only file lock in the SDK; no other file descriptors should block the sending of persisted envelopes. Do you have any debug logs, or a quick backtrace taken in a debugger against the blocked process, that could verify you are stuck in the same place?

> Any idea what might cause this behavior?

If there is still an active file lock on Ubuntu, then without knowing anything about your application, I can only imagine two causes:

  • since the FD_CLOEXEC bit only affects the exec() family of functions, maybe your Ubuntu implementation uses other means of starting processes (fork or clone without a subsequent exec()) that inherit file descriptors?
  • if you duplicate file descriptors in your Linux implementation for some reason, those duplicates don't inherit FD_CLOEXEC (dup3 with O_CLOEXEC being the exception), leaving the file lock active via the duplicate without further intervention.
