

Member

tlively commented Aug 29, 2025

It apparently takes more than one turn of the worker thread's event loop
to ensure that the notifications on the zombie task queues have been
cleared so the zombies can be culled. Update the test to wait two turns
of the event loop instead of one.
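To make "wait two turns instead of one" concrete, here is a toy model of deferring a check by a fixed number of event loop turns. It is only a sketch: the loop, `post_task`, `run_one_turn`, and `run_deferred_check` are all hypothetical names invented for illustration and are not Emscripten APIs or the code in this test.

```c
#include <assert.h>
#include <stddef.h>

// Toy single-threaded event loop. Hypothetical, NOT Emscripten's implementation.
typedef void (*task_fn)(void* arg);
typedef struct { task_fn fn; void* arg; } task;

#define MAX_TASKS 16
static task pending[MAX_TASKS];
static size_t pending_count = 0;
static int turns_run = 0;

// Queue a task to run on a later turn.
static void post_task(task_fn fn, void* arg) {
  assert(pending_count < MAX_TASKS);
  pending[pending_count++] = (task){fn, arg};
}

// Run one turn: only tasks queued before the turn began are executed, so a
// task that re-posts itself runs again on the next turn, not this one.
static void run_one_turn(void) {
  task batch[MAX_TASKS];
  size_t n = pending_count;
  for (size_t i = 0; i < n; i++) batch[i] = pending[i];
  pending_count = 0;
  for (size_t i = 0; i < n; i++) batch[i].fn(batch[i].arg);
  turns_run++;
}

typedef struct { int turns_left; int fired_on_turn; } deferred_check;

// Re-post until the requested number of turns has elapsed, then record the
// 1-based turn on which the check finally ran.
static void deferred_step(void* arg) {
  deferred_check* check = arg;
  if (--check->turns_left > 0) {
    post_task(deferred_step, check);
    return;
  }
  check->fired_on_turn = turns_run + 1;
}

// Returns the turn on which a check deferred by `turns` turns actually runs.
int run_deferred_check(int turns) {
  assert(turns >= 1);
  pending_count = 0;
  turns_run = 0;
  deferred_check check = {turns, 0};
  post_task(deferred_step, &check);
  while (pending_count > 0) run_one_turn();
  return check.fired_on_turn;
}
```

In this model `run_deferred_check(1)` runs the check on the first turn and `run_deferred_check(2)` on the second, which gives any cleanup scheduled for the first turn a chance to finish before the assertions run.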

Also add new synchronization forcing the main thread to wait until the
worker thread has entered Wasm before proxying work to it. This prevents
the proxied work notifications from somehow being cleared before the
worker thread destroys the proxy queues, which would prevent the task
queues from ever being placed on the zombie list in the first place.
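The handshake described above can be sketched roughly as follows. This is a minimal illustration using plain pthreads and C11 atomics, not the test itself: the flag names `worker_started` and `should_execute` come from the diff, while `pending_work`, `worker_main`, and `run_handshake` are hypothetical stand-ins, and setting `pending_work` takes the place of actually proxying work.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Flag names mirror the diff; everything else here is hypothetical.
static _Atomic bool worker_started = false;
static _Atomic bool should_execute = false;
// Stand-in for "work has been proxied to the worker".
static _Atomic int pending_work = 0;

static void* worker_main(void* arg) {
  int* observed = arg;
  // Signal the main thread that the worker is running and can be sent work.
  worker_started = true;
  // Wait until the main thread says the work has been sent.
  while (!should_execute) sched_yield();
  // Record what the worker sees once it is allowed to proceed.
  *observed = pending_work;
  return NULL;
}

// Returns the amount of pending work the worker observed (expected: 1).
int run_handshake(void) {
  worker_started = false;
  should_execute = false;
  pending_work = 0;

  int observed = -1;
  pthread_t worker;
  int rc = pthread_create(&worker, NULL, worker_main, &observed);
  assert(rc == 0);

  // Do not send any work until the worker has actually started running.
  while (!worker_started) sched_yield();
  pending_work = 1;       // stand-in for proxying work to the worker
  should_execute = true;  // let the worker proceed

  rc = pthread_join(worker, NULL);
  assert(rc == 0);
  (void)rc;
  return observed;
}
```

Because the main thread only sends work after `worker_started` is set, and the worker only proceeds after `should_execute` is set, the worker is guaranteed to be running when the work arrives and to see that work before it continues.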

Finally, generally improve comments and make the test assertions more
specific.

Fixes #19795.

tlively requested review from sbc100 and juj August 29, 2025 01:29
Collaborator

juj commented Aug 29, 2025

Very nice, thanks for the attention. Ran this several times on Windows, Linux and macOS, and it does look watertight now.

@@ -29,22 +34,28 @@ void __attribute__((noinline)) free(void* ptr) {

#endif // SANITIZER

_Atomic int worker_started = 0;
_Atomic int should_execute = 0;
Collaborator

Can you use bool for these two atomics to make the purpose more clear?

Member Author

Done.

tlively enabled auto-merge (squash) August 29, 2025 19:08

void* execute_and_free_queue(void* arg) {
  // Signal the main thread to proxy work to us.
  worker_started = 1;
Collaborator

(I assumed you could use true/false in all the assignments too).

Member Author

Done.
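For reference, the change requested in this thread presumably amounts to something like the following. This is a sketch reconstructed from the review comments, not a copy of the merged code; `signal_worker_started` is a hypothetical helper standing in for the assignment at the top of `execute_and_free_queue`.

```c
#include <stdbool.h>

// Flags declared as atomic booleans rather than atomic ints, so the
// declarations themselves say that each one is a yes/no signal.
_Atomic bool worker_started = false;
_Atomic bool should_execute = false;

// Hypothetical helper: the assignment now uses true instead of 1.
void signal_worker_started(void) {
  // Signal the main thread to proxy work to us.
  worker_started = true;
}
```

Plain assignment to an `_Atomic` object is still a sequentially consistent atomic store, so switching the type from `int` to `bool` changes readability and not the synchronization behavior.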

tlively disabled auto-merge August 29, 2025 19:29
tlively enabled auto-merge (squash) August 29, 2025 19:31
tlively merged commit ae9bfff into main Aug 29, 2025
29 of 30 checks passed
tlively deleted the proxying-zombie-flake branch August 29, 2025 20:40
Development

Successfully merging this pull request may close these issues.

Flakiness in core2.test_pthread_proxying_refcount
3 participants