Panic: pidfd is ready to read, the process should have exited #7144

Open
jeromegn opened this issue Feb 7, 2025 · 6 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-process Module: tokio/process

Comments

@jeromegn

jeromegn commented Feb 7, 2025

Version

TL;DR: 1.42.0

cargo tree
❯ cargo tree | grep tokio
└── tokio v1.42.0
    └── tokio-macros v2.4.0 (proc-macro)
│   │   │   └── tokio v1.42.0 (*)
│   │   └── tokio v1.42.0 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   │   │   ├── tokio v1.42.0 (*)
│   │   │   │   ├── tokio-util v0.7.11
│   │   │   │   │   └── tokio v1.42.0 (*)
│   │   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   │   ├── tokio v1.42.0 (*)
│   │   │   ├── tokio-util v0.7.11 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   │   └── tokio v1.42.0 (*)
│   │   └── tokio v1.42.0 (*)
│   ├── tokio v1.42.0 (*)
│   ├── tokio-stream v0.1.15
│   │   ├── tokio v1.42.0 (*)
│   │   └── tokio-util v0.7.11 (*)
│   ├── tokio-util v0.7.11 (*)
│   ├── tokio-vsock v0.5.0
│   │   ├── tokio v1.42.0 (*)
│       ├── tokio v1.42.0 (*)
│       ├── tokio-tungstenite v0.21.0
│       │   ├── tokio v1.42.0 (*)
│       ├── tokio-util v0.7.11 (*)
├── tokio v1.42.0 (*)
├── tokio-vsock v0.5.0 (*)
│   ├── tokio v1.42.0 (*)
│   └── tokio v1.42.0 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio-rustls v0.25.0
│   │   │   └── tokio v1.42.0 (*)
│   ├── tokio v1.42.0 (*)
│   ├── tokio-stream v0.1.15 (*)
│   ├── tokio-util v0.7.11 (*)
│   ├── tokio-vsock v0.5.0 (*)
├── tokio v1.42.0 (*)
├── tokio-stream v0.1.15 (*)
├── tokio-util v0.7.11 (*)
├── tokio-vsock v0.5.0 (*)
│   ├── tokio v1.42.0 (*)
│   ├── tokio v1.42.0 (*)
│   ├── tokio-util v0.7.11 (*)
│   │   │       │   ├── tokio v1.42.0 (*)
│   │   │       │   └── tokio-io-timeout v1.2.0
│   │   │       │       └── tokio v1.42.0 (*)
│   │   │       ├── tokio v1.42.0 (*)
│   │   │       ├── tokio-stream v0.1.15 (*)
│   │   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio-util v0.7.11 (*)
│   │   │   └── tokio v1.42.0 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio-stream v0.1.15 (*)
│   ├── tokio v1.42.0 (*)
│   ├── tokio-fd v0.3.0
│   │   └── tokio v1.42.0 (*)
│   ├── tokio-stream v0.1.15 (*)
│   ├── tokio-tar v0.3.1
│   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio-stream v0.1.15 (*)
│   ├── tokio-util v0.7.11 (*)
│   ├── tokio-vsock v0.5.0 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   └── tokio-util v0.7.11 (*)
├── tokio v1.42.0 (*)
├── tokio-util v0.7.11 (*)

Platform

Linux version 5.15.98

Description

I'm launching crun run ... commands with tokio::process::Command and awaiting them with Child::wait. This sometimes panics with the following backtrace:

stack backtrace:
   0: rust_begin_unwind
             at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:662:5
   1: core::panicking::panic_fmt
             at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panicking.rs:74:14
   2: core::panicking::panic_display
             at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panicking.rs:264:5
   3: core::option::expect_failed
             at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/option.rs:2025:5
   4: core::option::Option<T>::expect
             at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/option.rs:928:21
   5: <tokio::process::imp::pidfd_reaper::PidfdReaperInner<W> as core::future::future::Future>::poll
             at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/process/unix/pidfd_reaper.rs:127:24
   6: <tokio::process::imp::pidfd_reaper::PidfdReaper<W,Q> as core::future::future::Future>::poll
             at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/process/unix/pidfd_reaper.rs:188:9
   7: <tokio::process::imp::Child as core::future::future::Future>::poll
             at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/process/unix/mod.rs:183:48
   8: <tokio::process::ChildDropGuard<F> as core::future::future::Future>::poll
             at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/process/mod.rs:1023:19
   9: <&mut F as core::future::future::Future>::poll
             at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:111:9
  10: tokio::process::Child::wait::{{closure}}
             at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/process/mod.rs:1223:33

# ...

I am using a FutureMap (my own implementation) and inserting the children's wait futures into it like this:

self.reaper.insert(name.clone(), Box::pin(async move { child.wait().await }));

Then I'm polling it in tokio::select!.

Is it because my FutureMap is not cancel safe?

jeromegn added the A-tokio (Area: The main tokio crate) and C-bug (Category: This is a bug.) labels Feb 7, 2025
@jeromegn
Author

jeromegn commented Feb 7, 2025

I just noticed that it only seems to happen while I am attaching strace to the already-running program (strace -fp <pid>). Very weird.

Darksonn added the M-process (Module: tokio/process) label Feb 8, 2025
@Darksonn
Contributor

Darksonn commented Feb 8, 2025

I'm not sure, but can we eliminate your map from the equation? E.g., could you instead do

let send = send.clone();
tokio::spawn(async move { send.send((key, child.wait().await)) });

with send being a sender for an unbounded mpsc channel. The receiver can go in your select!.

@jeromegn
Author

jeromegn commented Feb 9, 2025

@Darksonn the best I could do was to use a StreamMap<String, Once<ChildWait>>, since I need to be able to remove a ChildWait in some circumstances and changing that would mean changing a lot more code.

I was still able to trigger the issue with a StreamMap (which I believe should be cancel safe?).

It only happens when strace-ing the process, and not consistently: usually about 50% of the time, if not a bit more.

@jeromegn
Author

strace significantly slows down syscalls in a process. I wonder if there's a timing issue.

@Darksonn
Contributor

cc @ipetkov

Ideas? I guess the pidfd sometimes becomes ready when stracing?

@jeromegn
Author

To clarify, this happens when the child exits while the process is being straced.
