-
Notifications
You must be signed in to change notification settings - Fork 45
[nexus] add tokio-dtrace probes #8364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I've written [a crate][1] that adds USDT probes for Tokio runtime events, using the unstable callbacks provided by the Tokio runtime. This commit adds that crate to Nexus. Note that, because `tokio-dtrace` releis on Tokio's unstable features, it was necessary to add config to `.cargo/config.toml` to enable the `--cfg tokio_unstable` rustc config when compiling for illumos. In addition, adding the code necessary configure the Tokio runtime to record these probes required some changes to `bin/nexus.rs`, to change the use of the `#[tokio::main]` attribute to manual use of the runtime builder. I'd like to add these DTrace probes in other binaries in Omicron as well, but figured I'd start with Nexus as a proof of concept. It might be worth some general refactoring of our `main` functions in the process of doing so. [1]: https://github.com/oxidecomputer/tokio-dtrace
Yuck, the build failure suggests that ... something ... is trying to compile |
Oh, it's because the build script that runs on that CI job is doing its own munging of the |
NOTE TO FUTURE ELIZA: you should make sure that |
run_server(&config).await.map_err(|err| CmdError::Failure(anyhow!(err))) | ||
let rt = { | ||
let mut builder = tokio::runtime::Builder::new_multi_thread(); | ||
#[cfg(target_os = "illumos")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could also be conditional on cfg(tokio_unstable)
so we don't fail to build if it's not set, but i actually want builds to fail for now to make sure it's gettig set in the places I want it to be...
let rt = { | ||
let mut builder = tokio::runtime::Builder::new_multi_thread(); | ||
#[cfg(target_os = "illumos")] | ||
tokio_dtrace::register_hooks(&mut builder) | ||
.map_err(|err| CmdError::Failure(anyhow!(err)))?; | ||
builder | ||
.enable_all() | ||
.build() | ||
.map_err(|err| CmdError::Failure(anyhow!(err)))? | ||
}; | ||
rt.block_on(async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, I think it would be nice to take this runtime setup code and put it in a crate, so that we can do the same setup in sled-agent
and gateway
binaries (in order to ensure we also set up the tokio-dtrace
probes there, etc). But I didn't do that in this branch, since that felt like a larger change...especially since it should probably be a new crate. There's similar code in omicron_common
's cmd
module, but I didn't want to put it there, as that crate is also a dependency of a bunch of libraries, and making it depend on tokio
would probably be bad for the parallelizability of the build graph: right now, we can go and compile a bunch of the smaller crates that depend on omicron_common
fairly quickly, while the runtime setup code would only be a dependency of the actual binary crates.
It'd be nice if we could fetch the rustflags out of .cargo/config.toml and append to it from our bash scripts instead of having to copy the rustflags in several places. Maybe worth a small xtask? Or can we convince cargo to tell us what it thinks the rustflags are? Not sure if this is worth it for this PR. |
@iliana apparently there's a |
huh, looks like there's a helios-deploy failure due to...sled-agent crashlooping? which is weird because i didn't mess with sled-agent. will investigate. |
2 [ Jun 18 17:42:41 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/sled-agent/sled-agent run /opt/oxide/sled-agent/pkg/config.toml &"). ]
3 [ Jun 18 17:42:41 Method "start" exited with status 0. ]
4 [ Jun 18 17:42:41 Rereading configuration. ]
5 ld.so.1: sled-agent: fatal: libipcc.so.1: open failed: No such file or directory
6 [ Jun 18 17:42:41 No 'refresh' method defined. Treating as :true. ]
7 [ Jun 18 17:42:41 Stopping because all processes in service exited. ]
8 [ Jun 18 17:42:41 Executing stop method (:kill). ]
9 [ Jun 18 17:42:41 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/sled-agent/sled-agent run /opt/oxide/sled-agent/pkg/config.toml &"). ]
10 [ Jun 18 17:42:41 Method "start" exited with status 0. ]
11 ld.so.1: sled-agent: fatal: libipcc.so.1: open failed: No such file or directory
12 [ Jun 18 17:42:41 Stopping because all processes in service exited. ]
13 [ Jun 18 17:42:41 Executing stop method (:kill). ]
14 [ Jun 18 17:42:41 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/sled-agent/sled-agent run /opt/oxide/sled-agent/pkg/config.toml &"). ]
15 [ Jun 18 17:42:41 Method "start" exited with status 0. ]
16 ld.so.1: sled-agent: fatal: libipcc.so.1: open failed: No such file or directory
17 [ Jun 18 17:42:41 Stopping because all processes in service exited. ]
18 [ Jun 18 17:42:41 Executing stop method (:kill). ]
19 [ Jun 18 17:42:41 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/sled-agent/sled-agent run /opt/oxide/sled-agent/pkg/config.toml &"). ]
20 [ Jun 18 17:42:41 Method "start" exited with status 0. ]
21 ld.so.1: sled-agent: fatal: libipcc.so.1: open failed: No such file or directory
22 [ Jun 18 17:42:41 Stopping because all processes in service exited. ]
23 [ Jun 18 17:42:41 Executing stop method (:kill). ]
24 [ Jun 18 17:42:41 Restarting too quickly, changing state to maintenance. ] i don't really think i did that? |
Currently, all the async Rust binaries in the Omicron workspace use the `#[tokio::main]` attribute on an `async fn main()`. This creates a Tokio runtime with the default configuration provided by that attribute, and runs the contents of the `async` main function on that runtime. We now have need to configure the Tokio runtime used by Omicron differently than the defaults provided by the `#[tokio::main]` macro. In particular, we want to do the following: - Disable the LIFO slot optimization, which can cause severe latency spikes in some cases (see #8334 (comment)) - When running on illumos, configure runtime callbacks to emit `tokio-dtrace` probes (see #8364) This requires the use of tokio's `runtime::Builder` API, rather than the `#[tokio::main]` attribute. Since there are a lot of binaries in Omicron, this branch introduces a new `omicron-runtime` crate that configures a runtime with our desired settings and runs a future on it. This can be used to replace `#[tokio::main]` fairly simply, which I've done for most binaries in the repo. There are a handful of binaries I did *not* update to use `omicron-runtime`. In particular, I didn't change the use of `#[tokio::main]` in example binaries (in particular, `oximeter-producer` and `update-engine`), as I didn't want to add an `omicron-runtime` dependency to those crates just for example. In the case of `oximeter-producer`, in particular, which is depended on by code outside of the Omicron repo, that would force those binaries to compile `omicron-runtime` unnecessarily. Similarly, I didn't make some of the dev-tools binaries which are used in xtasks use `omicron-runtime`, because I didn't want to introduce another dependency that has to be compiled before an xtask can run, and that felt more important than adding DTrace probes to those simple commands --- if something goes wrong with them, we can add the proves later. The exception to this are the `mgs-dev` and `omicron-dev` binaries, which run MGS and Nexus *in-process*; in those cases, I felt it was useful to have the DTrace probes enabled when running in development. Similarly, I also thought it was worthwhile to use `omicron-runtime` in OMDB, as it's deployed on production systems and wouild be harder to recompile with DTrace probes if we end up wanting them. Depends on #8364
Currently, all the async Rust binaries in the Omicron workspace use the `#[tokio::main]` attribute on an `async fn main()`. This creates a Tokio runtime with the default configuration provided by that attribute, and runs the contents of the `async` main function on that runtime. We now have need to configure the Tokio runtime used by Omicron differently than the defaults provided by the `#[tokio::main]` macro. In particular, we want to do the following: - Disable the LIFO slot optimization, which can cause severe latency spikes in some cases (see #8334 (comment)) - When running on illumos, configure runtime callbacks to emit `tokio-dtrace` probes (see #8364) This requires the use of tokio's `runtime::Builder` API, rather than the `#[tokio::main]` attribute. Since there are a lot of binaries in Omicron, this branch introduces a new `omicron-runtime` crate that configures a runtime with our desired settings and runs a future on it. This can be used to replace `#[tokio::main]` fairly simply, which I've done for most binaries in the repo. There are a handful of binaries I did *not* update to use `omicron-runtime`. In particular, I didn't change the use of `#[tokio::main]` in example binaries (in particular, `oximeter-producer` and `update-engine`), as I didn't want to add an `omicron-runtime` dependency to those crates just for example. In the case of `oximeter-producer`, in particular, which is depended on by code outside of the Omicron repo, that would force those binaries to compile `omicron-runtime` unnecessarily. Similarly, I didn't make some of the dev-tools binaries which are used in xtasks use `omicron-runtime`, because I didn't want to introduce another dependency that has to be compiled before an xtask can run, and that felt more important than adding DTrace probes to those simple commands --- if something goes wrong with them, we can add the proves later. The exception to this are the `mgs-dev` and `omicron-dev` binaries, which run MGS and Nexus *in-process*; in those cases, I felt it was useful to have the DTrace probes enabled when running in development. Similarly, I also thought it was worthwhile to use `omicron-runtime` in OMDB, as it's deployed on production systems and wouild be harder to recompile with DTrace probes if we end up wanting them. Depends on #8364
Currently, all the async Rust binaries in the Omicron workspace use the `#[tokio::main]` attribute on an `async fn main()`. This creates a Tokio runtime with the default configuration provided by that attribute, and runs the contents of the `async` main function on that runtime. We now have need to configure the Tokio runtime used by Omicron differently than the defaults provided by the `#[tokio::main]` macro. In particular, we want to do the following: - Disable the LIFO slot optimization, which can cause severe latency spikes in some cases (see #8334 (comment)) - When running on illumos, configure runtime callbacks to emit `tokio-dtrace` probes (see #8364) This requires the use of tokio's `runtime::Builder` API, rather than the `#[tokio::main]` attribute. Since there are a lot of binaries in Omicron, this branch introduces a new `omicron-runtime` crate that configures a runtime with our desired settings and runs a future on it. This can be used to replace `#[tokio::main]` fairly simply, which I've done for most binaries in the repo. There are a handful of binaries I did *not* update to use `omicron-runtime`. In particular, I didn't change the use of `#[tokio::main]` in example binaries (in particular, `oximeter-producer` and `update-engine`), as I didn't want to add an `omicron-runtime` dependency to those crates just for example. In the case of `oximeter-producer`, in particular, which is depended on by code outside of the Omicron repo, that would force those binaries to compile `omicron-runtime` unnecessarily. Similarly, I didn't make some of the dev-tools binaries which are used in xtasks use `omicron-runtime`, because I didn't want to introduce another dependency that has to be compiled before an xtask can run, and that felt more important than adding DTrace probes to those simple commands --- if something goes wrong with them, we can add the proves later. The exception to this are the `mgs-dev` and `omicron-dev` binaries, which run MGS and Nexus *in-process*; in those cases, I felt it was useful to have the DTrace probes enabled when running in development. Similarly, I also thought it was worthwhile to use `omicron-runtime` in OMDB, as it's deployed on production systems and wouild be harder to recompile with DTrace probes if we end up wanting them. Depends on #8364
…uter#8412) Currently, all the async Rust binaries in the Omicron workspace use the `#[tokio::main]` attribute on an `async fn main()`. This creates a Tokio runtime with the default configuration provided by that attribute, and runs the contents of the `async` main function on that runtime. We now have need to configure the Tokio runtime used by Omicron differently than the defaults provided by the `#[tokio::main]` macro. In particular, we want to do the following: - Disable the LIFO slot optimization, which can cause severe latency spikes in some cases (see oxidecomputer#8334 (comment)) - When running on illumos, configure runtime callbacks to emit `tokio-dtrace` probes (see oxidecomputer#8364) In order to make it easier to use these configurations in all production binaries in Omicron and other projects, I've introduced a new [`oxide-tokio-rt` crate](https://github.com/oxidecomputer/oxide-tokio-rt). This crate provides functions that configure a runtime with our desired settings and runs a future on it. This can be used to replace `#[tokio::main]` fairly simply, which I've done for most binaries in the repo. There are a handful of binaries I did *not* update to use `omicron-runtime`. In particular, I didn't change the use of `#[tokio::main]` in example binaries (in particular, `oximeter-producer` and `update-engine`), as I didn't want to add an `omicron-runtime` dependency to those crates just for example. In the case of `oximeter-producer`, in particular, which is depended on by code outside of the Omicron repo, that would force those binaries to compile `omicron-runtime` unnecessarily. Similarly, I didn't make some of the dev-tools binaries which are used in xtasks use `oxide-tokio-rt`, because I didn't want to introduce another dependency that has to be compiled before an xtask can run, and that felt more important than adding DTrace probes to those simple commands --- if something goes wrong with them, we can add the proves later. The exception to this are the `mgs-dev` and `omicron-dev` binaries, which run MGS and Nexus *in-process*; in those cases, I felt it was useful to have the DTrace probes enabled when running in development. Similarly, I also thought it was worthwhile to use `omicron-runtime` in OMDB, as it's deployed on production systems and would be harder to recompile with DTrace probes if we end up wanting them. Closes oxidecomputer#8364
I've written a crate that adds USDT probes for Tokio runtime events, using the unstable callbacks provided by the Tokio runtime. This commit adds that crate to Nexus. Note that, because
tokio-dtrace
releis on Tokio's unstable features, it was necessary to add config to.cargo/config.toml
to enable the--cfg tokio_unstable
rustc config when compiling for illumos. In addition, adding the code necessary configure the Tokio runtime to record these probes required some changes tobin/nexus.rs
, to change the use of the#[tokio::main]
attribute to manual use of the runtime builder.I'd like to add these DTrace
probes in other binaries in Omicron as well, but figured I'd start with Nexus as a proof of concept. It might be worth some general refactoring of our
main
functions in the process of doing so.