Use rayon as the bevy_task thread pool #11995

james7132 · 2024-02-20T09:48:43Z

Objective

This is a draft. There is zero intent to merge this until there is some form of consensus on general direction

This is one potential option for dealing with #6941 and #10064, as well as a general alternative to #11492.

This PR aims to lay the foundations for using rayon as Bevy's core task executor, as well as support the very powerful APIs built on top of it (i.e. ParallelIterator and friends).

Background

Bevy's tasks used to be based on rayon. However, in #1525, we shifted to use async_executor and our own hand-rolled thread pool. The are several reasons for that change:

rayon's support of async/await is non-existent, and still effectively is not supported in any first party way.
rayon could not run on WASM targets. This has since changed and there is tentative support for mulitthreading with web workers and SharedArrayBuffer on web platforms.

However, losing rayon has actually caused us to lose a significant amount of performance, and the powerful APIs associated with rayon. With rayon being one of the best examples of high data parallelism and ergonomics in the Rust ecosystem, this is potentially leaving a lot on the table.

Rayon has since added support for WASM, and in this PR, I am going to attempt to bridge the gap to allow us to run async tasks in the same thread pool.

Solution

This PR replaces our thread pool with rayon::ThreadPool, and hand create our own async task that schedules using rayon. We then fork futures_lite::block_on to avoid parking the thread, using rayon_core::yield_now to run rayon's tasks whenever the local task yields. This should enable us to get the performance benefits and API support from Rayon, without losing our capability to schedule async tasks onto the same threads.

Not implemented in this PR is to move all non-async tasks (i.e. blocking CPU-bound operations on the ComputeTaskPool) to queue onto rayon instead of async_executor.

Current Unknowns

If we yield this way in block_on, will Rayon be able to resume a thread that has been parked by the current implementation of block_on?
~~Do we still see the benefits of rayon's thread management and contention mitigation?~~ It's definitely there, but we may have contention issues in the actual tasks being run.
~~Can we get into the internals of rayon to try to merge these two work stealing queues together?~~ We don't have access to the internals, but we don't need that to use only one top level work stealing queue.
Does this work on WASM without too many gotchas?

hymm · 2024-02-20T17:06:39Z

We should investigate this comment for running async on rayon rayon-rs/rayon#679 (comment) as an alternative to the block_on.

james7132 · 2024-02-20T17:39:54Z

We should investigate this comment for running async on rayon rayon-rs/rayon#679 (comment) as an alternative to the block_on.

Given the slightly cursed things we're doing to support scopes with external executors, I'm not confident that would be 100% sufficient, but it definitely is a cleaner approach overall.

hymm · 2024-02-20T17:43:05Z

Given the slightly cursed things we're doing to support scopes with external executors, I'm not confident that would be 100% sufficient, but it definitely is a cleaner approach overall.

This can all go away, once we remove non send data from the world.

james7132 · 2024-02-21T05:34:53Z

We should investigate this comment for running async on rayon rayon-rs/rayon#679 (comment) as an alternative to the block_on.

I got this working, and the results are a bit mixed right now.

The Good

Query::par_iter-bound systems are notably faster. Here's a comparison between animate_targets. Yellow is this PR, red is main.

You can immediately see the results in the trace, as the threads are spinning up faster than with just async_executor.

You can see that the app as a whole is using almost all of the available threads, even for schedules with lots of systems with little work to do in each.

The Bad

Every schedule takes longer to run as a whole. Here's the comparison for the Render schedule:

This is either stemming from the higher contention from spinning up and turning down so many threads, the alterations to block_on make scope execution harder to terminate than it was before, or the overhead of allocation upon reallocation is affecting tasks like the multithreaded executor.

hymm · 2024-02-22T06:31:40Z

I did some of my own investigation. One thing I did notice is it seems to take longer for the time system to start running in First after the first run of the multithreaded executor. It wasn't by a ton, maybe 5us. Seems likely this is due to contention. One other note is that it seems very aggressive at starting more threads. Enough so that I'm wondering if spawn_async is misbehaving and starting too many threads. Will investigate more later.

SkiFire13 · 2024-02-22T14:44:56Z

If we yield this way in block_on, will Rayon be able to resume a thread that has been parked by the current implementation of block_on?

AFAIK no, rayon only wakes up threads that registered themselves as sleeping in the threadpool's registry, see for example Sleep::wake_specific_thread where it checks if is_slepping is true, and that's only set in Sleep::sleep, which in turn only gets called when a thread is not running any task (in the case of your block_on, it is running the block_on task, even though that internally parks the thread).

@james7132 when you tested with rayon-rs/rayon#679 (comment) 's suggestion were you still running that block_on function or did you completly switch to async-task?

james7132 · 2024-02-22T20:42:47Z

I was using the fork of block_on that is in this PR. It still uses parking. As far as I can tell, there's no way to yield to rayon that allows it to go to sleep, so our use of block_on may need to use something else. in_place_scope may work.

hymm · 2024-02-25T00:31:43Z

crates/bevy_tasks/src/task_pool.rs

@@ -611,6 +532,10 @@ pub struct Scope<'scope, 'env: 'scope, T> {
 }

 impl<'scope, 'env, T: Send + 'scope> Scope<'scope, 'env, T> {
+    pub fn spawn(&self, f: impl FnOnce() + Send + 'scope) {
+        self.thread_pool.spawn(|_| f());


This isn't quite right as the scope async executors can end while there are still tasks on the rayon scope. We need to push something into self.spawned. An easy fix here is to use spawn_async.

SkiFire13 · 2024-02-25T08:52:55Z

crates/bevy_tasks/src/task_pool.rs

    where
        T: Send + 'static,
    {
-        Task::new(self.executor.spawn(future))
+        let (runnable, task) = async_task::spawn(future, |runnable: Runnable| {
+            rayon_core::spawn(move || {


Note that rayon_core::spawn spawns the task on whatever rayon context is the current one in the thread it is called in. Most notably, the async_task closure may be called on a different thread than the one the task was spawned in (e.g. if the Future's Waker was moved into a different thread), in which case this would likely initialize a different global rayon's context, with a different thread pool. It should instead spawn on the current thread pool (with self.thread_pool.spawn(...))

Use rayon as our task pool

3ce8e89

james7132 requested a review from hymm February 20, 2024 09:48

james7132 added C-Performance A change motivated by improving speed, memory usage or compile times C-Usability A targeted quality-of-life change that makes Bevy easier to use A-Tasks Tools for parallel and async work labels Feb 20, 2024

Spawn directly onto Rayon

2f7b7d8

james7132 changed the title ~~Use rayon as our thread pool~~ Use rayon as the bevy_task thread pool Feb 21, 2024

james7132 added 2 commits February 21, 2024 13:03

Avoid extra allocation if the task doesn't require async

6d3b6b0

Merge branch 'main' into rayon-tasks

27949b6

james7132 mentioned this pull request Feb 22, 2024

Wake up threads more aggressively if there is known work to do smol-rs/async-executor#100

Open

hymm reviewed Feb 25, 2024

View reviewed changes

SkiFire13 reviewed Feb 25, 2024

View reviewed changes

MiniaczQ mentioned this pull request Feb 26, 2024

Make tokio / async-io optional task execution engines 2 #6137

Closed

This was referenced Feb 27, 2024

Replace async_channel with event_listener in bevy_tasks #11942

Closed

Parallelized transform propagation #11760

Closed

msvbg mentioned this pull request Feb 27, 2024

Add an index argument to parallel iteration helpers in bevy_tasks #12169

Merged

Merge branch 'main' into rayon-tasks

1fc7263

hymm mentioned this pull request Sep 27, 2024

Add no_std support to bevy_tasks #15464

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use rayon as the bevy_task thread pool #11995

Use rayon as the bevy_task thread pool #11995

james7132 commented Feb 20, 2024 •

edited

Loading

hymm commented Feb 20, 2024

james7132 commented Feb 20, 2024

hymm commented Feb 20, 2024

james7132 commented Feb 21, 2024 •

edited

Loading

hymm commented Feb 22, 2024

SkiFire13 commented Feb 22, 2024

james7132 commented Feb 22, 2024

hymm Feb 25, 2024

SkiFire13 Feb 25, 2024 •

edited

Loading

Use rayon as the bevy_task thread pool #11995

Are you sure you want to change the base?

Use rayon as the bevy_task thread pool #11995

Conversation

james7132 commented Feb 20, 2024 • edited Loading

Objective

Background

Solution

Current Unknowns

hymm commented Feb 20, 2024

james7132 commented Feb 20, 2024

hymm commented Feb 20, 2024

james7132 commented Feb 21, 2024 • edited Loading

The Good

The Bad

hymm commented Feb 22, 2024

SkiFire13 commented Feb 22, 2024

james7132 commented Feb 22, 2024

hymm Feb 25, 2024

Choose a reason for hiding this comment

SkiFire13 Feb 25, 2024 • edited Loading

Choose a reason for hiding this comment

james7132 commented Feb 20, 2024 •

edited

Loading

james7132 commented Feb 21, 2024 •

edited

Loading

SkiFire13 Feb 25, 2024 •

edited

Loading