Executor feature comparison #276

tomwhite · 2023-07-24T15:57:49Z

(I wrote this to help track what works needs to be done on the executors, but it might be useful to add to the user docs at some point.)

This table shows the features that the local (single-machine) executors support.

Feature	`single-threaded`	`threads`	`processes`
Callbacks	✅	✅	✅
Task concurrency	❌	✅	✅
Retries	❌	✅	❌
Timeouts	❌	❌	❌
Straggler mitigation	❌	✅	✅
Input batching	N/A ⁴	✅	✅
Resume	✅	✅	✅
Compute arrays in parallel	❌	✅	✅
Runtime memory check	❌	✅	✅

This table shows the same for the cloud executors:

Feature	`lithops`	`modal`	`beam` (Dataflow)	`dask`
Callbacks	✅	✅	✅	✅
Task concurrency	✅	✅	✅	✅
Retries	✅	✅	✅ ¹	✅ ²
Timeouts	✅	✅	?	❌ ³
Straggler mitigation	✅	✅	?	✅
Input batching	❌	✅	N/A ⁵	✅
Resume	✅	✅	❌	✅
Compute arrays in parallel	✅	✅	✅	✅
Runtime memory check	✅	✅	❌	✅
Supported clouds	AWS, GCP, and others	AWS, GCP	GCP	AWS, GCP

Executors

The single-threaded executor is a very simple executor, used for running tests, and is deliberately designed not to have anything except the most basic features.

The threads executor is also for testing, but has a few features, mostly as a way to test async features locally, without having to use Modal, which is the only other async executor.

The processes executor is for running large datasets that can fit on a single machine's disk.

The other executors are all designed for real workloads running at scale, so all of the features are desirable. Some are included in the platform, while others are implemented in Cubed. For example, both Lithops and Modal provide timeouts as a part of the platform, whereas of the two only Modal provides retries as a built-in feature (for Lithops we implement retries in Cubed). Neither platform provides anything for mitigating stragglers, so Cubed provides a backup tasks implementation for both.

Features

Task concurrency - can the executor run multiple tasks at once?

Input batching - for very large computations it's important that not all inputs for a given array are materialized at once, as that might lead to an out of memory situation on the client. The remedy for this is to submit the input in batches, or in a streaming fashion if the platform supports it. See #239

Resume - can an executor resume a computation that didn't complete? (This requires that the computation is pickled so it can be restarted.)

Compute arrays in parallel - are arrays computed one at a time, or in parallel? For small arrays the latter can take advantage of more parallelism if it is available and speed up computations.

Runtime memory check - does the executor make sure that your allowed_mem setting is no greater than what the runtime provides? #220

Footnotes

Google Cloud Dataflow has four retry attempts.
Dask added retries in 2017. See also this SO thread. There is also a Reschedule exception that serves a similar purpose.
Dask doesn't seem to have task timeouts. There's a discussion about timeouts and very slow tasks here, including how to work around very slow or hanging tasks.
One task is run at a time, which is not really batching.
For Beam, the client submits a DAG to the service, so there is no problem with running out of memory on the client for very large arrays, thus there is no need to implement input batching.

The text was updated successfully, but these errors were encountered:

TomNicholas · 2023-07-24T19:56:34Z

This is very helpful.

For the Coiled Functions executor #260 I think everything is the same as the Dask column except that Callbacks have been implemented. Adding a runtime memory check should be straightforward too.

tomwhite · 2023-07-25T16:28:50Z

I've been looking at the Dask executor today, and I think using the distributed.Client.map API may make it a lot easier to implement the missing features in the table. (A very minor downside is that you can't use the Dask local scheduler, but we have the local Python executors for that.)

Here's a prototype AsyncDaskDistributedExecutor that does this. Since it uses asyncio, I was able to copy the Modal implementation for backups fairly easily. I think adding compute arrays in parallel, and input batching, would both be very similar to the Modal implementation too. The only missing feature would be timeouts, but I think with backups that's less important.

As far as I can tell Coiled Functions don't have an asyncio version - but perhaps the futures that it returns can be used in an asyncio context, in which case we'd be able to share a lot of code.

TomNicholas · 2023-07-25T17:03:55Z

Nice! If we have an AsyncDaskDistributedExecutor is there any reason to keep the DaskDelayedExecutor?

tomwhite · 2023-07-25T19:03:19Z

Probably not.

tomwhite · 2023-08-01T10:19:33Z

As far as I can tell Coiled Functions don't have an asyncio version - but perhaps the futures that it returns can be used in an asyncio context, in which case we'd be able to share a lot of code.

Reading the docs, it says that .submit() will return a Dask Future - so we should be able to use everything from AsyncDaskDistributedExecutor.

tomwhite · 2023-08-15T11:02:29Z

#291 added batching for Python Async, Modal, and Dask.

tomwhite added the runtime label Jul 24, 2023

This was referenced Jul 27, 2023

Dask distributed async executor #279

Merged

Remove DaskDelayedExecutor #280

Merged

tomwhite mentioned this issue Aug 14, 2023

Add support for batching inputs to avoid overwhelming the backend service #291

Merged

tomwhite mentioned this issue Sep 5, 2023

Change Coiled executor to use Dask async API #299

Open

tomwhite pinned this issue Feb 29, 2024

tomwhite mentioned this issue Jul 18, 2024

Docs on how to write a new executor #498

Open

tomwhite added the documentation Improvements or additions to documentation label Aug 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Executor feature comparison #276

Executor feature comparison #276

tomwhite commented Jul 24, 2023 •

edited

Loading

TomNicholas commented Jul 24, 2023

tomwhite commented Jul 25, 2023

TomNicholas commented Jul 25, 2023

tomwhite commented Jul 25, 2023

tomwhite commented Aug 1, 2023

tomwhite commented Aug 15, 2023

Executor feature comparison #276

Executor feature comparison #276

Comments

tomwhite commented Jul 24, 2023 • edited Loading

Executors

Features

Footnotes

TomNicholas commented Jul 24, 2023

tomwhite commented Jul 25, 2023

TomNicholas commented Jul 25, 2023

tomwhite commented Jul 25, 2023

tomwhite commented Aug 1, 2023

tomwhite commented Aug 15, 2023

tomwhite commented Jul 24, 2023 •

edited

Loading