みらい 未来
Minimalist Async Evaluation Framework for R
mirai — future in Japanese — allows you to perform computationally intensive tasks without blocking the R session.
→ Run R code in the background, with results available once ready
→ Distribute workloads across local or remote machines
→ Execute tasks on different compute resources based on requirements
→ Perform actions as soon as tasks complete via promises
install.packages("mirai")
→ mirai(): Evaluate an R expression asynchronously in a parallel process.
→ daemons(): Set and launch persistent background processes, local or remote, on which to run mirai tasks.
library(mirai)
daemons(5)
#> [1] 5
m <- mirai({
  Sys.sleep(1)
  100 + 42
})
mp <- mirai_map(1:9, \(x) {
  Sys.sleep(1)
  x^2
})
m
#> < mirai [] >
m[]
#> [1] 142
mp
#> < mirai map [4/9] >
mp[.flat]
#> [1] 1 4 9 16 25 36 49 64 81
daemons(0)
#> [1] 0
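The example above collects results with `[]`, which waits for the task to finish. The sketch below shows the non-blocking side: it polls `unresolved()` and reads the value from `$data` once it is ready. It assumes no daemons are set, in which case `mirai()` runs the task in a fresh background process; the variable names and sleep durations are illustrative only.

```r
library(mirai)

# Objects the expression needs are passed by name through '...'.
m <- mirai(
  {
    Sys.sleep(0.5)
    x + y
  },
  x = 1,
  y = 2
)

# The session is not blocked: other work can happen here while the task runs.
# This loop only polls until the result has arrived.
while (unresolved(m)) {
  Sys.sleep(0.1)
}

# Once resolved, the value is available at $data.
result <- m$data
result
```

In an interactive session you would typically carry on working and check `unresolved(m)` only when the value is needed.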
mirai is designed from the ground up to provide a production-grade experience.
→ Modern
- Current technologies built on nanonext and NNG
- Communications layer supports IPC (Inter-Process Communication), TCP/IP and TLS
→ Efficient
- 1,000x more responsive than alternatives [1]
- Ideal for low-latency applications, e.g. real-time inference and Shiny apps
→ Reliable
- No reliance on global options or variables for consistent behaviour
- Explicit evaluation for transparent and predictable results
→ Scalable
- Capacity for millions of tasks over thousands of connections
- Proven track record for heavy-duty workloads in the life sciences industry
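As an illustration of explicit evaluation, the sketch below defines a variable in the calling session and shows that a task sees it only when it is passed in. The name `offset_demo` is hypothetical. When the variable is not passed, the task resolves to an error value that can be tested with `is_mirai_error()` instead of stopping the session.

```r
library(mirai)

# 'offset_demo' is a hypothetical variable that exists only in this session.
offset_demo <- 10

# Not passed: the background process has no 'offset_demo',
# so the mirai resolves to an error value rather than a result.
m_missing <- mirai(offset_demo + 1)
missing_value <- m_missing[]
is_mirai_error(missing_value)

# The error value carries a stack trace of the evaluation for debugging.
missing_value$stack.trace

# Passed explicitly: the value travels with the expression.
m_passed <- mirai(offset_demo + 1, offset_demo = offset_demo)
passed_value <- m_passed[]
passed_value
```

Because nothing is picked up implicitly from the calling session, the same call gives the same answer wherever the task happens to run.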
→ Distributed Execution: Run tasks across networks and clusters using various deployment methods (SSH, HPC clusters using Slurm, SGE, Torque, PBS, LSF, etc.).
→ Compute Profiles: Manage different sets of daemons independently, allowing tasks with different requirements to be executed on appropriate resources.
→ Promises Integration: An event-driven implementation performs actions on returned values as soon as tasks complete, ensuring minimal latency.
→ Serialization Support: Native serialization support for reference objects such as Arrow Tables, Polars DataFrames or torch tensors.
→ Error Handling: Robust error handling and reporting, with full stack traces for debugging.
→ RNG Management: L’Ecuyer-CMRG RNG streams for reproducible random number generation in parallel execution.
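A minimal sketch of compute profiles, using the `.compute` argument of `daemons()` and `mirai()`. The profile names "cpu" and "io" are arbitrary labels chosen for this example, and both pools are local here; in practice each profile could point at different machines.

```r
library(mirai)

# Two independent sets of daemons; the profile names are arbitrary labels.
daemons(2, .compute = "cpu")
daemons(1, .compute = "io")

# Each task is sent to the profile named in its own .compute argument.
m_cpu <- mirai(sum(sqrt(1:1e6)), .compute = "cpu")
m_io <- mirai(Sys.getpid(), .compute = "io")

cpu_result <- m_cpu[]
io_pid <- m_io[]

# Profiles are reset independently of one another.
daemons(0, .compute = "cpu")
daemons(0, .compute = "io")
```

Tasks sent to one profile never run on the daemons of another, so a slow job in one pool does not hold up work in the other.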
mirai serves as a foundation for asynchronous and parallel computing in the R ecosystem:
Implements the first official alternative communications backend for R
— the ‘MIRAI’ parallel cluster — fulfilling a feature request by R-Core
at R Project Sprint 2023.
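As a sketch of how this looks in use, the example below creates a cluster with mirai's `make_cluster()` and passes it to a standard function from the base `parallel` package. The cluster size and the squaring function are illustrative choices.

```r
library(mirai)

# Create a cluster backed by 2 local mirai daemons.
cl <- make_cluster(2)

# The cluster object is used with functions from the 'parallel' package.
squares <- parallel::parSapply(cl, 1:4, function(x) x^2)
squares

# Shut down the cluster and its daemons.
stop_cluster(cl)
```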
Powers parallel map for the purrr functional programming toolkit, a
core tidyverse package.
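A minimal sketch of the purrr integration, assuming a version of purrr that provides `in_parallel()` (1.1.0 or later, with the carrier package installed). The function is wrapped in `in_parallel()` and runs on whichever daemons are currently set.

```r
library(mirai)
library(purrr)

# Daemons must be set for the wrapped function to run in parallel.
daemons(2)

# in_parallel() marks the function for evaluation on the daemons.
squares <- map(1:4, in_parallel(\(x) x^2))

daemons(0)
```

The wrapped function has to be self-contained: anything it needs from the calling session should be supplied explicitly as arguments to `in_parallel()`.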
Promises for ‘mirai’ and ‘mirai_map’ objects are event-driven,
providing the lowest latency and highest responsiveness for
performance-critical applications.
The primary async backend for Shiny, with full ExtendedTask support,
providing the next level of responsiveness and scalability for Shiny
apps.
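The sketch below shows one way an ExtendedTask can wrap a mirai, assuming a version of Shiny that provides `ExtendedTask` (1.8.1 or later). The input and output ids (`x`, `y`, `go`, `sum`) and the two-second delay are illustrative only.

```r
library(shiny)
library(mirai)

ui <- fluidPage(
  numericInput("x", "x", value = 1),
  numericInput("y", "y", value = 2),
  actionButton("go", "Add"),
  textOutput("sum")
)

server <- function(input, output, session) {
  # The task function returns a mirai; inputs are passed explicitly.
  task <- ExtendedTask$new(function(x, y) {
    mirai(
      {
        Sys.sleep(2) # stand-in for a slow computation
        x + y
      },
      x = x,
      y = y
    )
  })

  # Start the task on click; the app stays responsive while it runs.
  observeEvent(input$go, task$invoke(input$x, input$y))

  # The output updates once the task completes.
  output$sum <- renderText(task$result())
}

app <- shinyApp(ui, server)

# Launch only in an interactive session, with daemons to run the tasks.
if (interactive()) {
  daemons(2)
  runApp(app)
  daemons(0)
}
```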
The built-in async evaluator behind the @async tag in plumber2; also
provides an async backend for Plumber.
Allows Torch tensors and complex objects such as models and optimizers
to be used seamlessly across parallel processes.
Allows queries using the Apache Arrow format to be handled seamlessly
over ADBC database connections hosted in background processes.
R Polars is a pioneer of mirai’s serialization registration mechanism,
which allows transparent use of Polars objects across parallel
processes, with no user setup required.
Targets, a make-like pipeline tool, uses crew as its default
high-performance computing backend. Crew is a distributed worker
launcher extending mirai to different computing platforms, from
traditional clusters to cloud services.
We would like to thank in particular:
Will Landau for being instrumental in shaping development of the package, from initiating the original request for persistent daemons, through to orchestrating robustness testing for the high performance computing requirements of crew and targets.
Joe Cheng for integrating the ‘promises’ method to work seamlessly within Shiny, and prototyping event-driven promises.
Luke Tierney of R Core, for discussion on L’Ecuyer-CMRG streams to ensure statistical independence in parallel processing, and making it possible for mirai to be the first ‘alternative communications backend for R’.
Travers Ching for a novel idea in extending the original custom serialization support in the package.
Hadley Wickham for original implementations of the scoped helper functions, on which ours are based.
Henrik Bengtsson for valuable insights leading to the interface accepting broader usage patterns.
Daniel Falbel for discussion around an efficient solution to serialization and transmission of torch tensors.
Kirill Müller for discussion on using parallel processes to host Arrow database connections.
◈ mirai R package: https://mirai.r-lib.org/
◈ nanonext R package: https://nanonext.r-lib.org/
mirai is listed in CRAN High Performance Computing Task View:
https://cran.r-project.org/view=HighPerformanceComputing
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.