DArray: MPI interface #405

Closed · wants to merge 57 commits

Commits (57)
ac95f8a
Add missing check to walk_data
jpsamaroo Sep 26, 2022
b94306f
Merge pull request #359 from JuliaParallel/jps/walk-data-topmost-bug
jpsamaroo Sep 26, 2022
142eae3
Update Project.toml
krynju Nov 5, 2022
39200fa
signature: Don't capture input arguments
jpsamaroo Nov 10, 2022
d270587
Merge pull request #363 from JuliaParallel/jps/signature-no-capture
jpsamaroo Nov 10, 2022
66bf970
chunks: Allow weak Chunk references in Thunk args
jpsamaroo Nov 10, 2022
2b71950
Merge pull request #364 from JuliaParallel/jps/weak-chunk
jpsamaroo Nov 10, 2022
6f606ce
DaggerWebDash: Add Mux 1.x to compat
jpsamaroo Nov 13, 2022
1d66df7
Merge pull request #367 from JuliaParallel/jps/webdash-mux-1.0
jpsamaroo Nov 13, 2022
188de76
Add DTables.jl CI run (#366)
krynju Nov 14, 2022
66d5147
Update Project.toml
krynju Dec 5, 2022
4e8b209
thunk: Improve ThunkFailedException printing
jpsamaroo Feb 3, 2023
2ebb51b
Sch: Fix task error-related assertion
jpsamaroo Feb 3, 2023
0107b31
Merge pull request #370 from JuliaParallel/jps/task-in-cache-bug
jpsamaroo Feb 3, 2023
42bd36f
checkpoint: Use at-invokelatest
jpsamaroo Feb 24, 2023
54629f9
Merge pull request #371 from JuliaParallel/jps/checkpoint-invokelatest
jpsamaroo Feb 24, 2023
f3986fc
at-spawn: Support broadcasting
jpsamaroo Feb 24, 2023
be403f3
Merge pull request #372 from JuliaParallel/jps/at-spawn-broadcast
jpsamaroo Feb 24, 2023
22378a1
DaggerWebDash: Updates for recent HTTP.jl
jpsamaroo Mar 2, 2023
345b70b
docs: Update scheduler viz docs
jpsamaroo Mar 2, 2023
675e6a4
Merge pull request #377 from JuliaParallel/jps/update-viz-docs
jpsamaroo Mar 3, 2023
8d6b234
Updating README.md (#383)
Madhupatel08 Mar 16, 2023
96aa4dc
Add scope for union of processor type
jpsamaroo Feb 25, 2023
0b108ee
Deprecate proclist and single in favor of scope
jpsamaroo Feb 26, 2023
43178ee
tests: Robustify mutation test
jpsamaroo Feb 26, 2023
5cd574c
scopes: Add show methods
jpsamaroo Mar 3, 2023
49eea98
scopes: Add Dagger.scope helper
jpsamaroo Mar 3, 2023
d264e16
scopes: Unique subscopes in UnionScope
jpsamaroo Mar 10, 2023
fa7f828
docs: Update scope docs
jpsamaroo Apr 5, 2023
85637c0
Merge pull request #374 from JuliaParallel/jps/processor-type-scope
jpsamaroo Apr 5, 2023
023a4cd
Changing the receive and yield function to accomodate new MPI impleme…
fda-tome Apr 18, 2023
284a374
ThreadProc: Mark task sticky
jpsamaroo Feb 25, 2023
44913e8
ThreadProc: Use at-invokelatest
jpsamaroo Mar 1, 2023
96b2c6b
thunk: Dont move (error,value) tuple
jpsamaroo Apr 18, 2023
bcf3f32
Sch: Generate guaranteed unique uid
jpsamaroo Apr 22, 2023
57cbaf0
Sch: Add at-dagdebug helper macro
jpsamaroo Apr 22, 2023
9dd7a89
Add worker-local task stealing and occupancy limiting
jpsamaroo Feb 25, 2023
14dc2b5
Sch: Use at-invokelatest for move
jpsamaroo Apr 28, 2023
4b83c4b
Merge pull request #373 from JuliaParallel/jps/task-balance
jpsamaroo Apr 28, 2023
b9ce129
Add keyword argument support
jpsamaroo May 15, 2023
1c8878d
CI: Add Julia 1.9
jpsamaroo May 18, 2023
5836c4d
Merge pull request #394 from JuliaParallel/jps/kwargs
jpsamaroo May 18, 2023
0ca1703
Included the array on mock hashing scheme and changed deprecated sing…
fda-tome May 25, 2023
3426eac
[Temporary] Changes to the nonblocking broadcast impletations and wra…
fda-tome Jun 9, 2023
6dc6975
Changes regarding allocation, mapping, reducing and indexing for the …
fda-tome Jun 9, 2023
86eb14c
Changes regarding allocation, mapping, reducing and indexing for the …
fda-tome Jun 9, 2023
dd1a595
Finished array implementation, having problems with the darray distri…
fda-tome Jun 17, 2023
1268410
Finished array implementation, having problems with the darray distri…
fda-tome Jun 17, 2023
4acb50c
DArray: Fix adj/transpose and matmul
jpsamaroo Jun 19, 2023
fffcc01
Tests passing
fda-tome Jun 19, 2023
08d2309
Apply suggestions from code review
fda-tome Jun 19, 2023
2a797b1
tests/logging: Remove reliance on DArray
jpsamaroo Jun 19, 2023
b763d58
Comments resolved
fda-tome Jun 19, 2023
afa38a1
Conflict resolution
fda-tome Jun 30, 2023
846ff35
Started MPI style implementation
fda-tome Jun 30, 2023
1ce3b7f
Rebasing MPI branch and DArray interface
fda-tome Jul 19, 2023
4877b6e
Merge branch 'jps/dagger-mpi' into fdat/dagger-mpi
fda-tome Jul 19, 2023
25 changes: 24 additions & 1 deletion .buildkite/pipeline.yml
@@ -35,12 +35,22 @@ steps:
julia_args: "--threads=1"
- JuliaCI/julia-coverage#v1:
codecov: true
- label: Julia 1.9
timeout_in_minutes: 60
<<: *test
plugins:
- JuliaCI/julia#v1:
version: "1.9"
- JuliaCI/julia-test#v1:
julia_args: "--threads=1"
- JuliaCI/julia-coverage#v1:
codecov: true
- label: Julia nightly
timeout_in_minutes: 60
<<: *test
plugins:
- JuliaCI/julia#v1:
version: "1.9-nightly"
version: "1.10-nightly"
- JuliaCI/julia-test#v1:
julia_args: "--threads=1"
- JuliaCI/julia-coverage#v1:
@@ -93,3 +103,16 @@ steps:
BENCHMARK_SCALE: "5:5:50"
artifacts:
- benchmarks/result*
- label: DTables.jl stability test
timeout_in_minutes: 20
plugins:
- JuliaCI/julia#v1:
version: "1.8"
env:
JULIA_NUM_THREADS: "4"
agents:
queue: "juliaecosystem"
sandbox.jl: "true"
os: linux
arch: x86_64
command: "git clone https://github.com/JuliaParallel/DTables.jl.git ; julia -t4 -e 'using Pkg; Pkg.activate(\"DTables.jl\"); Pkg.develop(;path=\".\"); Pkg.instantiate(); Pkg.test()'"
3 changes: 2 additions & 1 deletion Project.toml
@@ -1,10 +1,11 @@
name = "Dagger"
uuid = "d58978e5-989f-55fb-8d15-ea34adc7bf54"
version = "0.16.1"
version = "0.16.3"

[deps]
Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
ContextVariablesX = "6add18c4-b38d-439d-96f6-d6bc489c04c5"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
36 changes: 34 additions & 2 deletions README.md
@@ -17,10 +17,14 @@ At the core of Dagger.jl is a scheduler heavily inspired by [Dask](https://docs.

## Installation

You can install Dagger by typing
Dagger.jl can be installed using the Julia package manager. Enter the Pkg REPL mode by typing `]` in the Julia REPL and then run:

```julia
julia> ] add Dagger
pkg> add Dagger
```
Or, equivalently, via the Pkg API:
```julia
julia> import Pkg; Pkg.add("Dagger")
```

## Usage
Expand All @@ -37,6 +41,34 @@ b = Dagger.@spawn rand(a, 4)
c = Dagger.@spawn sum(b)
fetch(c) # some number!
```
## Contributing Guide
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)
[![GitHub issues](https://img.shields.io/github/issues/JuliaParallel/Dagger.jl)](https://github.com/JuliaParallel/Dagger.jl/issues)
[![GitHub contributors](https://img.shields.io/github/contributors/JuliaParallel/Dagger.jl)](https://github.com/JuliaParallel/Dagger.jl/graphs/contributors)

Contributions are encouraged.

There are several ways to contribute to our project:

**Reporting Bugs**: If you find a bug, please open an issue describing the problem. Make sure to include steps to reproduce it and any error messages you receive.

**Fixing Bugs**: If you'd like to fix a bug, please create a pull request with your changes. Make sure to include a description of the problem and how your changes will address it.

Additional examples and documentation improvements are also very welcome.

## Resources
List of recommended Dagger.jl resources:
- Docs [![][docs-master-img]][docs-master-url]
- Videos
- [Distributed Computing with Dagger.jl](https://youtu.be/capjmjVHfMU)
- [Easy, Featureful Parallelism with Dagger.jl](https://youtu.be/t3S8W6A4Ago)
- [Easier parallel Julia workflow with Dagger.jl](https://youtu.be/VrqzOsav61w)
- [Dagger.jl Development and Roadmap](https://youtu.be/G0Y62ysFbDk)

## Help and Discussion
For help and discussion, we suggest asking in the following places:

[Julia Discourse](https://discourse.julialang.org/c/domain/parallel/34) and on the [Julia Slack](https://julialang.org/slack/) in the `#distributed` channel.

## Acknowledgements

6 changes: 3 additions & 3 deletions docs/src/checkpointing.md
@@ -54,17 +54,17 @@ Let's see how we'd modify the above example to use checkpointing:

```julia
using Serialization

X = compute(randn(Blocks(128,128), 1024, 1024))
Y = [delayed(sum; options=Dagger.Sch.ThunkOptions(;
checkpoint=(thunk,result)->begin
Y = [delayed(sum; checkpoint=(thunk,result)->begin
open("checkpoint-$idx.bin", "w") do io
serialize(io, collect(result))
end
end, restore=(thunk)->begin
open("checkpoint-$idx.bin", "r") do io
Dagger.tochunk(deserialize(io))
end
end))(chunk) for (idx,chunk) in enumerate(X.chunks)]
end)(chunk) for (idx,chunk) in enumerate(X.chunks)]
inner(x...) = sqrt(sum(x))
Z = delayed(inner)(Y...)
z = collect(Z)
94 changes: 66 additions & 28 deletions docs/src/index.md
@@ -2,32 +2,34 @@

## Usage

The main function for using Dagger is `spawn`:
The main entrypoint to Dagger is `@spawn`:

`Dagger.spawn(f, args...; options...)`
`Dagger.@spawn [option=value]... f(args...; kwargs...)`

or `@spawn` for the more convenient macro form:
or `spawn` if it's more convenient:

`Dagger.@spawn [option=value]... f(args...)`
`Dagger.spawn(f, Dagger.Options(options), args...; kwargs...)`

When called, it creates an `EagerThunk` (also known as a "thunk" or "task")
object representing a call to function `f` with the arguments `args`. If it is
called with other thunks as inputs, such as in `Dagger.@spawn f(Dagger.@spawn
g())`, then the function `f` gets passed the results of those input thunks. If
those thunks aren't yet finished executing, then the execution of `f` waits on
all of its input thunks to complete before executing.
object representing a call to function `f` with the arguments `args` and
keyword arguments `kwargs`. If it is called with other thunks as args/kwargs,
such as in `Dagger.@spawn f(Dagger.@spawn g())`, then the function `f` gets
passed the results of those input thunks, once they're available. If those
thunks aren't yet finished executing, then the execution of `f` waits on all of
its input thunks to complete before executing.

The key point is that, for each argument to a thunk, if the argument is an
`EagerThunk`, it'll be executed before this node and its result will be passed
into the function `f`. If the argument is *not* an `EagerThunk` (instead, some
other type of Julia object), it'll be passed as-is to the function `f`.
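
For example, a minimal sketch of this behavior (using only `rand` and `sum` from Julia's Base):

```julia
using Dagger

a = Dagger.@spawn rand(4)           # an EagerThunk
b = Dagger.@spawn sum(a; init=0.0)  # `sum` receives the Vector computed by `a`
n = 4
c = Dagger.@spawn rand(n)           # `n` is a plain Int, so it is passed as-is

fetch(b)  # waits for `a`, then `b`, and returns the sum
```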

Thunks don't accept regular keyword arguments for the function `f`. Instead,
the `options` kwargs are passed to the scheduler to control its behavior:
The `Options` struct in the second argument position is optional; if provided,
it is passed to the scheduler to control its behavior. `Options` contains a
`NamedTuple` of option key-value pairs, which can be any of:
- Any field in `Dagger.Sch.ThunkOptions` (see [Scheduler and Thunk options](@ref))
- `meta::Bool` -- Pass the input `Chunk` objects themselves to `f` and not the value contained in them

There are also some extra kwargs that can be passed, although they're considered advanced options to be used only by developers or library authors:
There are also some extra options that can be passed, although they're considered advanced and intended only for developers or library authors:
- `get_result::Bool` -- return the actual result to the scheduler instead of `Chunk` objects. Used when `f` explicitly constructs a Chunk or when return value is small (e.g. in case of reduce)
- `persist::Bool` -- the result of this Thunk should not be released after it becomes unused in the DAG
- `cache::Bool` -- cache the result of this Thunk such that if the thunk is evaluated again, one can just reuse the cached value. If it’s been removed from cache, recompute the value.
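
As a brief illustration of option passing in both forms (a sketch; `single=1` pins execution to worker 1, as in the options examples later in this document):

```julia
X = rand(100)

# Option passed in the macro form
t1 = Dagger.@spawn single=1 sum(X)

# The same option passed via Dagger.Options in the function form
t2 = Dagger.spawn(sum, Dagger.Options(;single=1), X)

fetch(t1) == fetch(t2)  # true
```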
@@ -133,18 +135,18 @@ via `@par` or `delayed`. The above computation can be executed with the lazy
API by substituting `@spawn` with `@par` and `fetch` with `collect`:

```julia
p = @par add1(4)
q = @par add2(p)
r = @par add1(3)
s = @par combine(p, q, r)
p = Dagger.@par add1(4)
q = Dagger.@par add2(p)
r = Dagger.@par add1(3)
s = Dagger.@par combine(p, q, r)

@assert collect(s) == 16
```

or similarly, in block form:

```julia
s = @par begin
s = Dagger.@par begin
p = add1(4)
q = add2(p)
r = add1(3)
@@ -159,7 +161,7 @@ operation, you can call `compute` on the thunk. This will return a `Chunk`
object which references the result (see [Chunks](@ref) for more details):

```julia
x = @par 1+2
x = Dagger.@par 1+2
cx = compute(x)
cx::Chunk
@assert collect(cx) == 3
@@ -198,15 +200,17 @@ While Dagger generally "just works", sometimes one needs to exert some more
fine-grained control over how the scheduler allocates work. There are two
parallel mechanisms to achieve this: Scheduler options (from
`Dagger.Sch.SchedulerOptions`) and Thunk options (from
`Dagger.Sch.ThunkOptions`). These two options structs generally contain the
same options, with the difference being that Scheduler options operate
`Dagger.Sch.ThunkOptions`). These two options structs contain many shared
options, with the difference being that Scheduler options operate
globally across an entire DAG, and Thunk options operate on a thunk-by-thunk
basis. Scheduler options can be constructed and passed to `collect()` or
`compute()` as the keyword argument `options` for lazy API usage:
basis.

Scheduler options can be constructed and passed to `collect()` or `compute()`
as the keyword argument `options` for lazy API usage:

```julia
t = @par 1+2
opts = Dagger.Sch.ThunkOptions(;single=1) # Execute on worker 1
t = Dagger.@par 1+2
opts = Dagger.Sch.SchedulerOptions(;single=1) # Execute on worker 1

compute(t; options=opts)

@@ -219,12 +223,46 @@ Thunk options can be passed to `@spawn/spawn`, `@par`, and `delayed` similarly:
# Execute on worker 1

Dagger.@spawn single=1 1+2
Dagger.spawn(+, Dagger.Options(;single=1), 1, 2)

Dagger.spawn(+, 1, 2; single=1)

opts = Dagger.Sch.ThunkOptions(;single=1)
delayed(+)(1, 2; options=opts)
delayed(+; single=1)(1, 2)
```

### Core vs. Worker Schedulers

Dagger's scheduler is really composed of two kinds of entities: the "core"
scheduler and the "worker" schedulers:

The core scheduler runs on worker 1, thread 1, and is the entrypoint to tasks
which have been submitted. The core scheduler manages all task dependencies,
notifies calls to `wait` and `fetch` of task completion, and generally performs
initial task placement. The core scheduler has cached information about each
worker and their processors, and uses that information (together with metrics
about previous tasks and other aspects of the Dagger runtime) to generate a
near-optimal just-in-time task schedule.

The worker schedulers each run as a set of tasks across all workers and all
processors, and handle data movement and task execution. Once the core
scheduler has scheduled and launched a task, it arrives at the worker scheduler
for handling. The worker scheduler will pass the task to a queue for the
assigned processor, where it will wait until the processor has a sufficient
amount of "occupancy" for the task. Once the processor is ready for the task,
it will first fetch all arguments to the task from other workers, and then it
will execute the task, package the result into a `Chunk`, and pass that back to
the core scheduler.

### Workload Balancing

In general, Dagger's core scheduler tries to balance workloads as much as
possible across all the available processors, but it can fail to do so
effectively when either the cached per-processor information is outdated, or
when the estimates about the task's behavior are inaccurate. To minimize the
impact of this potential workload imbalance, the worker schedulers' processors
will attempt to steal tasks from each other when they are under-occupied. Tasks
will only be stolen if their scope (see [Scopes](@ref)) matches the processor attempting
the steal, so tasks with wider scopes have better balancing potential.
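
For instance, a hedged sketch using the `scope` option and the `Dagger.scope` helper introduced in this changeset (the exact keyword names are illustrative):

```julia
X = rand(100)

# No scope: runnable anywhere in the cluster, so it has the widest
# balancing (and stealing) potential
t1 = Dagger.@spawn sum(X)

# Pinned to worker 1: cannot be stolen by other workers' processors
t2 = Dagger.@spawn scope=Dagger.scope(worker=1) sum(X)
```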

### Scheduler/Thunk Options

[`Dagger.Sch.SchedulerOptions`](@ref)
[`Dagger.Sch.ThunkOptions`](@ref)
2 changes: 1 addition & 1 deletion docs/src/logging.md
@@ -32,7 +32,7 @@ called. Let's construct one:

```julia
ctx = Context()
ml = TimspanLogging.MultiEventLog()
ml = TimespanLogging.MultiEventLog()

# Add the BytesAllocd consumer to the log as `:bytes`
ml[:bytes] = Dagger.Events.BytesAllocd()
33 changes: 2 additions & 31 deletions docs/src/processors.md
@@ -76,42 +76,13 @@ processor B. This mechanism uses Julia's Serialization library to serialize and
deserialize data, so data must be serializable for this mechanism to work
properly.

### Future: Hierarchy Generic Path Move

NOTE: This used to be the default move behavior, but was removed because it
wasn't considered helpful, and there were not any processor implementations
that made use of it.

Movement of data between any two processors is decomposable into a sequence of
"moves" between a child and its parent, termed a "generic path move". Movement
of data may also take "shortcuts" between nodes in the tree which are not
directly connected if enabled by libraries or the user, which may make use of
IPC mechanisms to transfer data more directly and efficiently (such as
Infiniband, GPU RDMA, NVLINK, etc.). All data is considered local to some
processor, and may only be operated on by another processor by first doing an
explicit move operation to that processor.

## Processor Selection

By default, Dagger uses the CPU to process work, typically single-threaded per
cluster node. However, Dagger allows access to a wider range of hardware and
software acceleration techniques, such as multithreading and GPUs. These more
advanced (but performant) accelerators are disabled by default, but can easily
be enabled by using Scheduler/Thunk options in the `proclist` field. If
`nothing`, all default processors will be used. If a vector of types, only the
processor types contained in `options.proclist` will be used to compute all or
a given thunk. If a function, it will be called for each processor (with the
processor as the argument) until it returns `true`.

```julia
opts = Dagger.Sch.ThunkOptions(;proclist=nothing) # default behavior
# OR
opts = Dagger.Sch.ThunkOptions(;proclist=[DaggerGPU.CuArrayProc]) # only execute on CuArrayProc
# OR
opts = Dagger.Sch.ThunkOptions(;proclist=(proc)->(proc isa Dagger.ThreadProc && proc.tid == 3)) # only run on ThreadProc with thread ID 3

t = Dagger.@par options=opts sum(X) # do sum(X) on the specified processor
```
be enabled by using scopes (see [Scopes](@ref) for details).
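
A scope-based analogue of the removed `proclist` examples might look like the following sketch (`ProcessorTypeScope` comes from the scope changes merged in this PR; treat the exact call as illustrative):

```julia
X = rand(100)

# Only execute on thread processors, replacing the old
# proclist=(proc)->(proc isa Dagger.ThreadProc) form
t = Dagger.@spawn scope=Dagger.ProcessorTypeScope(Dagger.ThreadProc) sum(X)
```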

## Resource Control

Expand All @@ -137,7 +108,7 @@ sufficient resources become available by thunks completing execution.
The [DaggerGPU.jl](https://github.com/JuliaGPU/DaggerGPU.jl) package can be
imported to enable GPU acceleration for NVIDIA and AMD GPUs, when available.
The processors provided by that package are not enabled by default, but may be
enabled via `options.proclist` as usual.
enabled via custom scopes ([Scopes](@ref)).

### Future: Network Devices and Topology

2 changes: 1 addition & 1 deletion docs/src/propagation.md
@@ -1,6 +1,6 @@
# Option Propagation

Most options passed to Dagger are passed via `delayed` or `Dagger.@spawn`
Most options passed to Dagger are passed via `@spawn/spawn` or `delayed`
directly. This works well when an option only needs to be set for a single
thunk, but is cumbersome when the same option needs to be set on multiple
thunks, or set recursively on thunks spawned within other thunks. Thankfully,