Update reconfigurator docs for #8287 #8395

Open
wants to merge 1 commit into
base: main
11 changes: 6 additions & 5 deletions docs/reconfigurator-dev-guide.adoc
@@ -222,12 +222,15 @@ Here are the most important background tasks related to Reconfigurator:
|`inventory_collection`
|Fetches information about the current state of all hardware and software in the system (the whole rack)

|`blueprint_executor`
|Executes the most recently loaded blueprint

|`blueprint_loader`
|Loads the latest target blueprint from the database

|`blueprint_planner`
|Runs the planner to produce a new blueprint, using the most recently loaded inventory and target blueprint as input. If that blueprint differs from the current target, it is made the new target.

|`blueprint_executor`
|Executes the most recently loaded blueprint

|`blueprint_rendezvous`
|Updates rendezvous tables based on the most recent target blueprint

@@ -242,8 +245,6 @@ Here are the most important background tasks related to Reconfigurator:

Many other tasks work with Reconfigurator, too (e.g., region replacement and region snapshot replacement).

Notably absent from this list is anything related to planning. This has not been automated as a background task yet.

== Manual testing and developer workflow

There are a bunch of different environments that you can set up and use to test Omicron.
22 changes: 9 additions & 13 deletions docs/reconfigurator.adoc
@@ -118,29 +118,25 @@ The long-term goal is to enable autonomous operation of both the **planner** and
----
The Planner

fleet policy (database state, inventory) (latest blueprint)
fleet policy (database state, inventory) target blueprint
\ | /
\ | /
+----------+ | +----------/
+----------+ | +----------+
| | |
v v v
planner background task
|
v
generate a new blueprint
|
|
v no
is the new blueprint different from the current target? ------> done
|
| yes
v
commit blueprint to database
|
|
v
make blueprint the target
|
|
v
done
@@ -156,18 +152,18 @@ The Executor
+----+ +----+
| |
v v

"executor"
(background task)
executor background task
|
v
determine actions needed
take actions
|
v
take actions
----

This planner will evaluate whether the current (target) blueprint is consistent with the current policy. If not, the task generates a new blueprint that _is_ consistent with the current policy and attempts to make that the new target. (Multiple Nexus instances could try to do this concurrently. CockroachDB's strong consistency ensures that only one can win. The other Nexus instances must go back to evaluating the winning blueprint before trying to change it again -- otherwise two Nexus instances might fight over two equivalent blueprints.)
The planner evaluates whether the current (target) blueprint is consistent with the current policy. If not, the task generates a new blueprint that _is_ consistent with the current policy and attempts to make that the new target. (Multiple Nexus instances could try to do this concurrently. CockroachDB's strong consistency ensures that only one can win. The other Nexus instances must go back to evaluating the winning blueprint before trying to change it again -- otherwise two Nexus instances might fight over two equivalent blueprints.)
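
The planner's compare-then-set behaviour described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not Omicron's implementation: `Blueprint`, `TargetStore`, `PlanOutcome`, and `plan_once` are hypothetical names, a blueprint is reduced to a set of zone names, and an in-memory struct stands in for the CockroachDB transaction that decides which Nexus wins.

```rust
use std::collections::BTreeSet;

/// Hypothetical, heavily simplified blueprint: only the zones that should exist.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Blueprint {
    pub id: u64,
    pub parent_id: Option<u64>,
    pub zones: BTreeSet<String>,
}

/// Hypothetical in-memory stand-in for the database record of the current target.
pub struct TargetStore {
    target: Blueprint,
}

impl TargetStore {
    pub fn new(initial: Blueprint) -> Self {
        TargetStore { target: initial }
    }

    /// Return a copy of the current target blueprint.
    pub fn load_target(&self) -> Blueprint {
        self.target.clone()
    }

    /// Compare-and-set: accept the candidate only if it was planned from the
    /// blueprint that is still the current target (an assumption of this sketch).
    pub fn try_set_target(&mut self, candidate: Blueprint) -> bool {
        if candidate.parent_id == Some(self.target.id) {
            self.target = candidate;
            true
        } else {
            false
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum PlanOutcome {
    /// The parent blueprint already satisfies the policy.
    Unchanged,
    /// A new blueprint with this id became the target.
    NewTarget(u64),
    /// Another planner changed the target first; reload and re-evaluate.
    LostRace,
}

/// One planner pass: generate a blueprint from policy, compare it with the
/// parent it was planned from, and try to make it the target if it differs.
pub fn plan_once(
    store: &mut TargetStore,
    parent: &Blueprint,
    policy_zones: &BTreeSet<String>,
    new_id: u64,
) -> PlanOutcome {
    if &parent.zones == policy_zones {
        return PlanOutcome::Unchanged;
    }
    let candidate = Blueprint {
        id: new_id,
        parent_id: Some(parent.id),
        zones: policy_zones.clone(),
    };
    if store.try_set_target(candidate) {
        PlanOutcome::NewTarget(new_id)
    } else {
        PlanOutcome::LostRace
    }
}
```

In this sketch a caller that gets `LostRace` reloads the target with `load_target` and plans again from the winning blueprint, which mirrors the requirement that losing Nexus instances re-evaluate before trying to change the target.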

The execution task will evaluate whether the state reflected in the latest inventory collection is consistent with the current target blueprint. If not, it executes operations to bring reality into line with the blueprint. This means provisioning new zones, removing old zones, adding instances to DNS, removing instances from DNS, carrying out firmware updates, etc.
The execution task evaluates whether the state reflected in the latest inventory collection is consistent with the current target blueprint. If not, it executes operations to bring reality into line with the blueprint. This means provisioning new zones, removing old zones, adding instances to DNS, removing instances from DNS, carrying out firmware updates, etc.
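
The execution step can be pictured as a set difference between what the blueprint wants and what inventory reports. The following Rust sketch is a minimal illustration, assuming zones are identified by plain strings; `Action` and `actions_needed` are hypothetical names, and the real executor also handles DNS, firmware updates, and other resources that are omitted here.

```rust
use std::collections::BTreeSet;

/// Hypothetical actions the executor might take to reconcile zones.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    ProvisionZone(String),
    RemoveZone(String),
}

/// Compare the zones named in the target blueprint with the zones seen in the
/// latest inventory and list what must change to make reality match the blueprint.
pub fn actions_needed(
    blueprint_zones: &BTreeSet<String>,
    inventory_zones: &BTreeSet<String>,
) -> Vec<Action> {
    let mut actions = Vec::new();
    // Zones the blueprint wants that inventory does not show yet.
    for zone in blueprint_zones.difference(inventory_zones) {
        actions.push(Action::ProvisionZone(zone.clone()));
    }
    // Zones that exist in inventory but are absent from the blueprint.
    for zone in inventory_zones.difference(blueprint_zones) {
        actions.push(Action::RemoveZone(zone.clone()));
    }
    actions
}
```

When inventory already matches the blueprint the function returns an empty list, so running it repeatedly is harmless, which fits a task that executes continuously.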

=== Currently: plan on-demand, execute continuously

@@ -179,7 +175,7 @@ We're being cautious about rolling out that kind of automation. Instead, today,

`omdb` uses the Nexus internal API to do these things. Since this can only be done using `omdb`, Reconfigurator can really only be used by Oxide engineering and support, not customers.

To get to the long term vision where the system is doing all this on its own in response to operator input, we'll need to get confidence that continually executing the planner will have no ill effects on working systems. This might involve more operational experience with it, more safeties, and tools for pausing execution, previewing what it _would_ do, etc.
The planner background task is currently disabled by default, but can be enabled by setting the Nexus configuration option `blueprints.disable_planner = false`. To get to the long term vision where the system is doing all this on its own in response to operator input, we'll need to get confidence that continually executing the planner will have no ill effects on working systems. This might involve more operational experience with it, more safeties, and tools for pausing execution, previewing what it _would_ do, etc.

== Design patterns
