[sled-agent] Integrate config-reconciler #8064
@@ -34,14 +34,6 @@ enum SledAgentCommands {
#[clap(subcommand)]
Zones(ZoneCommands),

/// print information about zpools
Are you expecting that inventory will supplant this info? Or are you planning on replacing this access to the sled agent later?
I was expecting that inventory would supplant this. (I think maybe it already has, in practice? I definitely only look at inventory when I'm curious about zpools; I don't think I've ever used these omdb subcommands.)
This is somewhat extracted from #8064, but can be landed independently and will make some of the followup sled-agent-config-reconciler PRs a little cleaner. We don't yet ledger `OmicronSledConfig`s to disk, so we're free to fiddle with the details of its fields without worrying about backwards compatibility. Fixes #7774.
…ig reconciler (#8188)

The primary change here is replacing these inventory fields (a subset of `OmicronSledConfig`):

```rust
pub omicron_zones: OmicronZonesConfig,
pub omicron_physical_disks_generation: Generation,
```

with these:

```rust
pub ledgered_sled_config: Option<OmicronSledConfig>,
pub reconciler_status: ConfigReconcilerInventoryStatus,
pub last_reconciliation: Option<ConfigReconcilerInventory>,
```

Once #8064 lands, all three of these will be filled in meaningfully; as of this PR, only `ledgered_sled_config` is populated. (`reconciler_status` is always `NotYetRun` and `last_reconciliation` is always `None`, since there is no reconciler yet.)

The rest of the changes are all fallout from changing inventory:

* Update `omdb` printing
* Update sled-agent to report the new inventory fields
* Update consumers of inventory (tests, reconfigurator planner, one Nexus RPW) - these all just look at `ledgered_sled_config` for now, but will need to be updated on #8064 once other fields are populated
* Update database schema, model, and queries (the bulk of the diff). This requires dropping all preexisting collections, since there's no way to migrate from just `omicron_zones` to a full `OmicronSledConfig`. The first few schema migrations take care of this.

Before merging I'll go through an upgrade on a racklette and confirm things come back up okay after the schema migration blows away all the pre-update inventory collections. (We think this is fine, but it'd be good to confirm.) But I think this is close enough that it's reviewable.

Couple other minor changes that came along for the ride:

* Closes #6770 (`inv_sled_omicron_zones` is gone now)
* Fixes #8084 (added `image_source` columns to the inventory zone config table, so we don't lose `ImageSource::Artifact { hash }` values reported by sled-agent)
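As a rough illustration of how an inventory consumer might read only `ledgered_sled_config` for now, here is a minimal sketch. The three field names and the `NotYetRun` variant come from the commit message above; everything else (the `SledInventory` wrapper, the simplified fields inside `OmicronSledConfig`, and the `ledgered_zones` helper) is a hypothetical stand-in and not the real omicron definitions.

```rust
// Hypothetical, heavily simplified stand-ins for the real omicron types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct OmicronSledConfig {
    generation: u64,    // assumption: the real type uses `Generation`
    zones: Vec<String>, // assumption: the real type holds full zone configs
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ConfigReconcilerInventoryStatus {
    // The only variant named in the commit message; others are omitted.
    NotYetRun,
}

// Contents omitted; always `None` in inventory until the reconciler exists.
#[derive(Debug, Clone, PartialEq)]
struct ConfigReconcilerInventory;

// Hypothetical wrapper holding just the three new inventory fields.
#[allow(dead_code)]
struct SledInventory {
    ledgered_sled_config: Option<OmicronSledConfig>,
    reconciler_status: ConfigReconcilerInventoryStatus,
    last_reconciliation: Option<ConfigReconcilerInventory>,
}

/// Zones from the ledgered config, or an empty slice if nothing has been
/// ledgered yet. This mirrors "just look at `ledgered_sled_config` for now".
fn ledgered_zones(inv: &SledInventory) -> &[String] {
    inv.ledgered_sled_config
        .as_ref()
        .map(|config| config.zones.as_slice())
        .unwrap_or(&[])
}
```

Under these assumptions, a consumer written this way keeps working when the config is absent, and would need revisiting once `last_reconciliation` carries real data.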
I'm putting racklette testing notes for this branch plus a few followups in comments on the last of those followups (#8220).
This is a hard PR to review, given its broad scope. It was made somewhat easier by recognizing a few patterns such as replacing calls to the storage manager with rx channels for disk and datasets.
It all appears correct to me, but again, hard to really tell. I'm sure it was tedious to implement as well :)
Regardless, looks good enough to merge and continue with.
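For readers unfamiliar with the pattern mentioned above (a consumer holding a receiver for disk updates instead of calling into the storage manager), here is a minimal sketch. It uses `std::sync::mpsc` purely as a stand-in; the channel type actually used in the PR is not shown in this conversation, and `DiskSet`, `DiskConsumer`, and `example` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver};

// Hypothetical stand-in for whatever disk information the real code passes.
#[derive(Debug, Clone, PartialEq)]
struct DiskSet(Vec<String>);

// A consumer that owns the receiving half of a channel rather than a handle
// to a storage manager it would have to query.
struct DiskConsumer {
    rx: Receiver<DiskSet>,
    current: DiskSet,
}

impl DiskConsumer {
    fn new(rx: Receiver<DiskSet>) -> Self {
        Self { rx, current: DiskSet(Vec::new()) }
    }

    /// Drain any pending updates without blocking and return the most
    /// recent disk set seen so far.
    fn latest(&mut self) -> &DiskSet {
        if let Some(newest) = self.rx.try_iter().last() {
            self.current = newest;
        }
        &self.current
    }
}

/// Send two updates, then read back the latest one.
fn example() -> DiskSet {
    let (tx, rx) = channel();
    let mut consumer = DiskConsumer::new(rx);
    tx.send(DiskSet(vec!["disk0".to_string()])).unwrap();
    tx.send(DiskSet(vec!["disk0".to_string(), "disk1".to_string()])).unwrap();
    consumer.latest().clone()
}
```

The consumer only ever sees the newest value, which is the property that makes this kind of channel a reasonable replacement for a "give me the current disks" call.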
method = GET,
path = "/datasets",
}]
async fn datasets_get(
Were these only used by the OMDB commands that got removed?
async fn dyn_datasets_config_list(&self) -> Result<DatasetsConfig, Error> {
self.datasets_config_list().await.map_err(|err| err.into())
// TODO-cleanup This is super gross; add a better API (maybe fetch a
Did you want to clean this up in this PR?
/// Given a sled config, produce a reconciler result that sled-agent could
/// have emitted if reconciliation succeeded.
///
/// This method should only be used by tests and dev tools; real code should
Should we mark this with `#[test]` and maybe `#[cfg(any(test, feature = "testing"))]`?
@@ -214,17 +214,18 @@ impl<'a> Planner<'a> {
// The sled is not expunged. We have to see if the inventory
// reflects the parent blueprint disk generation. If it does
// then we mark any expunged disks decommissioned.
//
// TODO-correctness We inspect `last_reconciliation` here to confirm
On balance, this seems like the right choice to me. We should know the sled agent has acted before decommissioning.
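The check being discussed can be sketched roughly as follows. This is an assumption-laden illustration, not the planner's code: `LastReconciliation`, its `config_generation` field, the `SledInventory` wrapper, and the `>=` comparison are all hypothetical; only the idea of consulting `last_reconciliation` before decommissioning comes from the diff comment above.

```rust
// Hypothetical stand-in for omicron's `Generation` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Generation(u64);

// Hypothetical: the generation of the config the sled agent last reconciled.
struct LastReconciliation {
    config_generation: Generation,
}

// Hypothetical wrapper; `None` means the reconciler has not run yet.
struct SledInventory {
    last_reconciliation: Option<LastReconciliation>,
}

/// Returns true only if the sled agent reports having reconciled a config at
/// least as new as the parent blueprint's generation. If no reconciliation
/// has been reported, the answer is "not yet".
fn sled_has_acted_on(inv: &SledInventory, blueprint_gen: Generation) -> bool {
    match &inv.last_reconciliation {
        // Assumption: "at least as new" is the right comparison.
        Some(r) => r.config_generation >= blueprint_gen,
        None => false,
    }
}
```

With a gate like this, expunged disks would be marked decommissioned only after inventory shows the sled agent has caught up, which matches the intent stated in the comment.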
This PR integrates the new `sled-agent-config-reconciler` crate with `sled-agent`. It will not currently pass tests due to the reconciler not being completely implemented, but I'd like to get any feedback on this integration work itself (particularly as it pertains to the API of `sled-agent-config-reconciler`). See the description of #8063 for more context.

There are a couple serious warts with this PR:

* `StorageManager` (because its functionality is being absorbed into `sled-agent-config-reconciler`); however, the storage manager also has a rich set of test support. This PR leaves a couple sled-agent submodules using that test support (support-bundle/storage and zone-bundle). In the long run I think it'd be better to rework these (if there are no remaining production uses of `StorageManager`), but for now I think this is... okay? Feedback welcome.