[reconfigurator] Pre-checks and post_update actions for RoT bootloader update #8325

Open: karencfv wants to merge 11 commits into main

@karencfv karencfv commented Jun 12, 2025

This commit implements several checks that must happen before updating an RoT bootloader, and post-update actions.
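
As a rough illustration of what such a pre-check might decide, here is a minimal sketch. The type and field names (`PrecheckStatus`, `PrecheckError`, `Expected`, `Found`) are simplified stand-ins chosen for this example, not necessarily the types in this PR, and the order of the checks is an assumption inferred from the `ready to start update` and `precheck: Ok(UpdateComplete)` lines in the manual test log.

```rust
/// Outcome of a successful pre-check (simplified, hypothetical type).
#[derive(Debug, PartialEq)]
enum PrecheckStatus {
    /// The device already runs the target version; nothing to do.
    UpdateComplete,
    /// The device matches the expected starting state; safe to update.
    ReadyForUpdate,
}

/// Reasons a pre-check could refuse to proceed (hypothetical variants).
#[derive(Debug, PartialEq)]
enum PrecheckError {
    WrongActiveVersion { expected: String, found: String },
    WrongInactiveVersion { expected: Option<String>, found: Option<String> },
}

/// What the updater was configured to expect before starting.
struct Expected<'a> {
    target_version: &'a str,
    stage0_version: &'a str,
    stage0_next_version: Option<&'a str>,
}

/// What was actually read back from the device's cabooses.
struct Found<'a> {
    stage0_version: &'a str,
    stage0_next_version: Option<&'a str>,
}

fn precheck(
    expected: &Expected<'_>,
    found: &Found<'_>,
) -> Result<PrecheckStatus, PrecheckError> {
    // Assumption: an active slot already at the target version means done.
    if found.stage0_version == expected.target_version {
        return Ok(PrecheckStatus::UpdateComplete);
    }
    // Otherwise the active (stage0) version must be the one we planned from.
    if found.stage0_version != expected.stage0_version {
        return Err(PrecheckError::WrongActiveVersion {
            expected: expected.stage0_version.to_string(),
            found: found.stage0_version.to_string(),
        });
    }
    // The inactive (stage0next) version must match the plan as well.
    if found.stage0_next_version != expected.stage0_next_version {
        return Err(PrecheckError::WrongInactiveVersion {
            expected: expected.stage0_next_version.map(String::from),
            found: found.stage0_next_version.map(String::from),
        });
    }
    Ok(PrecheckStatus::ReadyForUpdate)
}
```

With the values from the manual test (expected stage0 and stage0next at `0.0.200`, artifact version `1.0.0`), this sketch would report `ReadyForUpdate` before the update and `UpdateComplete` once the active caboose reports `1.0.0`.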

Manual testing on a simulated Omicron:

Previous state

$ target/debug/omdb --dns-server [::1]:64764 db inventory collections show latest sp
<...>
Switch SimSidecar1
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 1
    found at: 2025-06-17 01:33:40.905223 UTC from http://[::1]:58369
    cabooses:
        SLOT       BOARD        NAME          VERSION GIT_COMMIT SIGN                                                             
        SpSlot0    SimSidecarSp SimSidecar    0.0.2   ffffffff   n/a                                                              
        SpSlot1    SimSidecarSp SimSidecar    0.0.1   fefefefe   n/a                                                              
        RotSlotA   SimRot       SimSidecarRot 0.0.4   eeeeeeee   11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        RotSlotB   SimRot       SimSidecarRot 0.0.3   edededed   11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        Stage0     SimRotStage0 SimSidecarRot 0.0.200 ddddddddd  11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        Stage0Next SimRotStage0 SimSidecarRot 0.0.200 dadadadad  11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
    RoT pages:
        SLOT         DATA_BASE64                         
        Cmpa         c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA... 
        CfpaActive   c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA... 
        CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA... 
        CfpaScratch  c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA... 
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    RoT: slot B SHA3-256: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Updating via reconfigurator-sp-updater:

$ ./target/debug/reconfigurator-sp-updater --dns-server [::1]:64764 [::]:55066 --log-level trace
<...>
〉set SimSidecar1 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236 1.0.0 rot-bootloader -a 0.0.200 -i 0.0.200
updated configuration for SimSidecar1
Jun 17 04:34:22.403 INFO begin update attempt for baseboard, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.459 DEBG client request, body: None, uri: http://[::]:64890/artifact/sha256/005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, method: GET, repo_depot_url: http://[::]:64890
Jun 17 04:34:22.463 DEBG client response, result: Ok(Response { url: "http://[::]:64890/artifact/sha256/005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236", status: 200, headers: {"content-type": "application/octet-stream", "x-request-id": "a3636c28-8ce4-4454-aba1-bc129fad4eed", "content-length": "750", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), repo_depot_url: http://[::]:64890
Jun 17 04:34:22.463 DEBG loaded artifact contents, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.464 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.465 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "146467ea-2797-43ec-9c25-af9df804f1b8", "content-length": "734", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.465 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc", stage0next_error: None, stage0next_fwid: "dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.465 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.466 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "847a44c2-05d9-4779-9f57-879bbc912d7c", "content-length": "179", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.466 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "ddddddddd", name: "SimSidecarRot", sign: Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf"), version: "0.0.200" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.467 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=1, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.468 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=1", status: 200, headers: {"content-type": "application/json", "x-request-id": "c8efd6e2-953f-4952-a63d-cdf5aa78ea30", "content-length": "179", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.468 DEBG ready to start update, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.468 DEBG client request, body: Some(Body), uri: http://[::1]:57702/sp/switch/1/component/stage0/update?firmware_slot=1&id=f388fbca-7e52-449a-8593-ed99750b430b, method: POST, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.469 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/update?firmware_slot=1&id=f388fbca-7e52-449a-8593-ed99750b430b", status: 204, headers: {"x-request-id": "112bca21-dff9-4a75-b021-6abedf125f5b", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.469 INFO update started, mgs_addr: http://[::1]:57702, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.469 DEBG started update, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.470 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/update-status, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.470 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/update-status", status: 200, headers: {"content-type": "application/json", "x-request-id": "0e7f3f42-5a12-4038-8137-aa9852dee198", "content-length": "107", "date": "Tue, 17 Jun 2025 04:34:22 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:22.470 DEBG got update status, status: InProgress { bytes_received: 978, id: f388fbca-7e52-449a-8593-ed99750b430b, total_bytes: 1024 }, mgs_addr: http://[::1]:57702, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.471 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/update-status, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.473 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/update-status", status: 200, headers: {"content-type": "application/json", "x-request-id": "28cb0e67-1265-4314-bad4-28c8df6b3872", "content-length": "64", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.474 DEBG got update status, status: Complete { id: f388fbca-7e52-449a-8593-ed99750b430b }, mgs_addr: http://[::1]:57702, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.474 DEBG delivered artifact, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.475 DEBG attempting to reset device to do bootloader signature check, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.475 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/rot/reset, method: POST, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.475 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/rot/reset", status: 204, headers: {"x-request-id": "c4b0b9af-dec6-4c2d-abb4-8a78a033c80d", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.475 DEBG attempting to retrieve boot info to verify image validity, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.476 DEBG client request, body: Some(Body), uri: http://[::1]:57702/sp/switch/1/component/rot/rot-boot-info, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.477 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/rot/rot-boot-info", status: 200, headers: {"content-type": "application/json", "x-request-id": "cdd975eb-38dc-4f3e-b21c-820f59c76ae1", "content-length": "565", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.477 DEBG attempting to set RoT bootloader active slot, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.478 DEBG client request, body: Some(Body), uri: http://[::1]:57702/sp/switch/1/component/stage0/active-slot?persist=true, method: POST, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.478 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/active-slot?persist=true", status: 204, headers: {"x-request-id": "c7b39382-78de-4101-8d10-1bc979d6dab1", "date": "Tue, 17 Jun 2025 04:34:24 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.478 DEBG attempting to reset device to set to new RoT bootloader version, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.479 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/rot/reset, method: POST, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.479 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/rot/reset", status: 204, headers: {"x-request-id": "ae776edb-6474-457b-98f0-7fcacf01d202", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.479 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.479 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "6e75b4c5-5a51-4d8f-84be-44c84eebdc36", "content-length": "734", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.480 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", stage0next_error: None, stage0next_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.480 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.481 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "56df2fc1-3c9d-4b07-9131-8818802028d9", "content-length": "132", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.481 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "this-is-fake-data", name: "SimRotStage0", sign: Some("SimRotStage0"), version: "1.0.0" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.481 DEBG precheck result, precheck: Ok(UpdateComplete), update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.481 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.482 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1", status: 200, headers: {"content-type": "application/json", "x-request-id": "e7ef45f4-2a62-43c0-8b66-840acf0eaa42", "content-length": "734", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.482 DEBG found SP state, state: SpState { base_mac_address: [0, 0, 0, 0, 0, 0], hubris_archive_id: "0000000000000000", model: "FAKE_SIM_SIDECAR", power_state: A2, revision: 0, rot: V3 { active: A, pending_persistent_boot_preference: None, persistent_boot_preference: A, slot_a_error: None, slot_a_fwid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", slot_b_error: None, slot_b_fwid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", stage0_error: None, stage0_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", stage0next_error: None, stage0next_fwid: "01368372b4c730e54ef9efe240bea5e9d277a3708ddd7eac7115727fde52dda4", transient_boot_preference: None }, serial_number: "SimSidecar1" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.482 DEBG client request, body: None, uri: http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0, method: GET, mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.482 DEBG client response, result: Ok(Response { url: "http://[::1]:57702/sp/switch/1/component/stage0/caboose?firmware_slot=0", status: 200, headers: {"content-type": "application/json", "x-request-id": "cb53c8c7-8756-405e-9f76-5aad5577c600", "content-length": "132", "date": "Tue, 17 Jun 2025 04:34:25 GMT"} }), mgs_backend_addr: [::1]:57702, mgs_backend_name: dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal., update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.483 DEBG found active slot caboose, caboose: SpComponentCaboose { board: "SimRotStage0", epoch: None, git_commit: "this-is-fake-data", name: "SimRotStage0", sign: Some("SimRotStage0"), version: "1.0.0" }, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0
Jun 17 04:34:25.484 INFO update attempt done, result: CompletedUpdate, elapsed_millis: 3080, update_id: f388fbca-7e52-449a-8593-ed99750b430b, part_number: FAKE_SIM_SIDECAR, serial_number: SimSidecar1, sp_type: Switch, sp_slot: 1, component: rot_bootloader, expected_stage0_version: 0.0.200, expected_stage0_next_version: Version(ArtifactVersion("0.0.200")), artifact_hash: 005ea358f1cd316df42465b1e3a0334ea22cc0c0442cf9ddf9b42fbf49780236, artifact_version: 1.0.0

State after the update

$ target/debug/omdb --dns-server [::1]:64764 db inventory collections show latest sp
<...>
Switch SimSidecar1
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 1
    found at: 2025-06-17 02:23:37.654819 UTC from http://[::1]:58369
    cabooses:
        SLOT       BOARD        NAME          VERSION GIT_COMMIT        SIGN                                                             
        SpSlot0    SimSidecarSp SimSidecar    0.0.2   ffffffff          n/a                                                              
        SpSlot1    SimSidecarSp SimSidecar    0.0.1   fefefefe          n/a                                                              
        RotSlotA   SimRot       SimSidecarRot 0.0.4   eeeeeeee          11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        RotSlotB   SimRot       SimSidecarRot 0.0.3   edededed          11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf 
        Stage0     SimRotStage0 SimRotStage0  1.0.0   this-is-fake-data SimRotStage0                                                     
        Stage0Next SimRotStage0 SimRotStage0  1.0.0   this-is-fake-data SimRotStage0                                                     
    RoT pages:
        SLOT         DATA_BASE64                         
        Cmpa         c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA... 
        CfpaActive   c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA... 
        CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA... 
        CfpaScratch  c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA... 
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    RoT: slot B SHA3-256: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Related: #7988

Comment on lines +631 to +647
// We'll loop for 3 minutes to wait for any ongoing RoT bootloader update.
// We need to wait for 2 resets, which have a timeout of 60 seconds each,
// and an attempt to retrieve boot info, which has a timeout of 30 seconds.
// We give an additional 30 seconds as a buffer for the other actions.
Ok(PrecheckStatus::WaitingForOngoingRotBootloaderUpdate) => {
    if before.elapsed()
        >= WAIT_FOR_ONGOING_ROT_BOOTLOADER_UPDATE_TIMEOUT
    {
        return Err(UpdateWaitError::Timeout(
            WAIT_FOR_ONGOING_ROT_BOOTLOADER_UPDATE_TIMEOUT,
        ));
    }

    tokio::time::sleep(ROT_BOOLOADER_UPDATE_PROGRESS_INTERVAL)
        .await;
    continue;
}
Contributor Author

@davepacheco is this implementation consistent with #7988 (comment)? Or is there something I missed?
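
For reference, a minimal std-only sketch of how the 3-minute budget in the code comment adds up, and of the wait loop's shape. The constant and type names below are hypothetical stand-ins, not the identifiers used in this PR, and the real code is async (it uses `tokio::time::sleep`) rather than blocking.

```rust
use std::time::{Duration, Instant};

// Hypothetical constants mirroring the numbers in the code comment.
const RESET_TIMEOUT: Duration = Duration::from_secs(60);
const BOOT_INFO_TIMEOUT: Duration = Duration::from_secs(30);
const BUFFER: Duration = Duration::from_secs(30);

/// Two resets, one boot-info attempt, plus a buffer for the other actions.
fn wait_for_ongoing_update_timeout() -> Duration {
    RESET_TIMEOUT * 2 + BOOT_INFO_TIMEOUT + BUFFER
}

// Hypothetical, simplified stand-ins for the precheck result and error types.
#[derive(Debug, PartialEq)]
enum Precheck {
    Ready,
    WaitingForOngoingUpdate,
}

#[derive(Debug, PartialEq)]
enum WaitError {
    Timeout(Duration),
}

/// Re-run `precheck` until it reports `Ready`, giving up once `timeout`
/// has elapsed since the first attempt.
fn wait_until_ready(
    mut precheck: impl FnMut() -> Precheck,
    timeout: Duration,
    interval: Duration,
) -> Result<(), WaitError> {
    let before = Instant::now();
    loop {
        match precheck() {
            Precheck::Ready => return Ok(()),
            Precheck::WaitingForOngoingUpdate => {
                if before.elapsed() >= timeout {
                    return Err(WaitError::Timeout(timeout));
                }
                // Blocking sleep keeps the sketch dependency-free.
                std::thread::sleep(interval);
            }
        }
    }
}
```

The timeout is checked only after a precheck reports that an update is still ongoing, so a precheck that succeeds on the last attempt is never reported as a timeout.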

Comment on lines +64 to +72
// TODO-K: In the RoT bootloader update code in wicket, there is a set of
// known bootloader FWIDs that don't have cabooses. Is this something we
// should care about here?
// https://github.com/oxidecomputer/omicron/blob/89ce370f0a96165c777e90a008257a6085897f2a/wicketd/src/update_tracker.rs#L1817-L1841

// TODO-K: Older versions of the SP also have a bug that prevents
// setting the active slot for the RoT bootloader. Is this something we should
// care about here?
// https://github.com/oxidecomputer/omicron/blob/89ce370f0a96165c777e90a008257a6085897f2a/wicketd/src/update_tracker.rs#L1705-L1710
Contributor Author

Would be good to get some input from @davepacheco or @lzrd here

Contributor Author

I spoke with @lzrd IRL about this. We do want to keep these checks in place for development experience. But they're not urgent. These can be added later.

Comment on lines +184 to +187
// TODO-K: In post_update we'll be restarting the RoT twice to do signature
// checks, and to set stage0 to the new version. What happens if the RoT
// itself is being updated (during the reset stage)? Should we check for that
// here before setting the RoT bootloader as ready to update?
Contributor Author

Is this even possible? I know the planner will be doing the SP, RoT, RoT bootloader, and host OS updates sequentially. But could a rogue Nexus attempt an RoT update while an RoT bootloader update is in progress, or vice versa?

Contributor

I don't know about a "rogue" Nexus, but I think we should assume it's always possible any given Nexus could be executing an older blueprint concurrently with a different Nexus executing a newer blueprint.

Contributor Author

Thanks! Yeah, that makes sense. I agree.

I guess my question now is what happens if a Nexus is resetting an RoT as part of an RoT update, and another Nexus is resetting an RoT as part of an RoT bootloader update?

Contributor

I think all of the prechecks should prevent this from happening, even if a Nexus is operating on an older blueprint. Assuming all of this is working as intended:

  1. The planner ensures there is at most one PendingMgsUpdate in a given blueprint
  2. The planner only removes a PendingMgsUpdate if it's completed or become impossible
  3. The prechecks of any given update prevent a Nexus from starting an update if the target isn't in the same state it was when the planner decided to perform the update

I don't think it's possible for two different Nexuses to attempt to reset two different components simultaneously:

  • "reset" happens at the end of the update
  • ... which means all the prechecks passed
  • ... which means the update couldn't have been completed yet
  • ... which means the planner couldn't have created a new blueprint with a different PendingMgsUpdate (unless the update has become impossible, which should have caused any in-flight update to fail before it got to "reset")

Maybe there's some path through here where this is possible, but if there is it seems like something we have to fix?
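
To make point 3 concrete, here is a minimal sketch of a precheck that refuses to start an update unless the target is still in the state the planner recorded. The types and field names are hypothetical simplifications loosely modeled on the `expected_stage0_version` and `expected_stage0_next_version` fields in the logs above; they are not the actual types in this PR.

```rust
// Hypothetical, simplified view of the bootloader state the planner recorded.
#[derive(Debug, Clone, PartialEq)]
struct BootloaderState {
    stage0_version: String,
    stage0_next_version: Option<String>,
}

#[derive(Debug, PartialEq)]
enum PrecheckStatus {
    /// The target already runs the desired version; nothing to do.
    UpdateComplete,
    /// The target matches what the planner saw; safe to start.
    ReadyForUpdate,
}

#[derive(Debug, PartialEq)]
enum PrecheckError {
    WrongStage0Version { expected: String, found: String },
    WrongStage0NextVersion { expected: Option<String>, found: Option<String> },
}

/// Compare the state found on the device with the state the planner expected
/// when it created the pending update.
fn precheck(
    found: &BootloaderState,
    expected: &BootloaderState,
    target_version: &str,
) -> Result<PrecheckStatus, PrecheckError> {
    if found.stage0_version == target_version {
        return Ok(PrecheckStatus::UpdateComplete);
    }
    if found.stage0_version != expected.stage0_version {
        return Err(PrecheckError::WrongStage0Version {
            expected: expected.stage0_version.clone(),
            found: found.stage0_version.clone(),
        });
    }
    if found.stage0_next_version != expected.stage0_next_version {
        return Err(PrecheckError::WrongStage0NextVersion {
            expected: expected.stage0_next_version.clone(),
            found: found.stage0_next_version.clone(),
        });
    }
    Ok(PrecheckStatus::ReadyForUpdate)
}
```

Under this model, a Nexus holding an older blueprint sees either `UpdateComplete` or an error once another Nexus has moved the device on, so it never reaches the reset step.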

Contributor Author

@karencfv karencfv Jun 18, 2025

Maybe I'm misunderstanding, but that's assuming we're talking about the same blueprint version, yes? Just one comment above, you mention:

... I think we should assume it's always possible any given Nexus could be executing an older blueprint concurrently with a different Nexus executing a newer blueprint.

So, we could assume it's possible a Nexus is resetting an RoT as part of an RoT update of an older blueprint, and another Nexus is resetting an RoT as part of an RoT bootloader update of a newer blueprint.

Something like:

  1. Nexus#1 with a blueprint with a new RoT version starts an RoT update.
  2. Nexus#2 with a different blueprint with a new RoT bootloader version (and no changes to the RoT) starts an update.
  3. Both Nexus#1 and Nexus#2 enter the post-update stage at similar times, and clash resetting the RoT

Is this possible?

If so, it would probably make sense for the RoT bootloader update to have pre-checks that validate the expected state of the RoT, and for the RoT update to have pre-checks that validate the expected state of the RoT bootloader.

Is it overkill to add those additional checks even if we're almost certain that this scenario is nearly impossible?
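
A rough sketch of what such a cross-component check could look like on the RoT bootloader side. Everything here is hypothetical: the struct only borrows a few field names from the `SpState` shown in the logs, and treating a pending or transient boot preference as a sign of an in-flight RoT update is an assumption for illustration, not confirmed behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RotSlot {
    A,
    B,
}

// Hypothetical subset of the RoT state reported by the SP.
#[derive(Debug, Clone, PartialEq)]
struct RotState {
    active: RotSlot,
    active_version: String,
    pending_persistent_boot_preference: Option<RotSlot>,
    transient_boot_preference: Option<RotSlot>,
}

#[derive(Debug, PartialEq)]
enum RotPrecheckError {
    WrongActiveSlot { expected: RotSlot, found: RotSlot },
    WrongActiveVersion { expected: String, found: String },
    BootPreferenceChangePending,
}

/// Extra precheck for an RoT bootloader update: confirm the RoT itself is in
/// the state the planner expected and does not look like it is mid-update.
fn precheck_rot_for_bootloader_update(
    found: &RotState,
    expected_active: RotSlot,
    expected_version: &str,
) -> Result<(), RotPrecheckError> {
    if found.active != expected_active {
        return Err(RotPrecheckError::WrongActiveSlot {
            expected: expected_active,
            found: found.active,
        });
    }
    if found.active_version != expected_version {
        return Err(RotPrecheckError::WrongActiveVersion {
            expected: expected_version.to_string(),
            found: found.active_version.clone(),
        });
    }
    // Assumption: a set pending or transient preference means another
    // update may be about to reset the RoT.
    if found.pending_persistent_boot_preference.is_some()
        || found.transient_boot_preference.is_some()
    {
        return Err(RotPrecheckError::BootPreferenceChangePending);
    }
    Ok(())
}
```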

Contributor

Maybe I'm misunderstanding, but that's assuming we're talking about the same blueprint version, yes?

No, I meant if one Nexus was executing an older blueprint.

Something like:

  1. Nexus#1 with a blueprint with a new RoT version starts an RoT update.
  2. Nexus#2 with a different blueprint with a new RoT bootloader version (and no changes to the RoT) starts an update.
  3. Both Nexus#1 and Nexus#2 enter the post-update stage at similar times, and clash resetting the RoT

Is this possible?

It shouldn't be. Assuming the blueprint Nexus#1 is operating on is older, I think the first two points I made above:

  1. The planner ensures there is at most one PendingMgsUpdate in a given blueprint
  2. The planner only removes a PendingMgsUpdate if it's completed or become impossible

mean that the existence of the blueprint Nexus#2 is operating on implies that the RoT update in Nexus#1's older blueprint has either completed or become impossible, which means Nexus#1's prechecks should fail, and therefore it will never get to the post-update stage.

Maybe it's possible in the "abandoned update" case, but I hope not? That would be an ordering something like this:

  1. Nexus#1 starts an RoT update, performs successful prechecks, then goes out to lunch.
  2. A couple minutes later, Nexus#2 aborts Nexus#1's update, takes it over, and drives it to completion.
  3. A new blueprint is produced to update the RoT bootloader.
  4. Nexus#2 starts executing the new RoT bootloader update.
  5. Nexus#1 wakes back up; it's already passed the prechecks, so it resumes updating from whatever point it was at.

I guess there's a question here of: in step 5, would Nexus#1 realize everything has changed out from under it and fail to resume an update that's been abandoned? Maybe it depends on which step exactly it was on? (cc @davepacheco who has probably thought about this a lot more 😅)
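
One way to reason about step 5 is a toy model in which the device remembers which update currently owns it and rejects steps from any other update. This is purely illustrative: the `SimDevice` type and its methods are hypothetical, and nothing here claims that MGS or the SP actually behaves this way.

```rust
#[derive(Debug, PartialEq)]
enum StepError {
    /// The caller's update is no longer the one the device is tracking.
    StaleUpdate { current: Option<u64>, attempted: u64 },
}

// Hypothetical simulated device that tracks the update that owns it.
#[derive(Debug, Default)]
struct SimDevice {
    current_update: Option<u64>,
    resets: u32,
}

impl SimDevice {
    fn start_update(&mut self, id: u64) {
        self.current_update = Some(id);
    }

    fn abort_update(&mut self) {
        self.current_update = None;
    }

    /// Reset only succeeds for the update the device is currently tracking.
    fn reset(&mut self, id: u64) -> Result<(), StepError> {
        if self.current_update != Some(id) {
            return Err(StepError::StaleUpdate {
                current: self.current_update,
                attempted: id,
            });
        }
        self.resets += 1;
        Ok(())
    }
}
```

In this model, Nexus#1 waking up with its old update ID gets `StaleUpdate` instead of resetting the device, whichever step it resumes from. Whether the real system provides an equivalent guard at every step is exactly the open question above.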

@karencfv karencfv marked this pull request as ready for review June 17, 2025 05:49
@karencfv karencfv requested review from davepacheco and lzrd June 17, 2025 05:49