-
Notifications
You must be signed in to change notification settings - Fork 45
Trust quorum: reconfiguration and commit behavior #8052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,9 +4,13 @@ | |
|
||
//! Various cryptographic constructs used by trust quroum. | ||
|
||
use crate::{Epoch, Threshold}; | ||
use bootstore::trust_quorum::RackSecret as LrtqRackSecret; | ||
use chacha20poly1305::{ChaCha20Poly1305, Key, KeyInit, aead, aead::Aead}; | ||
use derive_more::From; | ||
use gfss::shamir::{self, CombineError, SecretShares, Share, SplitError}; | ||
use hkdf::Hkdf; | ||
use omicron_uuid_kinds::{GenericUuid, RackUuid}; | ||
use rand::RngCore; | ||
use rand::rngs::OsRng; | ||
use secrecy::{DebugSecret, ExposeSecret, Secret}; | ||
|
@@ -15,9 +19,7 @@ use sha3::{Digest, Sha3_256}; | |
use slog_error_chain::SlogInlineError; | ||
use std::fmt::Debug; | ||
use subtle::ConstantTimeEq; | ||
use zeroize::{Zeroize, ZeroizeOnDrop}; | ||
|
||
use crate::Threshold; | ||
use zeroize::{Zeroize, ZeroizeOnDrop, Zeroizing}; | ||
|
||
/// Each share contains a byte for the y-coordinate of 32 points on 32 different | ||
/// polynomials over Ed25519. All points share an x-coordinate, which is the 0th | ||
|
@@ -203,6 +205,15 @@ impl RackSecret { | |
let secret = shamir::compute_secret(shares)?.try_into()?; | ||
Ok(secret) | ||
} | ||
|
||
pub fn reconstruct_from_iter<'a>( | ||
shares: impl Iterator<Item = &'a Share>, | ||
) -> Result<ReconstructedRackSecret, RackSecretReconstructError> { | ||
let mut shares: Vec<Share> = shares.cloned().collect(); | ||
let res = RackSecret::reconstruct(&shares); | ||
shares.zeroize(); | ||
res | ||
} | ||
} | ||
|
||
impl DebugSecret for RackSecret {} | ||
|
@@ -242,6 +253,63 @@ impl Default for Salt { | |
} | ||
} | ||
|
||
/// Encrypt the old rack secret with a key derived from the new rack secret. | ||
/// | ||
/// A random salt is generated and returned along with the encrypted secret. Key | ||
/// derivation context includes `rack_id`, `old_epoch`, and `new_epoch`. | ||
pub fn encrypt_old_rack_secret( | ||
old_rack_secret: ReconstructedRackSecret, | ||
new_rack_secret: ReconstructedRackSecret, | ||
rack_id: RackUuid, | ||
old_epoch: Epoch, | ||
new_epoch: Epoch, | ||
) -> aead::Result<(EncryptedRackSecret, Salt)> { | ||
let salt = Salt::new(); | ||
let cipher = derive_encryption_key_for_rack_secret( | ||
new_rack_secret, | ||
salt, | ||
rack_id, | ||
old_epoch, | ||
new_epoch, | ||
); | ||
|
||
// This key is only used to encrypt one plaintext. A nonce of all zeroes is | ||
// all that's required. | ||
Comment on lines
+276
to
+277
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this the case? I'm curious but not sure how the second statement follows from the first. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nonces are used to ensure that an identical input message maps to a different output message when encrypted with the same key. The purpose of doing this is twofold:
In this case, we care about neither of these scenarios. There is no chance of a replay attack, as the nonce is used locally to encrypt data and not sent over the network. Everything sent over the network is already encrypted with TLS. In the second case (where the comment is relevant), we never encrypt more than one piece of data with the same key. There's no chance for cryptanalysis since we encrypt all the data in one shot, and never reuse the key. Furthermore, this data must be able to be deterministically encrypted and decrypted each time the rack secret is reconstructed and the key derived. The easiest thing to do then is to just use a constant of zero for the nonce, so we don't need to keep track via some advanced protocol. Changing the encrypted value each time we do this doesn't buy us anything and it adds significant complexity and risk. |
||
let nonce = [0u8; 12].into(); | ||
let encrypted_rack_secret = EncryptedRackSecret( | ||
cipher.encrypt(&nonce, old_rack_secret.expose_secret().as_ref())?, | ||
); | ||
|
||
Ok((encrypted_rack_secret, salt)) | ||
} | ||
|
||
fn derive_encryption_key_for_rack_secret( | ||
new_rack_secret: ReconstructedRackSecret, | ||
salt: Salt, | ||
rack_id: RackUuid, | ||
old_epoch: Epoch, | ||
new_epoch: Epoch, | ||
) -> ChaCha20Poly1305 { | ||
let prk = Hkdf::<Sha3_256>::new( | ||
Some(&salt.0[..]), | ||
new_rack_secret.expose_secret(), | ||
); | ||
|
||
// The "info" string is context to bind the key to its purpose | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does "bind" mean here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Each key should have a single purpose. This ties the key to that "purpose" via a unique string and a set of unique parameters for this specific encryption task. It prevents accidentally deriving the same key given the same input key material, but for which the key is intended for different purposes. None of this is strictly necessary here, but is defense in depth and a best practice. This is primarily protection against a kind of confused deputy attack, described for a somewhat similar case by soatok. I'm going to conjure up a very hypothetical example that I read something similar about a while ago, but cannot find the reference to. We can describe a hypothetical service that maintains shared input key material that it uses across accounts in order to derive encryption keys. These keys are used to decrypt on demand ciphertext stored in user accounts. Let's assume there's also a vulnerability that allows an attacker to upload encrypted data to an account they have access to so that the service will decrypt it on demand. Then, let's assume an attacker has a legitimate account for this service, as they are a member of an organization, and input key material is based on a shared org secret uploaded to the service at sign up time. Let's see what happens if keys are derived without binding the derived key to context. The attacker is in the office, and has access to encrypted backups of data stored in the a particularly sensitive company account. They do not have access to this information in the course of their duties. Say this is sensitive insider information that can be traded on and in the CFO's account. The attacker is able to inject this encrypted copy of data inside their own legitimate account via the vulnerability. Since the same encryption key is shared across users for this organization, the attacker can then ask to download or view the decrypted version even though they only have access to the encrypted data and a mechanism for uploading it to the server. However, If there was context info used to derive the key, we could set the context info to the particular user's account id. Then the encrypted data would only properly decrypt when the derived key was regenerated with the same input key material and the CFOs account id. The attacker could upload this data, but then when accessing the service they wouldn't be able to decrypt the information, because the derived key would be constructed using their own account id, and would not decrypt the data correctly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And because this example is very contrived, and I realize you may have been asking a simpler question, I'll provide that. By binding the info it just means that the algorithm for generating the key mixes the bits from the info string in with the input key material and other context parameters to generate the key. |
||
let mut key = Zeroizing::new([0u8; 32]); | ||
prk.expand_multi_info( | ||
&[ | ||
b"trust-quorum-v1-rack-secret", | ||
rack_id.as_untyped_uuid().as_ref(), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Huh I guess we should probably impl There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now out in newtype-uuid 1.2.2 :) |
||
&new_epoch.0.to_be_bytes(), | ||
&old_epoch.0.to_be_bytes(), | ||
], | ||
key.as_mut(), | ||
) | ||
.unwrap(); | ||
ChaCha20Poly1305::new(Key::from_slice(key.as_ref())) | ||
} | ||
|
||
#[cfg(test)] | ||
mod tests { | ||
use super::*; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
// This Source Code Form is subject to the terms of the Mozilla Public | ||
// License, v. 2.0. If a copy of the MPL was not distributed with this | ||
// file, You can obtain one at https://mozilla.org/MPL/2.0/. | ||
|
||
//! Various errors for the trust quorum APIs | ||
|
||
use crate::configuration::ConfigurationError; | ||
use crate::{Epoch, PlatformId, Threshold}; | ||
use omicron_uuid_kinds::RackUuid; | ||
|
||
#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)] | ||
pub enum CommitError { | ||
#[error("invalid rack id")] | ||
InvalidRackId( | ||
#[from] | ||
#[source] | ||
MismatchedRackIdError, | ||
), | ||
|
||
#[error("missing prepare msg")] | ||
MissingPrepare, | ||
|
||
#[error("prepare for a later configuration exists")] | ||
OutOfOrderCommit, | ||
Comment on lines
+23
to
+24
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can/should this include more information on the epoch numbers? |
||
} | ||
|
||
#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)] | ||
#[error( | ||
"sled was decommissioned on msg from {from:?} at epoch {epoch:?}: last prepared epoch = {last_prepared_epoch:?}" | ||
)] | ||
pub struct SledDecommissionedError { | ||
pub from: PlatformId, | ||
pub epoch: Epoch, | ||
pub last_prepared_epoch: Option<Epoch>, | ||
} | ||
|
||
#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)] | ||
#[error("mismatched rack id: expected {expected:?}, got {got:?}")] | ||
pub struct MismatchedRackIdError { | ||
pub expected: RackUuid, | ||
pub got: RackUuid, | ||
} | ||
|
||
#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)] | ||
pub enum ReconfigurationError { | ||
#[error("reconfiguration coordinator must be a member of the new group")] | ||
CoordinatorMustBeAMemberOfNewGroup, | ||
|
||
#[error("upgrade from LRTQ required")] | ||
UpgradeFromLrtqRequired, | ||
|
||
#[error( | ||
"number of members: {num_members:?} must be greater than threshold: {threshold:?}" | ||
)] | ||
ThresholdMismatch { num_members: usize, threshold: Threshold }, | ||
|
||
#[error( | ||
"invalid membership size: {0:?}: must be between 3 and 32 inclusive" | ||
)] | ||
InvalidMembershipSize(usize), | ||
|
||
#[error( | ||
"invalid threshold: {0:?}: threshold must be between 2 and 31 inclusive" | ||
)] | ||
InvalidThreshold(Threshold), | ||
|
||
#[error( | ||
"Node has last committed epoch of {node_epoch:?}, message contains {msg_epoch:?}" | ||
)] | ||
LastCommittedEpochMismatch { | ||
node_epoch: Option<Epoch>, | ||
msg_epoch: Option<Epoch>, | ||
}, | ||
|
||
#[error( | ||
"sled has already prepared a request at epoch {existing:?}, and cannot prepare another at a smaller or equivalent epoch {new:?}" | ||
)] | ||
PreparedEpochMismatch { existing: Epoch, new: Epoch }, | ||
|
||
#[error("invalid rack id in reconfigure msg")] | ||
InvalidRackId( | ||
#[from] | ||
#[source] | ||
MismatchedRackIdError, | ||
), | ||
|
||
#[error("cannot reconfigure a decommissioned sled")] | ||
DecommissionedSled( | ||
#[from] | ||
#[source] | ||
SledDecommissionedError, | ||
), | ||
#[error( | ||
"reconfiguration in progress at epoch {current_epoch:?}: cannot reconfigure for older epoch {msg_epoch:?}" | ||
)] | ||
ReconfigurationInProgress { current_epoch: Epoch, msg_epoch: Epoch }, | ||
|
||
#[error("mismatched reconfiguration requests for epoch {0:?}")] | ||
MismatchedReconfigurationForSameEpoch(Epoch), | ||
|
||
#[error(transparent)] | ||
Configuration(#[from] ConfigurationError), | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,7 @@ use serde::{Deserialize, Serialize}; | |
mod configuration; | ||
mod coordinator_state; | ||
pub(crate) mod crypto; | ||
pub(crate) mod errors; | ||
mod messages; | ||
mod node; | ||
mod persistent_state; | ||
|
@@ -40,6 +41,13 @@ pub use persistent_state::{PersistentState, PersistentStateSummary}; | |
)] | ||
pub struct Epoch(pub u64); | ||
|
||
impl Epoch { | ||
// Increment the epoch and return the new value | ||
pub fn inc(&self) -> Epoch { | ||
Epoch(self.0 + 1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. worth using |
||
} | ||
} | ||
|
||
/// The number of shares required to reconstruct the rack secret | ||
/// | ||
/// Typically referred to as `k` in the docs | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this looks different from
reconstruct
above?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for API ergonomics.