[wip][OTA-1545] Extend ClusterVersion for accepted risks #2360
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -199,6 +199,21 @@ type ClusterVersionStatus struct { | |
// +listType=atomic | ||
// +optional | ||
ConditionalUpdates []ConditionalUpdate `json:"conditionalUpdates,omitempty"` | ||
|
||
// conditionalUpdateRisks contains the list of risks associated with | ||
// conditionalUpdates. When performing a conditional update, all its | ||
// associated risks will be compared with the set of accepted risks | ||
// in the spec.desiredUpdate.accept field. If all risks for a conditional | ||
// update are included in the spec.desiredUpdate.accept set, the conditional | ||
// update will proceed, otherwise it is blocked. | ||
// The list of risks is built by a map indexed by the name of the risk. | ||
// +kubebuilder:validation:MaxItems=1000 | ||
// +patchMergeKey=name | ||
// +patchStrategy=merge | ||
// +listType=map | ||
// +listMapKey=name | ||
// +optional | ||
|
||
ConditionalUpdateRisks []ConditionalUpdateRisk `json:"conditionalUpdateRisks,omitempty" patchStrategy:"merge" patchMergeKey:"name"` | ||
} | ||
|
||
// UpdateState is a constant representing whether an update was successfully | ||
|
@@ -255,10 +270,11 @@ type UpdateHistory struct { | |
Verified bool `json:"verified"` | ||
|
||
// acceptedRisks records risks which were accepted to initiate the update. | ||
// For example, it may menition an Upgradeable=False or missing signature | ||
// that was overriden via desiredUpdate.force, or an update that was | ||
// For example, it may mention an Upgradeable=False or missing signature | ||
// that was overridden via desiredUpdate.force, or an update that was | ||
// initiated despite not being in the availableUpdates set of recommended | ||
// update targets. | ||
// update targets, or in the conditionalUpdates set and all associated risks | ||
// are specified in desiredUpdate.accept. | ||
// +optional | ||
AcceptedRisks string `json:"acceptedRisks,omitempty"` | ||
} | ||
|
@@ -725,6 +741,16 @@ type Update struct { | |
// | ||
// +optional | ||
Force bool `json:"force"` | ||
|
||
// accept allows an administrator to specify the set of names of ConditionalUpdateRisks | ||
// that are considered acceptable. A conditional update is accepted by the cluster-version | ||
// operator only if all of its risks are acceptable. | ||
// | ||
// +kubebuilder:validation:items:MaxLength=256 | ||
Why a maximum length of 256? Is there a particular pattern that we expect risk names to follow?
Here are some examples of risk names:
$ cat ~/Downloads/risks.txt | sort | uniq | head -n 3
AcceleratedNetworkingRace
AMD19hFirmware
ARM64SecCompError524
and the longest one is
At the moment, there are no restrictions on the risk names from CVO's point of view.
So something like
Should there be a restriction put in place so that user-supplied values are rejected if they couldn't possibly map to a valid update risk? From what I can gather, it seems like the pattern is alphanumeric CamelCase? So a minimal regex like |
||
// +kubebuilder:validation:MaxItems=1000 | ||
Why was 1000 chosen? Do we have a history of there being up to 1000 risks for a given upgrade?
In theory, all the risks could be accepted by the user. |
||
// +listType=set | ||
// +optional | ||
Comment on lines +749 to +752
Explicitly include these constraints in the GoDoc for the field as plain English sentences. Users will not be able to see the markers as part of the generated documentation.
The "allows" already gets at
Being explicit is, IMO, better than being ambiguous with terminology. I'd expect something like:
|
||
Accept []string `json:"accept"` | ||
} | ||
|
||
// Release represents an OpenShift release image and associated metadata. | ||
|
@@ -780,11 +806,24 @@ type ConditionalUpdate struct { | |
// +required | ||
Release Release `json:"release"` | ||
|
||
// riskNames represents the set of the names of conditionalUpdateRisks | ||
// in the status that are exposed to the release in this conditional update. | ||
// The cluster-version operator will evaluate these risks and only | ||
// accept the update if there is at least one risk and for every risk | ||
// it is either not applied to the cluster or considered acceptable | ||
// by the cluster administrator. | ||
// +kubebuilder:validation:items:MaxLength=256 | ||
// +kubebuilder:validation:MaxItems=100 | ||
Why 100 here but 1000 elsewhere?
Other places are total risks for all conditional updates. |
||
// +listType=set | ||
// +optional | ||
|
||
RiskNames []string `json:"riskNames"` | ||
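The riskNames comment above states a slightly richer rule than a plain subset check: there must be at least one risk, and each risk must either not apply to the cluster or be accepted by the administrator. A minimal sketch of that rule follows. The risk struct is a simplified, hypothetical stand-in for ConditionalUpdateRisk (its Applies field mirrors the status of the Applies condition), and treating an unknown risk name as applying is an assumption of this sketch, not something the PR specifies.

```go
package main

import "fmt"

// risk is a hypothetical, simplified stand-in for ConditionalUpdateRisk.
// Applies mirrors the status of the "Applies" condition.
type risk struct {
	Name    string
	Applies bool
}

// updateAccepted reports whether a conditional update would be accepted:
// it needs at least one risk, and every risk must either not apply to the
// cluster or be listed in accept.
func updateAccepted(riskNames []string, known map[string]risk, accept []string) bool {
	if len(riskNames) == 0 {
		return false
	}
	accepted := make(map[string]bool, len(accept))
	for _, name := range accept {
		accepted[name] = true
	}
	for _, name := range riskNames {
		r, found := known[name]
		// Assumption: a name missing from the status list is treated
		// conservatively, as if the risk applied.
		applies := !found || r.Applies
		if applies && !accepted[name] {
			return false
		}
	}
	return true
}

func main() {
	known := map[string]risk{
		"AMD19hFirmware":            {Name: "AMD19hFirmware", Applies: true},
		"AcceleratedNetworkingRace": {Name: "AcceleratedNetworkingRace", Applies: false},
	}
	names := []string{"AMD19hFirmware", "AcceleratedNetworkingRace"}

	fmt.Println(updateAccepted(names, known, []string{"AMD19hFirmware"}))
	fmt.Println(updateAccepted(names, known, nil))
}
```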
|
||
// risks represents the range of issues associated with | ||
// updating to the target release. The cluster-version | ||
// operator will evaluate all entries, and only recommend the | ||
// update if there is at least one entry and all entries | ||
// recommend the update. | ||
// DEPRECATED: the risks has been deprecated by riskNames. | ||
What does this mean for users/clients?
It suggests a user who uses If other fields of
Sure, but can I as a user still rely on this I want to make sure that deprecating this field doesn't mean we are also breaking behaviors that users/clients might expect to be present. |
||
// +kubebuilder:validation:MinItems=1 | ||
// +patchMergeKey=name | ||
// +patchStrategy=merge | ||
|
@@ -806,6 +845,15 @@ type ConditionalUpdate struct { | |
// for not recommending a conditional update. | ||
// +k8s:deepcopy-gen=true | ||
type ConditionalUpdateRisk struct { | ||
// conditions represents the observations of the conditional update | ||
// risk's current status. Known types are: | ||
// * Applies, for whether the risk applies to the current cluster. | ||
// +kubebuilder:validation:MaxItems=2 | ||
// +listType=map | ||
// +listMapKey=type | ||
// +optional | ||
Conditions []metav1.Condition `json:"conditions,omitempty"` | ||
|
||
// url contains information about this risk. | ||
// +kubebuilder:validation:Format=uri | ||
// +kubebuilder:validation:MinLength=1 | ||
|
Why was 1000 chosen? Do we have a record somewhere of how many UpdateRisks there are?
https://github.com/openshift/cincinnati-graph-data/tree/master/blocked-edges
So far we have 91 risks (not every one will appear in cv.status, since CVO does some filtering), but the total number could grow as more risks are claimed out of OCP bugs. 1000 is a number with room for the future. I picked it without thinking much beyond the above.
What is the impact of, say, putting 10 in the rule?
If we update the object with 11 elements, would K8S block the update and throw an error?
Yes.
I don't have any strong opinions here, but 1000 felt like it could be a really high number for something that I would not really expect to get to that point. I'm less familiar with this area, but having 1000 risks associated with an update seems bad and I would expect us to never get to that state.
How easy/difficult is it to get an update risk accepted and included in the set of risks for a particular release? How many are typically associated with any given update?
The main reason I'm pushing for a more restrictive number is because we can always increase this, but we can never decrease it.
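To make the "would K8S block the update" exchange concrete, here is a rough sketch in plain Go of the limits that the markers on accept declare (MaxItems, items MaxLength, and listType=set). It only imitates the effect of schema validation; the function name validateAccept and the error wording are hypothetical and are not the API server's actual messages. Counting length in Unicode code points is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validateAccept is a hypothetical helper that mimics the declared limits:
// at most maxItems entries, each at most maxLength characters, no duplicates.
func validateAccept(accept []string, maxItems, maxLength int) error {
	if len(accept) > maxItems {
		return fmt.Errorf("accept: too many items: %d > %d", len(accept), maxItems)
	}
	seen := make(map[string]struct{}, len(accept))
	for i, name := range accept {
		// Assumption: length is counted in Unicode code points.
		if utf8.RuneCountInString(name) > maxLength {
			return fmt.Errorf("accept[%d]: longer than %d characters", i, maxLength)
		}
		if _, dup := seen[name]; dup {
			return fmt.Errorf("accept[%d]: duplicate value %q", i, name)
		}
		seen[name] = struct{}{}
	}
	return nil
}

func main() {
	eleven := make([]string, 11)
	for i := range eleven {
		eleven[i] = fmt.Sprintf("Risk%d", i)
	}
	// With a limit of 10, an 11-element list is rejected.
	fmt.Println(validateAccept(eleven, 10, 256))
	// With a limit of 1000, the same list passes.
	fmt.Println(validateAccept(eleven, 1000, 256))
}
```

The sketch also shows why the reviewer prefers starting low: a list that is valid under a limit of 1000 could already exist in stored objects, so lowering the limit later would invalidate data that was previously accepted, while raising it never does.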