[wip][OTA-1545] Extend ClusterVersion for accepted risks #2360

Open
wants to merge 3 commits into
base: master
54 changes: 51 additions & 3 deletions config/v1/types_cluster_version.go
@@ -199,6 +199,21 @@ type ClusterVersionStatus struct {
// +listType=atomic
// +optional
ConditionalUpdates []ConditionalUpdate `json:"conditionalUpdates,omitempty"`

// conditionalUpdateRisks contains the list of risks associated with
// conditionalUpdates. When performing a conditional update, all its
// associated risks will be compared with the set of accepted risks
// in the spec.desiredUpdate.accept field. If all risks for a conditional
// update are included in the spec.desiredUpdate.accept set, the conditional
// update will proceed, otherwise it is blocked.
// The list of risks is built by a map indexed by the name of the risk.
// +kubebuilder:validation:MaxItems=1000
Contributor

Why was 1000 chosen? Do we have a record somewhere of how many UpdateRisks there are?

Member Author

https://github.com/openshift/cincinnati-graph-data/tree/master/blocked-edges

$ ls blocked-edges/*.yaml | while read file; do yq -r '.name'  "$file"; done | tee ~/Downloads/risks.txt

$ cat ~/Downloads/risks.txt| sort | uniq | wc -l
      91

So far we have 91 risks (not every one will appear in cv.status, because CVO does some filtering).
But the total number could grow as more risks are declared from OCP bugs.
1000 leaves room for the future.
I picked it without much thought beyond the above.

What would be the impact of, say, putting 10 in the rule instead?
If we updated the object with 11 elements, would Kubernetes block the update and return an error?

Contributor

> What would be the impact of, say, putting 10 in the rule instead?
> If we updated the object with 11 elements, would Kubernetes block the update and return an error?

Yes.

> So far we have 91 risks (not every one will appear in cv.status, because CVO does some filtering).
> But the total number could grow as more risks are declared from OCP bugs.
> 1000 leaves room for the future.
> I picked it without much thought beyond the above.

I don't have any strong opinions here, but 1000 felt like it could be a really high number for something that I would not really expect to get to that point. I'm less familiar with this area, but having 1000 risks associated with an update seems bad and I would expect us to never get to that state.

How easy/difficult is it to get an update risk accepted and included in the set of risks for a particular release? How many are typically associated with any given update?

The main reason I'm pushing for a more restrictive number is because we can always increase this, but we can never decrease it.

// +patchMergeKey=name
// +patchStrategy=merge
// +listType=map
// +listMapKey=name
// +optional
ConditionalUpdateRisks []ConditionalUpdateRisk `json:"conditionalUpdateRisks,omitempty" patchStrategy:"merge" patchMergeKey:"name"`
}
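
The acceptance rule in the conditionalUpdateRisks doc comment above amounts to a subset check. A minimal sketch, assuming risk names are compared as plain strings; the helper name `allRisksAccepted` is hypothetical and is not taken from the cluster-version operator:

```go
package main

import "fmt"

// allRisksAccepted reports whether every risk name attached to a conditional
// update appears in the accepted set (spec.desiredUpdate.accept).
// Hypothetical helper for illustration only; not the CVO implementation.
func allRisksAccepted(riskNames, accept []string) bool {
	accepted := make(map[string]struct{}, len(accept))
	for _, name := range accept {
		accepted[name] = struct{}{}
	}
	for _, name := range riskNames {
		if _, ok := accepted[name]; !ok {
			return false // one unaccepted risk is enough to block the update
		}
	}
	// Note: an empty riskNames list returns true here; whether that case
	// should be allowed is a policy question this sketch does not settle.
	return true
}

func main() {
	accept := []string{"AMD19hFirmware", "AcceleratedNetworkingRace"}
	fmt.Println(allRisksAccepted([]string{"AMD19hFirmware"}, accept))
	fmt.Println(allRisksAccepted([]string{"AMD19hFirmware", "ARM64SecCompError524"}, accept))
}
```

The map lookup keeps the check linear in the combined length of both lists, which matters little at these sizes but avoids a nested loop.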

// UpdateState is a constant representing whether an update was successfully
@@ -255,10 +270,11 @@ type UpdateHistory struct {
Verified bool `json:"verified"`

// acceptedRisks records risks which were accepted to initiate the update.
- // For example, it may menition an Upgradeable=False or missing signature
- // that was overriden via desiredUpdate.force, or an update that was
+ // For example, it may mention an Upgradeable=False or missing signature
+ // that was overridden via desiredUpdate.force, or an update that was
  // initiated despite not being in the availableUpdates set of recommended
- // update targets.
+ // update targets, or in the conditionUpdates set and all associated risks
+ // are specified in desiredUpdate.accept.
// +optional
AcceptedRisks string `json:"acceptedRisks,omitempty"`
}
@@ -725,6 +741,16 @@ type Update struct {
//
// +optional
Force bool `json:"force"`

// accept allows an administrator to specify the set of the names of ConditionalUpdateRisk
// those are considered acceptable. A conditional update is accepted by Cluster-Version
// operator only if all of its risks are acceptable.
//
// +kubebuilder:validation:items:MaxLength=256
Contributor

Why a maximum length of 256? Is there a particular pattern that we expect risk names to follow?

Member Author

@hongkailiu, Jun 19, 2025

Follow up https://github.com/openshift/api/pull/2360/files/a9d2af3985180a169deaef9fea6ae0d40e807b8d#r2154495183

Here are some examples of risk names:

$ cat ~/Downloads/risks.txt| sort| uniq | head -n 3
AcceleratedNetworkingRace
AMD19hFirmware
ARM64SecCompError524

and the longest one is 54 characters at the moment (wc -m below reports 55 because it counts the trailing newline):

$ awk 'length > max_length { max_length = length; longest_line = $0 } END { print longest_line }' ~/Downloads/risks.txt
LabeledMachineConfigAndContainerRuntimeConfigBlocksMCO

$ awk 'length > max_length { max_length = length; longest_line = $0 } END { print longest_line }' ~/Downloads/risks.txt | wc -m
      55

At the moment, there are no restrictions on the risk names from CVO's point of view.

Contributor

So something like !@&*(^*ASDF probably wouldn't be a valid value?

Should there be a restriction put in place so that user supplied values are rejected if they couldn't possibly map to a valid update risk?

From what I can gather, the pattern seems to be alphanumeric CamelCase, so a minimal regex like [A-Za-z0-9]+ is probably fairly effective at rejecting clearly invalid values.
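
One caveat when turning this suggestion into a pattern: Go's `regexp.MatchString` matches substrings unless the expression is anchored, and JSON Schema `pattern` is likewise not implicitly anchored. A minimal sketch comparing the two forms; the variable names are hypothetical, and the anchored expression is only one possible choice, not a pattern the PR has adopted:

```go
package main

import (
	"fmt"
	"regexp"
)

// unanchored matches any string that merely contains an alphanumeric run.
var unanchored = regexp.MustCompile(`[A-Za-z0-9]+`)

// riskNamePattern requires the whole string to be alphanumeric.
// Hypothetical pattern for illustration only.
var riskNamePattern = regexp.MustCompile(`^[A-Za-z0-9]+$`)

func main() {
	for _, name := range []string{"AMD19hFirmware", "!@&*(^*ASDF"} {
		fmt.Printf("%q unanchored=%t anchored=%t\n",
			name, unanchored.MatchString(name), riskNamePattern.MatchString(name))
	}
}
```

Without the `^` and `$` anchors, the `ASDF` suffix alone is enough for `!@&*(^*ASDF` to match.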

// +kubebuilder:validation:MaxItems=1000
Contributor

Why was 1000 chosen? Do we have a history of there being up to 1000 risks for a given upgrade?

Member Author

Follow up https://github.com/openshift/api/pull/2360/files/a9d2af3985180a169deaef9fea6ae0d40e807b8d#r2154495183

In theory, all the risks could be accepted by the user.
The 1000 here simply mirrors the 1000 limit on conditionalUpdateRisks.

// +listType=set
// +optional
Comment on lines +749 to +752
Contributor

Explicitly include these constraints in the GoDoc for the field as plain English sentences. Users will not be able to see the markers as part of the generated documentation.

Member

> // accept allows an administrator to specify the set of the names of ConditionalUpdateRisk...

The "allows" already gets at +optional and "the set" already gets at +listType=set. For MaxLength and MaxItems, as Hongkai pointed out, users are expected to pass along strings they've seen in ClusterVersion status on this or other clusters, which is what names of ConditionalUpdateRisk is getting at. So maybe we're ok here? Or if you think further rewording is required, maybe you can suggest the Godocs you'd like to see?

Contributor

Being explicit is, IMO, better than being ambiguous with terminology. I'd expect something like:

accept is an optional field for configuring the acceptable conditional update risks.
A conditional update is performed only if all of its risks are acceptable.
Entries must be unique and must not exceed 256 characters.
accept must not contain more than 1000 entries.

Accept []string `json:"accept"`
}
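
The constraints spelled out in the suggested GoDoc (unique entries, at most 256 characters each, at most 1000 entries) can be mirrored client-side. A minimal sketch, assuming length is counted in characters rather than bytes; `validateAccept` and its constants are hypothetical names, and on a real cluster the API server would enforce these limits from the CRD schema rather than from code like this:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	maxAcceptItems = 1000 // mirrors +kubebuilder:validation:MaxItems=1000
	maxRiskNameLen = 256  // mirrors +kubebuilder:validation:items:MaxLength=256
)

// validateAccept checks a candidate spec.desiredUpdate.accept list against
// the limits declared by the markers. Hypothetical helper for illustration.
func validateAccept(accept []string) error {
	if len(accept) > maxAcceptItems {
		return fmt.Errorf("accept must not contain more than %d entries, got %d",
			maxAcceptItems, len(accept))
	}
	seen := make(map[string]struct{}, len(accept))
	for i, name := range accept {
		// Assumption: length is measured in characters (runes), not bytes.
		if utf8.RuneCountInString(name) > maxRiskNameLen {
			return fmt.Errorf("accept[%d] exceeds %d characters", i, maxRiskNameLen)
		}
		// +listType=set requires entries to be unique.
		if _, dup := seen[name]; dup {
			return fmt.Errorf("accept[%d] duplicates %q", i, name)
		}
		seen[name] = struct{}{}
	}
	return nil
}

func main() {
	fmt.Println(validateAccept([]string{"AMD19hFirmware", "ARM64SecCompError524"}))
	fmt.Println(validateAccept([]string{"AMD19hFirmware", "AMD19hFirmware"}))
}
```

This also illustrates the earlier question about exceeding a limit: a list with one entry too many fails validation as a whole instead of being truncated.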

// Release represents an OpenShift release image and associated metadata.
@@ -780,11 +806,24 @@ type ConditionalUpdate struct {
// +required
Release Release `json:"release"`

// riskNames represents the set of the names of conditionalUpdateRisks
// in the status that are exposed to the release in this conditional update.
// The cluster-version operator will evaluate these risks and only
// accept the update if there is at least one risk and for every risk
// it is either not applied to the cluster or considered acceptable
// by the cluster administrator.
// +kubebuilder:validation:items:MaxLength=256
// +kubebuilder:validation:MaxItems=100
Contributor

Why 100 here but 1000 elsewhere?

Member Author

The other places limit the total number of risks across all conditional updates.
This one limits the risks associated with ONE conditional update.

// +listType=set
// +optional
RiskNames []string `json:"riskNames"`

// risks represents the range of issues associated with
// updating to the target release. The cluster-version
// operator will evaluate all entries, and only recommend the
// update if there is at least one entry and all entries
// recommend the update.
// DEPRECATED: the risks has been deprecated by riskNames.
Contributor

What does this mean for a user/clients?

Member Author

It suggests that a user of cv.status.conditionalUpdates.risks should use cv.status.conditionalUpdates.riskNames instead.

If fields of cv.status.conditionalUpdates.risks other than name are used, the client has to use the name as the key to look up the whole risk object in cv.status.conditionalUpdateRisks.

Contributor

> It suggests that a user of cv.status.conditionalUpdates.risks should use cv.status.conditionalUpdates.riskNames instead.

Sure, but can I as a user still rely on this risks field being populated? What are the anticipated uses of this field?

I want to make sure that deprecating this field doesn't mean we are also breaking behaviors that users/clients might expect to be present.
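
The migration path described in this thread, resolving each entry in riskNames against the top-level conditionalUpdateRisks list, can be sketched as follows. The types are simplified stand-ins that carry only the fields needed here, `resolveRisks` is a hypothetical helper, and the URLs are placeholders:

```go
package main

import "fmt"

// Simplified stand-ins for the API types; only the fields used below.
type ConditionalUpdateRisk struct {
	Name string
	URL  string
}

type ConditionalUpdate struct {
	RiskNames []string
}

// resolveRisks returns the full risk objects for one conditional update by
// looking each name up in the status-level conditionalUpdateRisks list.
// Hypothetical helper for illustration only.
func resolveRisks(update ConditionalUpdate, all []ConditionalUpdateRisk) ([]ConditionalUpdateRisk, error) {
	byName := make(map[string]ConditionalUpdateRisk, len(all))
	for _, r := range all {
		byName[r.Name] = r
	}
	resolved := make([]ConditionalUpdateRisk, 0, len(update.RiskNames))
	for _, name := range update.RiskNames {
		r, ok := byName[name]
		if !ok {
			return nil, fmt.Errorf("risk %q not found in conditionalUpdateRisks", name)
		}
		resolved = append(resolved, r)
	}
	return resolved, nil
}

func main() {
	all := []ConditionalUpdateRisk{
		{Name: "AMD19hFirmware", URL: "https://example.com/AMD19hFirmware"},
		{Name: "ARM64SecCompError524", URL: "https://example.com/ARM64SecCompError524"},
	}
	update := ConditionalUpdate{RiskNames: []string{"ARM64SecCompError524"}}

	risks, err := resolveRisks(update, all)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range risks {
		fmt.Println(r.Name, r.URL)
	}
}
```

A client that previously read fields straight from the per-update risks list would need this extra lookup step, which is the compatibility cost the question above is probing.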

// +kubebuilder:validation:MinItems=1
// +patchMergeKey=name
// +patchStrategy=merge
@@ -806,6 +845,15 @@ type ConditionalUpdate struct {
// for not recommending a conditional update.
// +k8s:deepcopy-gen=true
type ConditionalUpdateRisk struct {
// conditions represents the observations of the conditional update
// risk's current status. Known types are:
// * Applies, for whether the risk applies to the current cluster.
// +kubebuilder:validation:MaxItems=2
// +listType=map
// +listMapKey=type
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`

// url contains information about this risk.
// +kubebuilder:validation:Format=uri
// +kubebuilder:validation:MinLength=1
Large diffs are not rendered by default.

26 changes: 25 additions & 1 deletion config/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.