[WIP] MCO-1669: add BootImageSkewEnforcement API #2357

Open: wants to merge 1 commit into base: master

@djoshy djoshy commented Jun 5, 2025

WIP boot image enforcement API, based on discussions from openshift/enhancements#1761

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 5, 2025

openshift-ci-robot commented Jun 5, 2025

@djoshy: This pull request references MCO-1669 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

WIP boot image enforcement API, based on discussions from openshift/enhancements#1761

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented Jun 5, 2025

Hello @djoshy! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 5, 2025
@openshift-ci openshift-ci bot requested review from deads2k and everettraven June 5, 2025 16:51

openshift-ci bot commented Jun 5, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: djoshy
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot commented Jun 5, 2025

@djoshy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-gcp | eee6809 | link | false | /test e2e-gcp
ci/prow/e2e-aws-serial-techpreview-2of2 | eee6809 | link | true | /test e2e-aws-serial-techpreview-2of2
ci/prow/verify-crd-schema | eee6809 | link | true | /test verify-crd-schema
ci/prow/e2e-aws-ovn-hypershift-conformance | eee6809 | link | true | /test e2e-aws-ovn-hypershift-conformance

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -56,8 +56,66 @@ type MachineConfigurationSpec struct {
// +openshift:enable:FeatureGate=NodeDisruptionPolicy
// +optional
NodeDisruptionPolicy NodeDisruptionPolicyConfig `json:"nodeDisruptionPolicy"`
// bootImageSkewEnforcement allows an admin to set the behavior of the boot image skew enforcement mechanism.

Why does an admin care about configuring this? What does configuring this allow them to achieve?


// +kubebuilder:validation:XValidation:rule="has(self.mode) && (self.mode == 'Automatic' || self.mode =='Manual') ? has(self.clusterBootImage) : !has(self.clusterBootImage)",message="clusterBootImage is required when type is Automatic or Manual, and forbidden otherwise"
// +union
type SkewEnforcementSelector struct {

Nit: Being more explicit in the type name helps eliminate potential for future conflicts in naming of similar features within the same package. Also makes it a bit more clear from the dev perspective that this type is explicitly used for boot image skew enforcement.

Suggested change
- type SkewEnforcementSelector struct {
+ type BootImageSkewEnforcementSelector struct {

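For readers following the XValidation rule on SkewEnforcementSelector, here is a minimal Go sketch of the same constraint: clusterBootImage must be set exactly when mode is Automatic or Manual. This is not code from the PR. The names Mode, Selector, and validateSelector are hypothetical stand-ins, and the real check is the CEL rule evaluated by the API server.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode is a hypothetical stand-in for the discriminator type in the diff.
type Mode string

const (
	ModeAutomatic Mode = "Automatic"
	ModeManual    Mode = "Manual"
	ModeDisabled  Mode = "Disabled"
)

// ClusterBootImage is trimmed to the two fields discussed in the review.
type ClusterBootImage struct {
	OCPVersion   string
	RHCOSVersion string
}

// Selector is a hypothetical stand-in for SkewEnforcementSelector.
type Selector struct {
	Mode             Mode
	ClusterBootImage *ClusterBootImage
}

// validateSelector restates the CEL ternary: clusterBootImage is required
// for Automatic and Manual, and forbidden for every other mode.
func validateSelector(s Selector) error {
	needsImage := s.Mode == ModeAutomatic || s.Mode == ModeManual
	hasImage := s.ClusterBootImage != nil
	if needsImage != hasImage {
		return errors.New("clusterBootImage is required when mode is Automatic or Manual, and forbidden otherwise")
	}
	return nil
}

func main() {
	img := &ClusterBootImage{OCPVersion: "4.18.0"}
	cases := []Selector{
		{Mode: ModeAutomatic, ClusterBootImage: img},
		{Mode: ModeManual},
		{Mode: ModeDisabled},
		{Mode: ModeDisabled, ClusterBootImage: img},
	}
	for _, c := range cases {
		fmt.Printf("mode=%s hasImage=%t valid=%t\n", c.Mode, c.ClusterBootImage != nil, validateSelector(c) == nil)
	}
}
```

Comparing the two booleans directly keeps the "required or forbidden" pairing in one place, which is what the ternary in the CEL rule expresses.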
// associated with the last boot image update in the clusterBootImage field.
// In Automatic and Manual mode, the MCO will prevent upgrades when the boot image skew exceeds the
// skew limit described by the release image.
// Disabled means that the MCO will permit upgrades when the boot image exceeds the skew limit

Nit: In general we try to avoid using the term Disabled because it is often overloaded to mean different things. I don't think this usage is particularly concerning, but maybe something like None is more intuitive for a field like this?

Comment on lines +110 to +116
Automatic SkewEnforcementSelectorMode = "Automatic"

// Manual represents a configuration mode that allows manual skew enforcement.
Manual SkewEnforcementSelectorMode = "Manual"

// Disabled represents a configuration mode that disables boot image skew enforcement.
Disabled SkewEnforcementSelectorMode = "Disabled"

Prevent naming conflicts and maintain consistency with how we define constants within OpenShift by following the format {typeAlias}{name}:

Suggested change
- Automatic SkewEnforcementSelectorMode = "Automatic"
- // Manual represents a configuration mode that allows manual skew enforcement.
- Manual SkewEnforcementSelectorMode = "Manual"
- // Disabled represents a configuration mode that disables boot image skew enforcement.
- Disabled SkewEnforcementSelectorMode = "Disabled"
+ SkewEnforcementSelectorModeAutomatic SkewEnforcementSelectorMode = "Automatic"
+ // Manual represents a configuration mode that allows manual skew enforcement.
+ SkewEnforcementSelectorModeManual SkewEnforcementSelectorMode = "Manual"
+ // Disabled represents a configuration mode that disables boot image skew enforcement.
+ SkewEnforcementSelectorModeDisabled SkewEnforcementSelectorMode = "Disabled"

// Disabled means that the MCO will permit upgrades when the boot image exceeds the skew limit
// described by the release image. This may affect the cluster's ability to scale.
// +unionDiscriminator
// +required

Explicitly mention in the GoDoc that this field is required

type ClusterBootImage struct {
// ocpVersion provides a string which represents the OCP version of the boot image
// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+\\\\.[0-9]+\\\\.[0-9]+$')",message="bootImageOCPVersion must match the OCP semver compatible format of x.y.z"
// +kubebuilder:validation:MaxLength:=8

Curious, why was 8 chosen?

// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+\\\\.[0-9]+\\\\.[0-9]+$')",message="bootImageOCPVersion must match the OCP semver compatible format of x.y.z"
// +kubebuilder:validation:MaxLength:=8
// +required
OCPVersion string `json:"ocpVersion"`

Are you wanting to allow the empty string "" as a valid value here? If not, you'll probably want a minimum length here to prevent the empty string being valid.

If you do, what does this being an empty string mean?
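To make the ocpVersion questions concrete, here is a rough Go sketch that applies the x.y.z pattern and the MaxLength of 8 from the diff to a few sample strings. It is an approximation only: Go's regexp package stands in for CEL's matches(), the helper checkOCPVersion is hypothetical, and the sample versions are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern and limit copied from the markers in the diff.
var ocpVersionPattern = regexp.MustCompile(`^[0-9]+\.[0-9]+\.[0-9]+$`)

const ocpVersionMaxLength = 8

// checkOCPVersion is a hypothetical helper that reports, separately, whether
// v matches the x.y.z pattern and whether it fits within the maximum length.
func checkOCPVersion(v string) (matchesPattern, withinMaxLength bool) {
	return ocpVersionPattern.MatchString(v), len(v) <= ocpVersionMaxLength
}

func main() {
	// Illustrative samples only, including the empty string.
	for _, v := range []string{"", "4.18.0", "4.18.12", "4.100.12", "4.100.100"} {
		m, l := checkOCPVersion(v)
		fmt.Printf("%q len=%d pattern=%t maxLength=%t\n", v, len(v), m, l)
	}
}
```

In this sketch the empty string fails the pattern check but passes the length check, so an explicit minimum length would still state the intent directly in the schema. A nine-character value such as 4.100.100 matches the pattern but exceeds the limit of 8.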

Comment on lines +98 to +100
// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+\\\\.[0-9]+\\\\.[0-9]{8}-[0-9]+$')",message="rhcosVersion must match format [major].[minor].[datestamp(YYYYMMDD)]-[buildnumber]"
// +kubebuilder:validation:MaxLength:=15
// +optional

Explicitly mention these constraints in the godoc.


// rhcosVersion provides a string which represents the RHCOS version of the boot image
// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+\\\\.[0-9]+\\\\.[0-9]{8}-[0-9]+$')",message="rhcosVersion must match format [major].[minor].[datestamp(YYYYMMDD)]-[buildnumber]"
// +kubebuilder:validation:MaxLength:=15

Why 15? Is this long enough?

Looking at https://access.redhat.com/solutions/3787021 it looks like something like 416.94.202411201433-0 would be valid and is 21 characters long.
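The length concern can be checked with a short Go sketch that runs the rhcosVersion pattern and the MaxLength of 15 from the diff against the example value above. Go's regexp package is used here as a stand-in for CEL's matches(), checkRHCOSVersion is a hypothetical helper, and 9.6.20250121-0 is a made-up value chosen only because it fits the proposed format.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern and limit copied from the markers in the diff.
var rhcosVersionPattern = regexp.MustCompile(`^[0-9]+\.[0-9]+\.[0-9]{8}-[0-9]+$`)

const rhcosVersionMaxLength = 15

// checkRHCOSVersion is a hypothetical helper that reports, separately, whether
// v matches the proposed pattern and whether it fits within the maximum length.
func checkRHCOSVersion(v string) (matchesPattern, withinMaxLength bool) {
	return rhcosVersionPattern.MatchString(v), len(v) <= rhcosVersionMaxLength
}

func main() {
	// The first value is illustrative; the second is the example from the review comment.
	for _, v := range []string{"9.6.20250121-0", "416.94.202411201433-0"} {
		m, l := checkRHCOSVersion(v)
		fmt.Printf("%q len=%d pattern=%t maxLength=%t\n", v, len(v), m, l)
	}
}
```

Under these assumptions, 416.94.202411201433-0 is rejected twice: it is 21 characters long, and its 12-digit timestamp does not fit the 8-digit datestamp segment in the pattern. Raising MaxLength alone would therefore not be enough to accept it.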

}

// ClusterBootImage describes the boot image of a cluster. It stores the RHCOS version of the boot image and
// the OCP release version which shipped with that RHCOS boot image.
type ClusterBootImage struct {

What is the relationship between ocpVersion and rhcosVersion here? What happens if I do/don't include rhcosVersion?

Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.