OTA-1010: extract included manifests with net-new capabilities #1958

Open
wants to merge 3 commits into
base: main
from

Conversation

hongkailiu
Member

@hongkailiu hongkailiu commented Jan 14, 2025

The ManifestInclusionConfiguration is used to
determine whether a manifest is included on a cluster.
Its Capabilities field takes the implicitly enabled
capabilities into account.

This change removes the workaround that handles the
net-new capabilities introduced by a cluster upgrade.
E.g., if a cluster is currently on 4.13, the workaround
assumes that the capabilities "Build",
"DeploymentConfig", and "ImageRegistry" are enabled.
This is because the components underlying those
capabilities are installed by default on 4.13 or
earlier and cannot be disabled once installed. Those
capabilities become enabled after an upgrade from
4.13 to 4.14, either explicitly or implicitly,
depending on the current value of
cv.spec.capabilities.baselineCapabilitySet.

// FIXME: eventually pull in GetImplicitlyEnabledCapabilities from https://github.com/openshift/cluster-version-operator/blob/86e24d66119a73f50282b66a8d6f2e3518aa0e15/pkg/payload/payload.go#L237-L240 for cases where a minor update would implicitly enable some additional capabilities. For now, 4.13 to 4.14 will always enable MachineAPI, ImageRegistry, etc..
currentVersion := clusterVersion.Status.Desired.Version
matches := regexp.MustCompile(`^(\d+[.]\d+)[.].*`).FindStringSubmatch(currentVersion)
if len(matches) < 2 {
	return config, fmt.Errorf("failed to parse major.minor version from ClusterVersion status.desired.version %q", currentVersion)
} else if matches[1] == "4.13" {
	build := configv1.ClusterVersionCapability("Build")
	deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
	imageRegistry := configv1.ClusterVersionCapability("ImageRegistry")
	config.Capabilities.EnabledCapabilities = append(config.Capabilities.EnabledCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
	config.Capabilities.KnownCapabilities = append(config.Capabilities.KnownCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
}
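
For readers who want to see how the version match in the removed workaround behaves, here is a minimal standalone sketch. It reuses the regular expression from the snippet above; the helper name `majorMinor` is hypothetical and is not part of oc.

```go
package main

import (
	"fmt"
	"regexp"
)

// majorMinorRE is the pattern from the removed workaround: it captures the
// leading "major.minor" and requires a further "." after the minor number.
var majorMinorRE = regexp.MustCompile(`^(\d+[.]\d+)[.].*`)

// majorMinor is a hypothetical helper (not part of oc) that returns the
// "major.minor" prefix of a version string such as "4.13.12".
func majorMinor(version string) (string, error) {
	matches := majorMinorRE.FindStringSubmatch(version)
	if len(matches) < 2 {
		return "", fmt.Errorf("failed to parse major.minor version from %q", version)
	}
	return matches[1], nil
}

func main() {
	for _, v := range []string{"4.13.12", "4.14.0-rc.0", "4.13"} {
		mm, err := majorMinor(v)
		fmt.Printf("version=%q majorMinor=%q err=%v\n", v, mm, err)
	}
}
```

Note that a bare "4.13" does not match, because the pattern requires a separator after the minor number.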

CVO has already defined the function
GetImplicitlyEnabledCapabilities that calculates
the implicitly enabled capabilities of a cluster
after a cluster upgrade. For this function to work,
we have to provide

  • the manifests that are currently included on the
    cluster.
  • the manifests from the payload in the upgrade image.
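
The idea behind the two inputs can be sketched as follows. This is a simplified illustration, not the CVO's GetImplicitlyEnabledCapabilities: the `manifest` type, the `implicitlyEnabled` function, and the manifest IDs are all hypothetical, and the real function works on full manifest objects with a different signature.

```go
package main

import (
	"fmt"
	"sort"
)

// manifest is a hypothetical, simplified stand-in: an identity plus the
// capabilities that the manifest is annotated with.
type manifest struct {
	ID           string
	Capabilities []string
}

// implicitlyEnabled returns the capabilities that the update payload attaches
// to manifests already included on the cluster, but that are not enabled yet.
// Manifests that exist only in the update are ignored.
func implicitlyEnabled(current, update []manifest, enabled map[string]bool) []string {
	included := make(map[string]bool, len(current))
	for _, m := range current {
		included[m.ID] = true
	}
	seen := map[string]bool{}
	var result []string
	for _, m := range update {
		if !included[m.ID] {
			continue // net-new resource: nothing is installed yet
		}
		for _, c := range m.Capabilities {
			if !enabled[c] && !seen[c] {
				seen[c] = true
				result = append(result, c)
			}
		}
	}
	sort.Strings(result)
	return result
}

func main() {
	current := []manifest{
		{ID: "CredentialsRequest/registry"},
		{ID: "CredentialsRequest/machine-api"},
	}
	update := []manifest{
		{ID: "CredentialsRequest/registry", Capabilities: []string{"ImageRegistry"}},
		{ID: "CredentialsRequest/machine-api", Capabilities: []string{"MachineAPI"}},
		{ID: "CredentialsRequest/brand-new", Capabilities: []string{"SomeNewCapability"}},
	}
	fmt.Println(implicitlyEnabled(current, update, map[string]bool{}))
}
```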

The existing ManifestReceiver is enhanced so
that it can provide the enabled capabilities, both
explicit and implicit, when it calls the downstream
callback. It does so by caching the manifests
collected from the upstream and calling the
downstream only after all manifests are collected and
the capabilities are calculated from them using the
function GetImplicitlyEnabledCapabilities mentioned
earlier.

This enhancement is opt-in: set the
needEnabledCapabilities field of ManifestReceiver.
Otherwise, its behaviour stays the same as before.

When the inclusion configuration is taken
from the cluster, i.e., --install-config is not set,
needEnabledCapabilities is set to true.
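
A rough sketch of the caching behaviour described above, under stated assumptions: the `receiver` type, its `Receive` and `Flush` methods, and the `compute` hook are hypothetical simplifications and do not mirror the actual ManifestReceiver in oc. Only the field name needEnabledCapabilities is taken from the description.

```go
package main

import "fmt"

// downstream is a hypothetical stand-in for the downstream manifestsCallback.
type downstream func(filename string, manifests []string, enabledCaps []string) error

type entry struct {
	filename  string
	manifests []string
}

// receiver is a hypothetical, simplified stand-in for ManifestReceiver.
type receiver struct {
	needEnabledCapabilities bool
	compute                 func(all []string) []string // assumed capability calculation
	downstream              downstream
	cache                   []entry
}

// Receive is called once per file coming from the upstream.
func (r *receiver) Receive(filename string, manifests []string) error {
	if !r.needEnabledCapabilities {
		// Old behaviour: forward immediately, without capabilities.
		return r.downstream(filename, manifests, nil)
	}
	r.cache = append(r.cache, entry{filename, manifests})
	return nil
}

// Flush is called after the upstream has delivered every file. It computes
// the capabilities once and then replays the cached files to the downstream.
func (r *receiver) Flush() error {
	if !r.needEnabledCapabilities {
		return nil
	}
	var all []string
	for _, e := range r.cache {
		all = append(all, e.manifests...)
	}
	caps := r.compute(all)
	for _, e := range r.cache {
		if err := r.downstream(e.filename, e.manifests, caps); err != nil {
			return err
		}
	}
	r.cache = nil
	return nil
}

func main() {
	r := &receiver{
		needEnabledCapabilities: true,
		compute:                 func([]string) []string { return []string{"ImageRegistry"} },
		downstream: func(name string, ms []string, caps []string) error {
			fmt.Println(name, len(ms), caps)
			return nil
		},
	}
	_ = r.Receive("a.yaml", []string{"m1"})
	_ = r.Receive("b.yaml", []string{"m2", "m3"})
	_ = r.Flush()
}
```

The trade-off is visible in `Receive`: with the option set, nothing reaches the downstream until `Flush` runs.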

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 14, 2025
@openshift-ci-robot

openshift-ci-robot commented Jan 14, 2025

@hongkailiu: This pull request references OTA-1010 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2025
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 5 times, most recently from ad75be6 to 38aeb1d Compare January 14, 2025 12:09
@openshift-ci-robot

openshift-ci-robot commented Jan 14, 2025

In response to this:

This pull adds a ManifestReceiver that works between the upstream TarEntryCallback and the downstream manifestsCallback. Setting needEnabledCapabilities tells the receiver to compute the enabled capabilities and pass them to manifestsCallback. The price is that manifestsCallback is called only after the receiver has collected all the manifests from the upstream.

@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 2 times, most recently from 69216c5 to 916427e Compare January 14, 2025 12:18
@openshift-ci-robot

openshift-ci-robot commented Jan 14, 2025

In response to this:

Before this pull, we hard-coded the net-new capabilities for 4.13 clusters:

// FIXME: eventually pull in GetImplicitlyEnabledCapabilities from https://github.com/openshift/cluster-version-operator/blob/86e24d66119a73f50282b66a8d6f2e3518aa0e15/pkg/payload/payload.go#L237-L240 for cases where a minor update would implicitly enable some additional capabilities. For now, 4.13 to 4.14 will always enable MachineAPI, ImageRegistry, etc..
currentVersion := clusterVersion.Status.Desired.Version
matches := regexp.MustCompile(`^(\d+[.]\d+)[.].*`).FindStringSubmatch(currentVersion)
if len(matches) < 2 {
	return config, fmt.Errorf("failed to parse major.minor version from ClusterVersion status.desired.version %q", currentVersion)
} else if matches[1] == "4.13" {
	build := configv1.ClusterVersionCapability("Build")
	deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
	imageRegistry := configv1.ClusterVersionCapability("ImageRegistry")
	config.Capabilities.EnabledCapabilities = append(config.Capabilities.EnabledCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
	config.Capabilities.KnownCapabilities = append(config.Capabilities.KnownCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
}

Now the capabilities for the incoming release are calculated with the function from CVO, based on the manifests from the current release and the ones from the incoming release.

To fit the current code, which already had TarEntryCallback, the above logic is implemented via a ManifestReceiver that works between the upstream TarEntryCallback and the downstream manifestsCallback. Setting needEnabledCapabilities tells the receiver to compute the enabled capabilities and pass them to manifestsCallback. The price is that manifestsCallback is called only after the receiver has collected all the manifests from the upstream.

@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 916427e to 27e03eb Compare January 14, 2025 23:38
@hongkailiu hongkailiu changed the title [wip]OTA-1010: extract included manifests with net-new capabilities OTA-1010: extract included manifests with net-new capabilities Jan 15, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2025
@hongkailiu
Member Author

/retest-required

@petr-muller
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller February 14, 2025 18:34
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 4 times, most recently from c1909d5 to 0431ce4 Compare March 4, 2025 16:50
@hongkailiu
Member Author

/retest-required

@hongkailiu
Member Author

hongkailiu commented Mar 4, 2025

Some testing result from 0431ce4 (outdated)

Cluster-bot:

launch 4.13.12 aws

$ make oc
$ ./oc adm release extract --included --credentials-requests --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64
I0304 13:36:44.744021   32452 extract_tools.go:1254] Those capabilities become implicitly enabled for the incoming release [ImageRegistry MachineAPI]
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ rg ImageRegistry credentials-requests
credentials-requests/0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
6:    capability.openshift.io/name: ImageRegistry

$ ll credentials-requests
total 48
-rw-r--r--@ 1 hongkliu  staff   1.8K Mar  4 13:36 0000_30_machine-api-operator_00_credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   738B Mar  4 13:36 0000_50_cloud-credential-operator_05-iam-ro-credentialsrequest.yaml
-rw-r--r--@ 1 hongkliu  staff   1.3K Mar  4 13:36 0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   920B Mar  4 13:36 0000_50_cluster-ingress-operator_00-ingress-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   1.0K Mar  4 13:36 0000_50_cluster-network-operator_02-cncc-credentials.yaml
-rw-r--r--@ 1 hongkliu  staff   1.5K Mar  4 13:36 0000_50_cluster-storage-operator_03_credentials_request_aws.yaml

### without --included
$ rm -rf credentials-requests          
$ ./oc adm release extract --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64                                  
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ ll credentials-requests | wc -l
     682
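
The difference between the two runs comes from filtering on the `capability.openshift.io/name` annotation seen in the `rg` output above. A minimal sketch of such a filter follows. It assumes that the annotation may list several capabilities separated by "+" and that a manifest is included only when all of them are enabled; the function name `includedByCapabilities` is hypothetical, and the real inclusion check considers more than capabilities.

```go
package main

import (
	"fmt"
	"strings"
)

// capabilityAnnotation is the annotation key shown in the rg output above.
const capabilityAnnotation = "capability.openshift.io/name"

// includedByCapabilities is a hypothetical helper. It reports whether every
// capability named in the annotation is enabled. Manifests without the
// annotation are always included. The "+" separator is an assumption.
func includedByCapabilities(annotations map[string]string, enabled map[string]bool) bool {
	value, ok := annotations[capabilityAnnotation]
	if !ok || value == "" {
		return true
	}
	for _, c := range strings.Split(value, "+") {
		if !enabled[c] {
			return false
		}
	}
	return true
}

func main() {
	registry := map[string]string{capabilityAnnotation: "ImageRegistry"}
	fmt.Println(includedByCapabilities(registry, map[string]bool{"ImageRegistry": true}))
	fmt.Println(includedByCapabilities(registry, map[string]bool{}))
}
```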

@hongkailiu
Member Author

/retest-required

@hongkailiu
Member Author

/test e2e-agnostic-ovn-cmd

@petr-muller
Member

petr-muller commented Mar 11, 2025

/uncc

I am not paying that much attention to the OTA-1010 matter b/c afaik Trevor is involved in this, so I uncc myself to avoid giving the false impression that I plan to review here. If my review or approval is necessary, feel free to /cc me again.

@openshift-ci openshift-ci bot removed the request for review from petr-muller March 11, 2025 12:48
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 2 times, most recently from 99cda00 to def90a6 Compare March 14, 2025 05:57
@hongkailiu
Member Author

hongkailiu commented Mar 17, 2025

Rerun the test with def90a6:

launch 4.13.12 aws

The cluster did not set BASELINE_CAPABILITY_SET. So it is the default value vCurrent.

$ oc get clusterversions.config.openshift.io version -o yaml | yq .spec
{
  "channel": "candidate-4.13",
  "clusterID": "01541c49-a5d6-4b02-ba6a-161a12c3ca79",
  "upstream": "https://api.integration.openshift.com/api/upgrades_info/graph"
}

$ ./oc adm release extract --included --credentials-requests --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64

I0314 09:39:57.809034   23241 extract_tools.go:1340] If the eventual cluster will not be the same minor version as this v4.2.0-alpha.0-2583-gdef90a6 'oc', the actual vCurrent capability set may differ.
I0314 09:39:57.809062   23241 extract_tools.go:1343] If the eventual cluster will not be the same minor version as this v4.2.0-alpha.0-2583-gdef90a6 'oc', the known capability sets may differ.
I0314 09:40:11.867239   23241 extract_tools.go:1253] Those capabilities become implicitly enabled for the incoming release []
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ rg ImageRegistry credentials-requests
credentials-requests/0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
6:    capability.openshift.io/name: ImageRegistry

$ ll credentials-requests
total 48
-rw-r--r--@ 1 hongkliu  staff   1.8K Mar 14 09:40 0000_30_machine-api-operator_00_credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   738B Mar 14 09:40 0000_50_cloud-credential-operator_05-iam-ro-credentialsrequest.yaml
-rw-r--r--@ 1 hongkliu  staff   1.3K Mar 14 09:40 0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   920B Mar 14 09:40 0000_50_cluster-ingress-operator_00-ingress-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   1.0K Mar 14 09:40 0000_50_cluster-network-operator_02-cncc-credentials.yaml
-rw-r--r--@ 1 hongkliu  staff   1.5K Mar 14 09:40 0000_50_cluster-storage-operator_03_credentials_request_aws.yaml

Compared with #1958 (comment), no caps became implicitly enabled, as expected, because they are (explicitly) enabled with BASELINE_CAPABILITY_SET=vCurrent.

I wanted to try BASELINE_CAPABILITY_SET=None with

launch 4.13.12 aws,no-capabilities

But cluster-bot is not so happy with the command.

I expect to see some implicitly enabled caps there.


Update on Mar 20.
Thanks to @jiajliu for the magic that creates a 4.13.12 cluster with baselineCapabilitySet=None.
Create openshift/release#63030, then trigger the rehearsal with

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-multi-nightly-4.14-upgrade-from-stable-4.13-aws-upi-basecap-none-amd-f28

When the rehearsal job reached

INFO[2025-03-20T16:04:13Z] Running step aws-upi-basecap-none-amd-f28-wait. 

then log in to the build farm to get the kubeconfig of the ephemeral cluster:

$ oc -n ci-op-76cqklsf extract secret/aws-upi-basecap-none-amd-f28 --to=- --keys kubeconfig > ~/.kube/config

Repeat the above test:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.12   True        False         34m     Cluster version is 4.13.12

$ oc get clusterversions.config.openshift.io version -o yaml | yq .spec
{
  "capabilities": {
    "baselineCapabilitySet": "None"
  },
  "clusterID": "a6bbd42d-eb66-44b6-89de-ca2be2985fd5"
}

$ ./oc adm release extract --included --credentials-requests --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64
I0320 12:26:25.194094    7394 extract_tools.go:1343] If the eventual cluster will not be the same minor version as this v4.2.0-alpha.0-2583-gdef90a6 'oc', the known capability sets may differ.
I0320 12:27:04.242248    7394 extract_tools.go:1253] Those capabilities become implicitly enabled for the incoming release [ImageRegistry MachineAPI]
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ rg ImageRegistry credentials-requests
credentials-requests/0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
6:    capability.openshift.io/name: ImageRegistry

The logs look good to me.

@hongkailiu
Member Author

/test e2e-agnostic-ovn-cmd

Contributor

openshift-ci bot commented Mar 17, 2025

@hongkailiu: No presubmit jobs available for openshift/oc@master

In response to this:

/test e2e-agnostic-ovn-cmd

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hongkailiu
Member Author

/retest-required

@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 0be8d41 to 88d0075 Compare March 21, 2025 14:33
@hongkailiu hongkailiu requested a review from wking March 21, 2025 14:53
@petr-muller
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller May 6, 2025 12:53
Member

@petr-muller petr-muller left a comment

I only went through the pkg/cli/admin/release/extract.go diff and left some comments. The code is very hard to read, which is not your fault - it is mostly caused by constructing a series of very long anonymous callbacks that end up being called at whatever time later... Not sure what to do about it though. It would definitely help if this was a series of smaller PRs.

if c := imageConfig.Config; c != nil {
	if v, ok := c.Labels["io.openshift.release"]; ok {
		klog.V(2).Infof("Retrieved the version from image configuration in the image to extract: %s", v)
		versionInImageConfig = v
Member

Can this callback be called multiple times, overwriting previous values in `versionInImageConfig`? If yes, can the callback be called in parallel, which would be a race?

Member Author

If I understand it correctly, oc adm release extract has only one release to extract and one image means one image config.

From my test, it is called only once.

Member

@petr-muller petr-muller May 16, 2025

Can we encode that assumption and blow up with a panic or Fatal if versionInImageConfig is already set?

Member Author

The code has been moved to https://github.com/openshift/oc/pull/2050/files.
But I will keep this thread open because I did not address this comment, and you might still think I should do it in this pull.

The concern is valid in general, but it is unlikely to happen here.
The other callbacks of extract.ExtractOptions are not thread-safe either.

@petr-muller
Member

It may be beyond the scope of what this PR attempts to do, but I have a feeling that the code could be made more readable if some of the callbacks (currently lambdas that use various option struct members and close over variables in the surrounding scope) were extracted into dedicated, named, and documented single-purpose types whose methods would be used as the callbacks, with a constructor that makes the callback inputs a specified interface.
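
To make the suggestion concrete, here is a small sketch of a closure turned into a named type. Everything in it is hypothetical (`labelSource`, `mapLabels`, `versionRecorder`, and `newVersionRecorder` do not exist in oc); only the label key `io.openshift.release` comes from the diff under review.

```go
package main

import "fmt"

// labelSource is the explicit input of the callback, replacing variables
// that a lambda would capture from its surrounding scope.
type labelSource interface {
	Label(key string) (string, bool)
}

// mapLabels is a trivial labelSource backed by a map.
type mapLabels map[string]string

func (m mapLabels) Label(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// versionRecorder is a single-purpose type: it records the release version
// found in an image's labels.
type versionRecorder struct {
	labelKey string
	version  string
}

// newVersionRecorder makes the recorder's configuration explicit.
func newVersionRecorder(labelKey string) *versionRecorder {
	return &versionRecorder{labelKey: labelKey}
}

// OnImageConfig is the method used as the callback.
func (r *versionRecorder) OnImageConfig(src labelSource) {
	if v, ok := src.Label(r.labelKey); ok {
		r.version = v
	}
}

// Version returns the recorded version, or "" if none was seen.
func (r *versionRecorder) Version() string { return r.version }

func main() {
	rec := newVersionRecorder("io.openshift.release")
	callback := rec.OnImageConfig // a method value can be passed like a lambda
	callback(mapLabels{"io.openshift.release": "4.14.0-rc.0"})
	fmt.Println(rec.Version())
}
```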

@hongkailiu
Member Author

It may be beyond the scope of what this PR attempts to do, but I have a feeling that the code could be made more readable if some of the callbacks (currently lambdas that use various option struct members and close over variables in the surrounding scope) were extracted into dedicated, named, and documented single-purpose types whose methods would be used as the callbacks, with a constructor that makes the callback inputs a specified interface.

I will give this a try.
(Currently I have another series of pulls to merge. They were also opened a long time ago and have made some progress recently. I think I can close them quickly. I will come back to this one right after.)

There are two files, `image-references` and
`release-metadata`, that are handled differently from
manifest files. When those files arrive, their readers
from the upstream are sent to the downstream callback
right away.

Other files contain manifests. They are parsed
and then sent to the downstream. We will embed more
changes into this part, e.g., collecting all manifests
in the image and then using them to calculate the
enabled capabilities, which are sent as an argument to
the downstream callback. Those changes are coming in
other pulls.
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 88d0075 to 3aacfdd Compare June 25, 2025 19:18
Contributor

openshift-ci bot commented Jun 25, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu
Once this PR has been reviewed and has the lgtm label, please assign atiratree for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 2 times, most recently from dae1d70 to d95f37c Compare June 25, 2025 19:25
@openshift-ci-robot

openshift-ci-robot commented Jun 25, 2025

@hongkailiu: This pull request references OTA-1010 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

@hongkailiu
Member Author

/hold

Will rebase after #2048 and #2050 get in

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 25, 2025
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from d95f37c to 1f0a027 Compare June 25, 2025 19:53
Before this pull, the logging covered only the case
where `findClusterIncludeConfigFromInstallConfig` is
called, i.e., the path to an install-config
file is provided.

This pull extends it to the case where the
configuration is taken from the current cluster.

Another change in this pull is that the logging
messages include the target version, which is determined
by inspecting the release image. This is implemented
by adding a new callback, `ImageConfigCallback`.

The `ManifestInclusionConfiguration` is used to
determine whether a manifest is included on a cluster.
Its `Capabilities` field takes the implicitly enabled
capabilities into account.

This change removes the workaround that handles the
net-new capabilities introduced by a cluster upgrade.
E.g., if a cluster is currently on 4.13, the workaround
assumes that the capabilities "Build",
"DeploymentConfig", and "ImageRegistry" are enabled.
This is because the components underlying those
capabilities are installed by default on 4.13 or
earlier and cannot be disabled once installed. Those
capabilities become enabled after an upgrade from
4.13 to 4.14, either explicitly or implicitly,
depending on the current value of
`cv.spec.capabilities.baselineCapabilitySet`.

https://github.com/openshift/oc/blob/e005223acd7c478bac070134c16f5533a258be12/pkg/cli/admin/release/extract_tools.go#L1241-L1252

CVO has already defined the function
GetImplicitlyEnabledCapabilities that calculates
the implicitly enabled capabilities of a cluster
after a cluster upgrade. For this function to work,
we have to provide

* the manifests that are currently included on the
  cluster.
* the manifests from the payload in the upgrade image.

The existing `ManifestReceiver` is enhanced so
that it can provide the enabled capabilities, both
explicit and implicit, when it calls the downstream
callback. It does so by caching the manifests
collected from the upstream and calling the
downstream only after all manifests are collected and
the capabilities are calculated from them using the
function `GetImplicitlyEnabledCapabilities` mentioned
earlier.

This enhancement is opt-in: set the
`needEnabledCapabilities` field of `ManifestReceiver`.
Otherwise, its behaviour stays the same as before.

When the inclusion configuration is taken
from the cluster, i.e., `--install-config` is not set,
`needEnabledCapabilities` is set to `true`.
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 1f0a027 to 76338a7 Compare June 25, 2025 20:07
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
4 participants