Skip to content

OCPEDGE-1740: feat: add arbiter role support to ABI #9780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

eggfoobar
Copy link
Contributor

Assisted service now supports the arbiter node, most of the folks looking to use the arbiter node have a low tolerance for extra compute nodes, as such a bootstrap node is not ideal. This PR adds support for the arbiter node to the ABI flow using the latest assisted service additions.

Added support for arbiter installs to ABI flow, we currently do not support installing TechPreview featureSet with agent based install so a temporary override was added for arbiter due to customers who have limitations on running a bootstrap node and will need to run TechPreview installs.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 11, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jun 11, 2025

@eggfoobar: This pull request references OCPEDGE-1740 which is a valid jira issue.

In response to this:

Assisted service now supports the arbiter node, most of the folks looking to use the arbiter node have a low tolerance for extra compute nodes, as such a bootstrap node is not ideal. This PR adds support for the arbiter node to the ABI flow using the latest assisted service additions.

Added support for arbiter installs to ABI flow, we currently do not support installing TechPreview featureSet with agent based install so a temporary override was added for arbiter due to customers who have limitations on running a bootstrap node and will need to run TechPreview installs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@@ -88,14 +88,15 @@ func (a *OptionalInstallConfig) validateInstallConfig(ctx context.Context, insta
allErrs = append(allErrs, err...)
}

if installConfig.FeatureSet != configv1.Default {
// TODO: Remove arbiter check once arbiter hits GA.
if installConfig.FeatureSet != configv1.Default && !installConfig.IsArbiterEnabled() {
Copy link
Contributor Author

@eggfoobar eggfoobar Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @zaneb and @rwsu, I know we had briefly mentioned resolving this check (https://issues.redhat.com/browse/AGENT-554), but I couldn't find where the agent flow would be using this to confidently remove this check, currently the assisted service does not utilize it for arbiter and is adding the CustomNoUpgrade featureSet on their end https://github.com/giladravid16/assisted-service/blob/master/internal/installcfg/builder/builder.go#L153

Any concerns with this quick approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems OK (though it would be better to have AGENT-554 implemented). Do we already have a validation that if IsArbiterEnabled() is true then the FeatureSet must be CustomNoUpgrade? (Presumably yes; that's why assisted-service is setting it. So I guess the real question is if that validation runs in the ABI path.)

FWIW the thing that needs to happen for AGENT-554 is to add the FeatureSet to the install-config overrides here, which wouldn't be difficult.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, if it's just that, then I went ahead and made the update, it should match correctly with the de-serielizaed step on the assisted-service install cfg struct

@eggfoobar
Copy link
Contributor Author

/retest-required

@eggfoobar
Copy link
Contributor Author

/retest

@@ -40,6 +40,22 @@ num_known_hosts() {
done
echo "Hosts known and ready for cluster installation (${known_hosts}/${total_required_nodes})" 1>&2
fi
if (( known_hosts > total_required_nodes )) && [[ "${TNA_CLUSTERS_SUPPORT}" == "true" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest adding a REQUIRED_ARBITER_NODES variable to https://github.com/openshift/installer/blob/main/data/data/agent/files/usr/local/share/start-cluster/start-cluster.env.template. That could simplify or eliminate this code block. The required master and workers are calculated here and passed to the template:

numMasters = agentManifests.AgentClusterInstall.Spec.ProvisionRequirements.ControlPlaneAgents

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sure thing, way better approach, I got a PR open here openshift/assisted-service#7765

Copy link
Contributor Author

@eggfoobar eggfoobar Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the change for now, until that PR is merged in, then I'll update the go.mod file

@bfournie
Copy link
Contributor

Per Richard's comment, would also need to add a validation to pkg/asset/agent/manifests/agentclusterinstall.go when the number of arbiters is added to AgentClusterInstall.

@eggfoobar eggfoobar force-pushed the support-arbiter-agent-based branch from 80974b4 to f997529 Compare June 16, 2025 13:54
@eggfoobar
Copy link
Contributor Author

/hold

Holding because I hard coded modified the vendor agent cluster install config to test out install until openshift/assisted-service#7765 gets merged in.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 16, 2025
@eggfoobar eggfoobar force-pushed the support-arbiter-agent-based branch from f997529 to 36f8417 Compare June 16, 2025 19:01
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 17, 2025
Copy link
Contributor

openshift-ci bot commented Jun 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rwsu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

added support for arbiter installs to ABI flow, we currently do not
support installing TechPreview featureSet with agent based install, this
includes adding that capability for overriding featureSet to be passed
to the assisted service.

Signed-off-by: ehila <[email protected]>
added arbiter count passing through annotation for testing, will remove before merge

Signed-off-by: ehila <[email protected]>
@eggfoobar eggfoobar force-pushed the support-arbiter-agent-based branch from 61d16fd to f58624d Compare June 17, 2025 22:57
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 17, 2025
Copy link
Contributor

openshift-ci bot commented Jun 18, 2025

@eggfoobar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn 36f8417 link true /test e2e-aws-ovn
ci/prow/e2e-vsphere-ovn-multi-network d7c6898 link false /test e2e-vsphere-ovn-multi-network
ci/prow/integration-tests-nodejoiner d7c6898 link true /test integration-tests-nodejoiner
ci/prow/okd-scos-e2e-aws-ovn d7c6898 link false /test okd-scos-e2e-aws-ovn
ci/prow/golint d7c6898 link true /test golint
ci/prow/unit d7c6898 link true /test unit
ci/prow/aro-unit d7c6898 link true /test aro-unit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants