
CAPI logs filled with error messages if no machine deployments/pools exist and ControlPlane does not implement v1beta2 conditions #11820

Open · jakefhyde opened this issue Feb 7, 2025 · 5 comments
Labels: area/conditions · help wanted · kind/bug · priority/important-soon · triage/accepted

jakefhyde (Contributor) commented Feb 7, 2025

What steps did you take and what happened?

Clusters that do not use MachineDeployments or MachinePools (for example, clusters whose Machines are created manually) cause the capi-controller-manager to write error logs endlessly whenever the cluster status is updated. The capi-controller-manager shows the following logs:

E0207 14:31:48.116038       1 cluster_controller_status.go:838] "Failed to aggregate ControlPlane, MachinePool, MachineDeployment's RollingOut conditions" err="sourceObjs can't be empty" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="<namespace/cluster>" namespace="<namespace>" name="<name>" reconcileID="71e43209-4e27-41d9-8470-f1c1c33901c1"
E0207 14:31:48.116068       1 cluster_controller_status.go:915] "Failed to aggregate ControlPlane, MachinePool, MachineDeployment, MachineSet's ScalingUp conditions" err="sourceObjs can't be empty" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="<namespace/cluster>" namespace="<namespace>" name="<name>" reconcileID="71e43209-4e27-41d9-8470-f1c1c33901c1"
E0207 14:31:48.116081       1 cluster_controller_status.go:992] "Failed to aggregate ControlPlane, MachinePool, MachineDeployment, MachineSet's ScalingDown conditions" err="sourceObjs can't be empty" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="<namespace/cluster>" namespace="<namespace>" name="<name>" reconcileID="71e43209-4e27-41d9-8470-f1c1c33901c1"

The relevant code can be found here:

We are able to work around this by explicitly setting the conditions to False so that they are present.
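As a sketch of that workaround (types and names here are simplified stand-ins, not the real Cluster API types; in practice this would set metav1.Conditions on the ControlPlane's v1beta2 status), a controller can pre-populate the missing condition types with status False so the aggregation has sources to consume:

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
	Reason string
}

// ensureV1Beta2Conditions appends any missing condition types with status
// False, so that the cluster controller's aggregation no longer fails with
// "sourceObjs can't be empty".
func ensureV1Beta2Conditions(existing []Condition, types []string) []Condition {
	present := map[string]bool{}
	for _, c := range existing {
		present[c.Type] = true
	}
	for _, t := range types {
		if !present[t] {
			existing = append(existing, Condition{
				Type:   t,
				Status: "False",
				Reason: "NotApplicable", // hypothetical reason string
			})
		}
	}
	return existing
}

func main() {
	conds := ensureV1Beta2Conditions(nil, []string{"RollingOut", "ScalingUp", "ScalingDown"})
	for _, c := range conds {
		fmt.Printf("%s=%s\n", c.Type, c.Status)
	}
}
```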

What did you expect to happen?

My expectation is that, during the v1beta1 -> v1beta2 migration, the aggregated condition is set to Unknown when the source conditions are not present, rather than the aggregation failing with an error.
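A minimal sketch of that expectation (simplified; the real aggregation lives in cluster_controller_status.go and operates on typed objects, not strings): when there are no source objects to aggregate, fall back to Unknown instead of returning an error:

```go
package main

import "fmt"

// aggregateStatus mimics the shape of condition aggregation: it merges the
// statuses of the source objects, but instead of failing with
// "sourceObjs can't be empty" it falls back to "Unknown" when there is
// nothing to aggregate (e.g. no MachineDeployments, no MachinePools, and a
// ControlPlane that does not implement v1beta2 conditions).
func aggregateStatus(sourceStatuses []string) string {
	if len(sourceStatuses) == 0 {
		return "Unknown"
	}
	// Worst-status-wins merge: any True (e.g. ScalingUp=True) dominates,
	// otherwise Unknown dominates False.
	agg := "False"
	for _, s := range sourceStatuses {
		switch s {
		case "True":
			return "True"
		case "Unknown":
			agg = "Unknown"
		}
	}
	return agg
}

func main() {
	fmt.Println(aggregateStatus(nil))                       // no sources: Unknown
	fmt.Println(aggregateStatus([]string{"False", "True"})) // True
}
```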

Cluster API version

v1.9.4

Kubernetes version

v1.30.2

Anything else you would like to add?

We implement bring-your-own-host-style provisioning (not related to the BYOH provider) in which users can register nodes freely, leaving lifecycle management to the user. Although this is a less common provisioning model, I imagine it could also affect clusters using control plane providers that have not yet implemented v1beta2 conditions and that provision machines via manual definition.

Label(s) to be applied

/kind bug
/area conditions

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. area/conditions Issues or PRs related to conditions needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 7, 2025
@jakefhyde jakefhyde changed the title Cluster fails to provision if no machine deployments/pools exist and ControlPlane does not implement v1beta2 conditions CAPI logs filled with error messages if no machine deployments/pools exist and ControlPlane does not implement v1beta2 conditions Feb 7, 2025
jakefhyde (Contributor, Author) commented

Apologies for the rename, there was a little confusion on my end. We have a test case where the etcd plane is scaled to 0, a new etcd machine is created, and we perform an etcd restore on top of that. These logs were being printed endlessly, so I had erroneously assumed they were related. Although we skip draining with the machine.cluster.x-k8s.io/exclude-node-draining annotation, we weren't doing the same for volume detachment. I think this was fine previously purely by happy accident, and the addition of alwaysReconcile helped expose this race condition.

That all being said, would it be possible to lower the log level for those messages until v1beta2 goes live? I'm content to just leave the conditions on there for now, otherwise it fills the logs and makes debugging quite difficult.

chrischdi (Member) commented

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. labels Feb 19, 2025
chrischdi (Member) commented

/help

k8s-ci-robot (Contributor) commented

@chrischdi:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Feb 19, 2025
fabriziopandini (Member) commented

Note: I will try to take a look at this in the context of the work I'm doing for #11474, but I'm not sure if or when I will get to it (if someone else wants to take care of this before me, feel free to do so!)
