Tracking issue for Improving status in CAPI resources - Phase 2 #11474

Open
3 of 40 tasks
fabriziopandini opened this issue Nov 25, 2024 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@fabriziopandini
Member

fabriziopandini commented Nov 25, 2024

This is a tracking issue for activities related to #10897.
Phase 1 activities are tracked in #11105.

Follow up from phase 1 (1.10)

Looking for volunteers (as with everything else related to MachinePools)

Backlog (if and when we have bandwidth)

  • Drop fake objects from controller tests and use the test/util/builders package

Dropped

  • Check if NodeNetworkUnavailable must be added to MachineNodeHealthyCondition
    • Looks like NodeNetworkUnavailable is a leftover from the in-tree GCP provider
  • Allow customisation of remote-conditions-grace-period at cluster level
  • InMemoryCluster and InMemoryMachine controllers
    • It is being merged into CAPD(ev)

Implementation Phase2 (1.11)

Initial notes for phase 2 of the implementation, where new fields are promoted to the top level of status and used by controllers

  • v1beta2 API (not strictly related to this issue)
    • Prepare main branch
    • Create v1beta2 types, make controllers/webhooks use them
      • Update example templates
    • Change clusterctl & E2E tests & test extensions to use v1beta2
    • Consider whether to step up test coverage where the API version is relevant (e.g. CC, topology controller, Runtime Extension)
    • Update doc about current version
  • API changes for status
    • Introduce deprecated struct, move v1beta2 fields to top level
    • Rename condition interfaces and method implementations (current --> deprecated, v1beta2 --> current)
    • Rename condition consts (current --> deprecated, v1beta2 --> current)
    • Move the conditions field to be the first field in the struct
    • Add minReadySeconds to machine, remove it from KCP, MD
  • Fine tuning of print columns
  • Change contracts
    • When reading from external objects, read new fields, fallback to old ones
    • Update contract docs
  • Use new fields
    • (every controller that uses conditions or counters)
    • MachineHealthCheck controller
      • Change test for ControlPlaneInitializedCondition in reconcile
      • Get MachineHealthCheckSucceededCondition in patchUnhealthyTargets
      • Drop failureReason/failureMessage test from needsRemediation
      • Change test for ControlPlaneInitializedCondition, InfrastructureReadyCondition in needsRemediation (few places)
    • KCP controller
      • Change test for MachineEtcdMemberHealthyCondition in canSafelyRemoveEtcdMember + add a test for ClusterHealthy if not already there
      • We should use the Initialized condition instead of the Available condition to calculate remoteConditionsGracePeriod
  • Condition utils
    • Rename packages for condition utils (conditions --> deprecated conditions, v1beta2conditions --> conditions)
  • Clusterctl
    • clusterctl describe (drop v1beta2 flag, introduce deprecated flag, switch default behaviour in the implementation)
  • Change metrics & dashboards to use conditions
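
The "read new fields, fallback to old ones" item under "Change contracts" could look roughly like the sketch below. It is only an illustration on plain maps, not the actual CAPI implementation: the helper names `nestedBool` and `readWithFallback` are hypothetical, and the two field paths used in `main` are assumptions chosen for the example.

```go
package main

import "fmt"

// nestedBool walks obj along path and returns the bool stored there.
// The second return value reports whether a bool was found at that path.
func nestedBool(obj map[string]interface{}, path ...string) (bool, bool) {
	var cur interface{} = obj
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return false, false
		}
		cur, ok = m[key]
		if !ok {
			return false, false
		}
	}
	b, ok := cur.(bool)
	return b, ok
}

// readWithFallback (hypothetical helper) reads the new field first and
// falls back to the old field only when the new one is not set.
func readWithFallback(obj map[string]interface{}, newPath, oldPath []string) (bool, bool) {
	if v, found := nestedBool(obj, newPath...); found {
		return v, true
	}
	return nestedBool(obj, oldPath...)
}

func main() {
	// Assumed field paths, for illustration only.
	newPath := []string{"status", "initialization", "provisioned"}
	oldPath := []string{"status", "ready"}

	// An external object that still only reports the old field.
	legacy := map[string]interface{}{
		"status": map[string]interface{}{"ready": true},
	}

	v, found := readWithFallback(legacy, newPath, oldPath)
	fmt.Println(v, found)
}
```

Checking the new path first means that a provider reporting both fields is always read through the new contract, while providers that have not migrated yet keep working.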

Implementation Phase3

TBD

Backlog / unprioritized

  • Scale testing with CAPD (check for condition flakiness)
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 25, 2024
@fabriziopandini fabriziopandini added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Nov 25, 2024
@k8s-ci-robot k8s-ci-robot removed needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 25, 2024
@sbueringer sbueringer added this to the v1.10 milestone Dec 27, 2024
@sbueringer
Member

Assigned to v1.10 for now to keep an eye on it. I think it's not clear yet in which release this will land.

@mbrow137 mbrow137 moved this to 🏗 In Progress in CAPI 1.10 Release Feb 5, 2025
@fabriziopandini
Member Author

WRT making new API fields kubectl explain compliant (ref #10897 (comment))

New fields introduced by the improve status proposal don't have fancy formatting anymore; nevertheless, it seems that this issue has been addressed in the latest version of controller-tools:

field with list using *

  template      <string>
    template defines the template to use for generating the name of the
    ControlPlane object.
    If not defined, it will fallback to `{{ .cluster.name }}-{{ .random }}`.
    If the templated string exceeds 63 characters, it will be trimmed to 58
    characters and will
    get concatenated with a random suffix of length 5.
    The templating mechanism provides the following arguments:
    * `.cluster.name`: The name of the cluster object.
    * `.random`: A random alphanumeric string, without vowels, of length 5.
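
To make the naming rule described in this field doc concrete, here is a minimal sketch of the behaviour using Go's `text/template`. It is not the actual CAPI code: `generateName`, `randomSuffix` and the `charset` constant are hypothetical names, and the exact character set behind "alphanumeric, without vowels" is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
	"text/template"
)

const (
	maxNameLength = 63 // limit mentioned in the field description
	suffixLength  = 5  // length of the random suffix
	// Assumed character set: lowercase consonants and digits.
	charset = "bcdfghjklmnpqrstvwxyz0123456789"
)

// randomSuffix returns n random characters drawn from charset.
func randomSuffix(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = charset[rand.Intn(len(charset))]
	}
	return string(b)
}

// generateName renders tpl with the `.cluster.name` and `.random` arguments.
// If the result exceeds 63 characters it is trimmed to 58 characters and a
// fresh random suffix of length 5 is appended.
func generateName(tpl, clusterName string) (string, error) {
	t, err := template.New("name").Parse(tpl)
	if err != nil {
		return "", err
	}
	data := map[string]interface{}{
		"cluster": map[string]interface{}{"name": clusterName},
		"random":  randomSuffix(suffixLength),
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	name := buf.String()
	if len(name) > maxNameLength {
		name = name[:maxNameLength-suffixLength] + randomSuffix(suffixLength)
	}
	return name, nil
}

func main() {
	name, err := generateName("{{ .cluster.name }}-{{ .random }}", "my-cluster")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, len(name))
}
```

With the default template and a short cluster name, the result is the cluster name, a dash, and five random characters; only overly long results go through the trim-and-suffix path.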

field with list using -

  nodeStartupTimeout    <string>
    nodeStartupTimeout allows to set the maximum time for MachineHealthCheck
    to consider a Machine unhealthy if a corresponding Node isn't associated
    through a `Spec.ProviderID` field.

    The duration set in this field is compared to the greatest of:
    - Cluster's infrastructure ready condition timestamp (if and when available)
    - Control Plane's initialized condition timestamp (if and when available)
    - Machine's infrastructure ready condition timestamp (if and when available)
    - Machine's metadata creation timestamp

    Defaults to 10 minutes.
    If you wish to disable this feature, set the value explicitly to 0.
