Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 Add basic validation for time duration #11257

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Dhairya-Arora01
Copy link
Contributor

Basic validation for time duration can be done using kubebuilder markers to match the pattern "^(([1-9][0-9]*(\.[0-9]+)?|0?\.[0-9]+)(ns|us|µs|ms|s|m|h))+$",

What this PR does / why we need it: Basic validation for time duration can be done using kubebuilder markers to match the pattern "^(([1-9][0-9]*(\.[0-9]+)?|0?\.[0-9]+)(ns|us|µs|ms|s|m|h))+$"

Examples of valid matches:

  • 1s
  • 23ms
  • 0.123us
  • 456.789µs
  • 1m30s
  • 2h15m

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #7104

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 5, 2024
@k8s-ci-robot
Copy link
Contributor

This PR is currently missing an area label, which is used to identify the modified component when generating release notes.

Area labels can be added by org members by writing /area ${COMPONENT} in a comment

Please see the labels list for possible areas.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-area PR is missing an area label needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 5, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @Dhairya-Arora01. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign killianmuldoon for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a simple test for common values?

Basic validation for time duration can be done using kubebuilder markers
to match the pattern "^(([1-9][0-9]*(\\.[0-9]+)?|0?\\.[0-9]+)(ns|us|µs|ms|s|m|h))+$",

Signed-off-by: Dhairya Arora <[email protected]>
@Dhairya-Arora01 Dhairya-Arora01 force-pushed the dhairya/durationValidation branch from 72a0a7c to 42e0fe9 Compare October 6, 2024 09:50
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 6, 2024
@Dhairya-Arora01
Copy link
Contributor Author

Dhairya-Arora01 commented Oct 6, 2024

Could we add a simple test for common values?

Also now I think, this validation should be everywhere we use time.Duration type. wdyt?
@vincepri how can i test this? like we can't just test it by populating the fields with these strings for the Cluster object.

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these validations pre-existing in a webhook? What would happen if a user already has an invalid value?

Comment on lines +156 to +157
// +kubebuilder:validation:Pattern="^(([1-9][0-9]*(\\.[0-9]+)?|0?\\.[0-9]+)(ns|us|µs|ms|s|m|h))+$"
// +kubebuilder:validation:Type:=string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you were to use CEL here, you could create a custom error message which may be more helpful to end users, rather than just printing the regex as will happen when this fails, WDYT?

CEL pattern validation has been around since K8s 1.25 (beta originally) so I think all supported K8s versions CAPI targets should handle this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.28 is the min supported mgmt cluster version in CAPI 1.9.

Just have to make sure that whatever we do in CEL works with 1.28 (the Kubernetes CEL library is still adding new stuff with recent Kubernetes versions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this validation rule:

// +kubebuilder:validation:XValidation:rule="self.matches('^(([1-9][0-9]*(\\.[0-9]+)?|0?\\.[0-9]+)(ns|us|µs|ms|s|m|h))+$')",message="Invalid duration format. Must be a positive number followed by a unit (ns, us, µs, ms, s, m, h). Examples: 300ms, 2h45m, 1.5h"

this supports the error message too, but the pattern matching marker does not!

@fabriziopandini
Copy link
Member

fabriziopandini commented Oct 18, 2024

Should we have a discussion before going down the path of introducing a mix of open API and go validations?
Asking because as far as I remember we rely on go validations in a few places in the codebase, and changes like this could make those code path more fragile (disclaimer: I need some more time to reasearch usage of go validations, but I'm not sure when I could get to it)

/hold

@fabriziopandini fabriziopandini added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2024
@JoelSpeed
Copy link
Contributor

@fabriziopandini we should definitely have this discussion and try and come up with a plan forward, my 0.02 below.

In general, I think in the past, OpenAPI validation hasn't been up to scratch for the majority of our needs in terms of API validation. Modern CRD validation however, is much more powerful, and I think as a community we should start transitioning towards it more and more.

From an end user perspective, what we need to achieve is validation that presents at admission time. That could be implemented either as a webhook, or through openapi. In terms of validating the end user experience, a test suite that executes envtest and applies resources and inspects output errors should allow us to test both webhook validations and openapi at the same time.

Once we have this test suite in place, we could start moving validations from the webhook to the openapi validation to move more and more validation in-tree and reduce the usage of the webhook. This means less for us to maintain, and, when we can eventually use match conditions, we could even get to a point where the webhook is only called explicitly when it is needed. Which would further reduce usage of the webhook, and improve the runtime performance of our APIs, and hopefully improve the ability to scale management clusters.

OpenAPI validation inside the CRD schema can only look at the object we have in question, so anything that relies on validating based on fields from other CRs, requires something more. This can either be a webhook, or validating admission policy which can use params as input from other resources. This would be another way that we could reduce webhook validations by providing a set of VAPs that are installed alongside CRDs, and again, this has the benefit of being in-tree validation.

The other benefit of openapi based validation that I can think of is that it means that the CRD validations are at the point where the fields are defined. It's very easy to review the field context as a whole, you can see the godoc, and all of the validations applied to it in one small piece of code. This should help with comprehension of how the field behaves, and, IMO, helps us to improve the documentation. If I see a CEL rule that's not explained in the godoc, its easy to point out that this should be explained in the docs. If it's in a webhook, I'm less likely to pick up that it's not documented.

There are of course trade offs to consider, primarily for us, I think, is the minimum supported version of Kube. This validation has been and still is evolving. Any API testing we do, must be against our lowest supported version, and therefore, some of the newer features of CRD validation we may not be able to use just yet, meaning this journey, if we choose to embark upon it, might take some time.

@fabriziopandini
Copy link
Member

@JoelSpeed I agree with most of your points. I just want to make sure that we making this to happen in an organic way, taking care of testing, of not breaking where we are using defaulting/validation logic outside of the webhooks, etc.
Possibly, we need also a plan defining how much of this work should go in the next API version

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 12, 2025
@sbueringer
Copy link
Member

sbueringer commented Feb 14, 2025

Alrighty. Sorry for the delay, but I think we settled this debate now in: #6398 (comment)
/hold cancel
/ok-to-test

Min supported mgmt cluster version in CAPI v1.10 will be Kubernetes v1.29

This means we can rely on CEL (stable since v1.29), we can't rely on CRDValidationRatcheting (not stable as of now).

Are these validations pre-existing in a webhook? What would happen if a user already has an invalid value?

If we don't want to deny on pre-existing invalid values, this means we have to use CEL with a transition rule (#11257). In addition to that CEL allows us better error messages anyway.

The main effort will be writing the transition rule correctly and adding envtest based unit test coverage

@JoelSpeed Did I miss anything? :)

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 14, 2025
@k8s-ci-robot
Copy link
Contributor

@Dhairya-Arora01: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-e2e-blocking-main 42e0fe9 link true /test pull-cluster-api-e2e-blocking-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed
Copy link
Contributor

I think you captured it.

In the future, we probably want to move to timeoutSeconds and avoid the use of durations altogether (see notes in API conventions about problems with duration validations)

@sbueringer
Copy link
Member

In the future, we probably want to move to timeoutSeconds and avoid the use of durations altogether (see notes in API conventions about problems with duration validations)

Is that this point?

Duration fields must be represented as integer fields with units being part of the field name (e.g. leaseDurationSeconds). We don't use Duration in the API since that would require clients to implement go-compatible parsing.

Is the idea behind that, that other clients that read the field values - apart from our controllers - would have to implement go-compatible parsing of duration values?

@JoelSpeed
Copy link
Contributor

Is the idea behind that, that other clients that read the field values - apart from our controllers - would have to implement go-compatible parsing of duration values?

Yep, exactly. So in theory, anyone who works with this API and uses Go, is fine, as they'd be able to use the Go duration parsing to work with the field value. But if they happen to use another language, eg Python, then they would have to work out how to parse the duration string in a compatible way, which is non-trivial. So by removing the need to understand the units within the string (h|m|s|ns|...), you can make it easier for non-Go clients

In this particular use case, I would expect all of these timeouts to be in the order of seconds, so it shouldn't be too hard of a lift to be able to make that change

@sbueringer
Copy link
Member

sbueringer commented Feb 17, 2025

Yup agree. I think we also ~ only have 4 fields with Duration (duplicated in a few places):

  • MHC: NodeStartupTimeout / UnhealthyCondition.Timeout
  • NodeDrainTimeout / NodeVolumeDetachTimeout / NodeDeletionTimeout

Fine from my side to change them to *Seconds integer in v1beta2. I think then I wouldn't invest additional effort now to implement new validation for Duration.

@fabriziopandini @chrischdi Fine from your side as well?

@JoelSpeed Would be probably also something for KAL?

@JoelSpeed
Copy link
Contributor

Already tracking in JoelSpeed/kal#24, but not yet implemented. Should be pretty easy given the pattern established in other NoFoo style linters

@chrischdi
Copy link
Member

Yup agree. I think we also ~ only have 4 fields with Duration (duplicated in a few places):

  • MHC: NodeStartupTimeout / UnhealthyCondition.Timeout
  • NodeDrainTimeout / NodeVolumeDetachTimeout / NodeDeletionTimeout

Fine from my side to change them to *Seconds integer in v1beta2. I think then I wouldn't invest additional effort now to implement new validation for Duration.

@fabriziopandini @chrischdi Fine from your side as well?

@JoelSpeed Would be probably also something for KAL?

Fine from my side too :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Webhook validation for Topology NodeDeletionTimeout and NodeDrainTimeout
8 participants