-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Webhook with a timeoutSeconds greater than 29 seconds will block upgrades? #167
Comments
I just tried changing
This is leading me to believe 30 should really be a valid value. Happy to make a PR if folks agree. |
@maelvls provided some additional details on the issue I logged against cert-manager:
That combined with my experimentation above makes it pretty clear that 30 seconds is actually a valid value, and clusterlint should allow it. |
I vaguely remember there was a "conflict" between 30 seconds used by a webhook and the default 30 seconds timeout of kubectl: if a webhook used a 30 seconds timeout, the default timeout from kubectl could "override" the webhook one, rendering failure policies of @varshavaradarajan does that make sense, or am I completely off here? |
Hi - could you please let us know the reason why the upper limit of this value has been set to 29? Seeing the above mentioned issue on both our Digital Ocean k8s clusters which run the cert manager helm chart. Thanks, Charlie |
It sounds like the reason is almost lost to the dust of time. If I may, I recommend following shellcheck's model: not only point out problems, but write quick blurbs describing WHY it's a problem, and ways to fix it. That will be helpful not only to clusterlint users, but clusterlint maintainers over time. |
There should be documentation on every check, but as usual it can probably be improved. I am reasonably sure that what I wrote at #167 (comment) about the upper limit is true. Should be fairly easy to verify and afterwards improve / adjust the documentation one way or another. |
@kyrofa I contacted Digital Ocean support as I am a customer and this is their response/reasoning: I completely understand your concern here. I will be happy to assist you. We recommend ensuring the timeout is less than 30s. This is because 30s is the default timeout for Kubernetes objects and it is possible to create a scenario that lines up perfectly with other timeouts which would allow the timeout to fail indefinitely. For more details refer below link: https://docs.digitalocean.com/support/clusterlint-error-fixes/#admission-controller-webhook-timeout For example, if a node hosting the webhook goes down, then webhook will not respond to requests. This is problematic when nodes try and register to the master API and the master API cannot reach the webhook. The node registration fails (because the failurePolicy is set to 'Fail') while waiting on the webhook to become available. But the webhook often can't reschedule till a new node has been created. Thus creating a bit of a circular dependency scenario. |
This matches what Digital Ocean support have told me 👍 |
I'm trying to upgrade from dok8s v1.28 to v1.29, but clusterlint is telling me that cert-manager's webhook, which uses a
timeoutSeconds
of 30, will block the upgrade. Would someone mind filling me in on why that's the case suddenly? I've been using cert-manager for years, across several k8s versions, without issue. I see in the upstream docs where it says:But it's unclear whether that's inclusive or not. Clusterlint has seemingly decided that it's 1-inclusive and 30-exclusive. Is that actually based in reality? Is there another document I've missed? 29 seems so random, haha!
For now I will customize my deployment and set this to 29 to make clusterlint happy, but I'd like to get to the bottom of this. I've logged an issue against cert-manager as well. If clusterlint is correct, cert-manager should probably make the default value 29. On the other hand, if 30 is indeed a valid value, clusterlint probably shouldn't complain about it.
The text was updated successfully, but these errors were encountered: