Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry should be enabled by default #193

Open
bobh66 opened this issue May 25, 2023 · 2 comments
Open

Retry should be enabled by default #193

bobh66 opened this issue May 25, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@bobh66
Copy link
Contributor

bobh66 commented May 25, 2023

What happened?

One of Crossplane's best features is that it just keeps retrying when things fail, ever optimistic that something will change in the future to allow things to succeed "next time".

provider-helm is an exception to this - by default it does NOT retry, and there is no way to make it retry forever. rollbackLimit defaults to nil, which disables retry. And there is no way to set rollbackLimit to "forever", such as using 0 or -1. The only option is to set it to a "large" number which will eventually be exceeded.

Ideally the Release would behave like all other Managed Resources and continuously retry until it is successful.

It seems like to be consistent with other providers and Crossplane itself, rollbackLimit should default to a value that specifies infinite retries, or nil should indicate infinite retries and 0 should disable retries.

rollbackLimit is also not an obvious attribute name for an option that controls retries. Maybe a better solution would be to add a retryEnabled attribute that turns retries on and off, and leave rollbackLimit to specify the number of retries, where nil is no limit.

I'd be glad to push a PR if any of these suggestions are agreeable.

How can we reproduce it?

Deploy a Release

@bobh66 bobh66 added the bug Something isn't working label May 25, 2023
@turkenh
Copy link
Collaborator

turkenh commented May 26, 2023

FWIW, provider-helm indeed retries for other errors than a failing release. For example, it would retry forever if it hits an error while pulling the chart or communicating with the Kube API.

I would be supportive of implementing a negative value meaning retry forever, but hesitant to change the default behavior for everyone.

Also related: https://fluxcd.io/flux/components/helm/helmreleases/#configuring-failure-remediation

@bobh66
Copy link
Contributor Author

bobh66 commented Oct 4, 2024

I encountered this problem again today, so I'd really like to find a way to make it retry-by-default. How about a CLI parameter to the provider like --always-retry that makes an unset rollbackLimit equivalent to an infinite retry? I agree it's not ideal to change the behavior of an attribute via an optional parameter, but I think the default behavior should match all other providers and what kubernetes does by default - keep trying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants