Provider closed 32 leases after one of its K8s nodes experienced a network error #15
Comments
This is the third time this provider has lost its leases, and each time the cause is a failing K8s node (or a worker node going down). It looks like this happens when the cluster does not have enough room (resources) for the apps to start on the other available nodes, which then triggers the
Provider logs (after stripping off the deployment hostnames)
These logs do have some messages such as
I've asked the provider whether he can find older logs.
Update: unfortunately, the provider didn't save the logs before restarting the akash-provider pod.
This is the 4th time d3akash has lost its leases:
It's OK for the provider to close leases that have not been working for quite some time, as long as the provider isn't charging for them. So I think the Alternative proposal (client-defined) would be ideal if the timeout (the amount of time a lease may stay down because it cannot be redeployed while the worker node is down) could be configured by the clients themselves in their SDL (say …).
I'll close this issue in favor of the Alternative proposal.
One of the d3akash.cloud provider's K8s cluster nodes had a network error, which caused the provider to close 32 leases.
I think the root cause is #14; I'm waiting for the provider logs from the provider owner to confirm.
Provider address: akash1u5cdg7k3gl43mukca4aeultuz8x2j68mgwn28e
Heights 9636347 ... 9636379: MsgCloseBid issued to 32 leases.
Leases before it closed them (height 9636346). This is the height before the provider started closing the 32 leases. This is mainly to verify withdrawn vs consumed, to rule out the case where the provider could have been running without withdrawing from the leases (e.g. due to some bug or misconfiguration) for some time until it was restarted.
Leases after the drop (height 9636379).