Random / Sporadic 502 gateway timeouts #4433
Comments
This means your app closed the connection.
These two could be related to networking issues (No route to host) or to your pod dying: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#proxy-next-upstream
Also, keep in mind that POST requests are not retried unless you enable that: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#retry-non-idempotent
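For reference, a minimal sketch of the controller ConfigMap with both settings enabled (the ConfigMap name, namespace, and the list of retry conditions are assumptions; use whatever your ingress-nginx deployment actually references and the conditions that fit your case):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # assumed name/namespace; match the ConfigMap your controller is started with
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # conditions under which nginx tries the next upstream instead of returning 502 immediately
  proxy-next-upstream: "error timeout http_502"
  # also retry non-idempotent requests such as POST (disabled by default)
  retry-non-idempotent: "true"
```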
Closing. Please reopen if the behavior persists after changing the settings.
We're having identical issues (EKS 1.13). Any reference is welcome.
@Timvissers - After seeing some other issues recently, I stumbled upon this issue in k8s: kubernetes/kubernetes#74839. I think this is actually the root cause of the issue I was seeing here. Your best bet is to upgrade to 1.15 if possible, as some of the popular fixes will actually cause all communication to some nodes to start failing after a while. Hope this helps.
Thx @DP19
@Timvissers - Sorry, they're linked in some other related issues. Here's the Docker libnetwork repo where they're discussing practically the same issue, and there he has two workarounds. In the blog post they suggest adding a DaemonSet that runs a startup script to apply this conntrack setting: "echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal". Others in the PR for this issue in k8s are suggesting that this will actually "cause the conntrack table to be full and all connectivity to the servers are lost", so we haven't implemented any fix for our environments yet.
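For anyone who wants to experiment with the first workaround despite that warning, here is a rough sketch of such a DaemonSet (the name, image, and privileged/hostNetwork settings are assumptions; the echo line is the one quoted above):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: conntrack-liberal   # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: conntrack-liberal
  template:
    metadata:
      labels:
        app: conntrack-liberal
    spec:
      hostNetwork: true        # so the sysctl applies to the host's network namespace
      containers:
      - name: set-sysctl
        image: busybox:1.31    # assumed image
        securityContext:
          privileged: true     # required to write the host's /proc
        command:
        - sh
        - -c
        # apply the workaround quoted above, then keep the pod running
        - "echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal && while true; do sleep 3600; done"
```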
Just to say that my issue was AWS-specific: aws/amazon-vpc-cni-k8s#641
@DP19 We are experiencing the same situation on EKS, but using 1.14. Also, why did it work for you weeks ago and then stop working; was it an update? Did you find any solution? I'll try another ingress, but it seems to be network related.
@miclefebvre - It's actually two issues for us. We were affected by the CNI issue, which was resolved by rolling back the CNI driver. But we have a long-standing issue while running 1.14 that will only be resolved once EKS releases 1.15. I opened this issue when the CNI plugin issue came up, but it never really worked without any issue due to the kube-proxy bug in 1.14 described in the blog post in my previous comment.
@DP19 Thanks, I think my problem is a little bit different because what I receive is *14892883 connect() failed (111: Connection refused) while connecting to upstream Strange |
@miclefebvre please check that your applications have liveness and readiness probes.
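For reference, a minimal sketch of what those probes look like on a Deployment (the name, image, port, health-check path, and timings are all assumptions to adjust for your app):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app               # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest          # hypothetical image
        ports:
        - containerPort: 8080
        readinessProbe:               # gates whether the pod receives traffic
          httpGet:
            path: /healthz            # assumed health endpoint
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:                # restarts the container if it stops responding
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
```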
@miclefebvre if you can't do that, you could use the annotation
@aledbf It has probes and I did add the annotation, but it's still failing.
Note that the app loads; only some resources return 502, and it's random rather than every time.
@miclefebvre - I would bypass the ingress using port-forward and see if it still happens. This is also a great place to start if you haven't tried everything there, and it will help get to a root cause: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/
@miclefebvre I suggest you check the ingress controller pod logs to see whether requests are being retried against other pods. Also, maybe your app is restarting?
I have repro steps, but I don't know what I can do with them. Maybe they can help someone. On an EKS cluster with 2 nodes:
Note that when I only delete the pods, the problem doesn't occur.
Wanted to leave this here in case it helps anyone else. I was getting intermittent 502s (the browser confusingly reported a CORS error), and eventually realized that my Kubernetes manifests were using a shared selector label. Be sure these are unique for each service!
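To illustrate: if two Services share the same selector label, each Service ends up load-balancing across the other workload's pods as well, which can surface as intermittent 502s. A minimal sketch with unique labels (names and ports are hypothetical):

```yaml
# Each Service selects only its own Deployment's pods via a unique "app" label.
apiVersion: v1
kind: Service
metadata:
  name: api            # hypothetical
spec:
  selector:
    app: api           # unique per service; must match only this workload's pods
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: worker         # hypothetical
spec:
  selector:
    app: worker        # distinct from "app: api", so no pod overlap
  ports:
  - port: 80
    targetPort: 8080
```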
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug Report
NGINX Ingress controller version:
0.25.0
Kubernetes version (use kubectl version): v1.12.10
Environment:
AWS / EKS
uname -a: 4.14.106-97.85.amzn2.x86_64
What happened:
We're seeing random, sporadic 502s being returned and are unable to reproduce them reliably.
What you expected to happen:
The ingress should respond with a 200.
How to reproduce it (as minimally and precisely as possible):
Unsure, as it happens very sporadically.
Anything else we need to know:
Messages from the ingress controllers:
"*2169 upstream prematurely closed connection while reading response header from upstream"
"*1360038 connect() failed (113: No route to host) while connecting to upstream"
"*1655177 upstream timed out (110: Connection timed out) while connecting to upstream"
This was working a week ago; now we're receiving these 502s from multiple deployments (some of which have not changed in over a month). We've checked the load on the upstream pods and they are handling traffic well, and we can port-forward to them directly without any 502s or connection issues.