
bug: active healthcheck increase the response latency #11756

Open
jujiale opened this issue Nov 19, 2024 · 4 comments
Labels: question

Comments


jujiale commented Nov 19, 2024

Description

Hello, APISIX team.
In our production environment we ran into a very strange scenario.
We have a cluster of 9 APISIX instances.

The upstream uses service discovery via Eureka.
Our microservice is registered in Eureka and deployed in Kubernetes, and it is exposed to APISIX through Eureka; the microservice has 150 instances.

We deployed it in July this year, but recently we found that the latency has increased to 200ms+ (normal latency is about 100ms, but recently some requests exceed 200ms). When we send a request directly to one of the microservice instances, the latency is about 100ms, but when the request is proxied by APISIX, it increases.

We had enabled the active healthcheck (TCP type); when we disabled the healthcheck, the latency immediately recovered to about 100ms.
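
For reference, this is roughly how an active TCP check is driven through lua-resty-healthcheck (a minimal sketch, not the exact code path inside APISIX; the checker name, shm name and target address are placeholders):

```lua
local healthcheck = require("resty.healthcheck")

local checker, err = healthcheck.new({
    name = "example-checker",          -- placeholder name
    shm_name = "upstream-healthcheck", -- must match a lua_shared_dict defined in nginx.conf
    checks = {
        active = {
            type = "tcp",              -- plain TCP connect probe, no HTTP request
            healthy   = { interval = 1, successes = 2 },
            unhealthy = { interval = 1, tcp_failures = 2 },
        },
    },
})
if not checker then
    ngx.log(ngx.ERR, "failed to create healthchecker: ", err)
    return
end

-- every discovered node has to be registered as a target;
-- the probes then run periodically in ngx.timer context
checker:add_target("10.98.0.1", 8080, nil, true)
```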

We also noticed something odd in the Prometheus metrics: the following metrics are 0:

```
apisix_shared_dict_free_space_bytes{name="worker-events"} 0
apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} 0
```

Both the worker-events and prometheus-metrics shared dicts are 10m.

I would like to know whether these shared dicts being used up could cause the healthcheck to increase the latency.
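
As far as I understand, that metric reflects ngx.shared[name]:free_space(), so the same numbers can be logged directly from a worker; a minimal sketch using the dict names from the metrics above:

```lua
local function dict_usage(name)
    local dict = ngx.shared[name]
    if not dict then
        return nil, "no such lua_shared_dict: " .. name
    end
    -- capacity()/free_space() need lua-nginx-module >= 0.10.11
    return dict:capacity(), dict:free_space()
end

for _, name in ipairs({ "worker-events", "prometheus-metrics" }) do
    local capacity, free_space_or_err = dict_usage(name)
    if capacity then
        -- free_space() == 0 means no completely free pages are left;
        -- new writes then depend on eviction and may start to fail
        ngx.log(ngx.WARN, name, ": capacity=", capacity,
                " free_space=", free_space_or_err)
    else
        ngx.log(ngx.WARN, free_space_or_err)
    end
end
```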

I tried to re-enable it in our dev environment, but it failed; the error log indicates that prometheus has no memory.

I found another issue that mentions this, but it seems to have no activity: #11345

Looking forward to your answer, thanks.

Environment

  • APISIX version (run apisix version):2.15.3
  • Operating system (run uname -a):
  • OpenResty / Nginx version (run openresty -V or nginx -V):
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):3.5.0
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
dosubot bot added the question label on Nov 19, 2024

jujiale commented Nov 19, 2024

Because we have added other features to our APISIX, we cannot upgrade it to the 3.x version.
Our healthcheck library version is v3.2.0 (https://github.com/api7/lua-resty-healthcheck),
and we have merged some healthcheck-related PRs that APISIX fixed in versions after 2.15.3.


jujiale commented Nov 20, 2024

In error.log we found that an earlier pod IP still exists in the healthcheck:
2024/11/20 09:19:36 [warn] 16195#16195: *15291831195 [lua] healthcheck.lua:1383: log(): [healthcheck] (upstream#/xxx/upstreams/515481794896725698) healthy SUCCESS increment (10/2) for '10.98.xxx.155(10.98.xxx.155:8080)', context: ngx.timer, client: 172.xxx.29.xxx, server: 0.0.0.0:80
I confirmed that 10.98.xxx.155 is no longer our microservice pod IP; it used to be, but now it belongs to another service. So if the upstream config in etcd does not change but the upstream nodes change (because Eureka discovery is used), it seems the healthcheck will not remove the earlier pod IP.
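
As I understand the library, the checker only probes targets it was explicitly given, so whoever feeds it the discovered nodes has to remove the ones that disappear; a rough sketch of that bookkeeping (sync_targets and the node tables are just illustrative, not APISIX code):

```lua
local function sync_targets(checker, old_nodes, new_nodes)
    local old, new = {}, {}
    for _, node in ipairs(old_nodes) do
        old[node.host .. ":" .. node.port] = node
    end
    for _, node in ipairs(new_nodes) do
        new[node.host .. ":" .. node.port] = node
    end

    -- register nodes that just appeared in the discovery result
    for key, node in pairs(new) do
        if not old[key] then
            checker:add_target(node.host, node.port, nil, true)
        end
    end

    -- and, crucially, drop the ones that disappeared; without this the
    -- checker keeps probing the stale pod IPs forever
    for key, node in pairs(old) do
        if not new[key] then
            checker:remove_target(node.host, node.port)
        end
    end
end
```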


jujiale commented Nov 20, 2024

We also found that even after we disable the healthcheck, capturing packets with tcpdump shows that the APISIX instance is still performing active healthchecks.
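
If I read lua-resty-healthcheck correctly, the probe timers keep running until the checker object is stopped, so an old checker that is never stopped when the config changes could explain this; purely illustrative:

```lua
-- sketched cleanup when a check is disabled or the upstream changes;
-- `old_checker` stands for the checker instance created earlier
if old_checker then
    old_checker:stop()   -- stops the active check timers
    old_checker:clear()  -- drops all registered targets
end
```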

jujiale changed the title from "help request: active healthcheck increase the response latency" to "bug: active healthcheck increase the response latency" on Nov 20, 2024

jujiale commented Nov 20, 2024

I am inclined to think this is a bug.
