Disable timeouts on healthcheck calls #426

Merged: peel merged 2 commits into develop from bug/healthchecks-timeout on Sep 4, 2024

@peel (Contributor) commented Aug 13, 2024

Currently, health checks sit behind the same timeout settings as every other endpoint. We observed that when autoscaling under massive load, the collector can be taken down because health check responses take too long, which did not happen previously. We therefore move health checks above the timeout middleware to restore the previous behavior. This also lets us set arbitrarily short (or long) response timeouts for the regular endpoints when necessary.
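
To illustrate the ordering change, here is a minimal sketch in Python rather than the collector's own code. Every name in it (`with_timeout`, `build_app`, `REQUEST_TIMEOUT_S`, the `/health` and `/events` paths) is a hypothetical stand-in, and the timings are arbitrary. It only shows the idea: health routes are matched before the timeout middleware, so a slow health response is still returned, while regular endpoints remain subject to the timeout.

```python
import asyncio

# Hypothetical names throughout; this is not the collector's actual routing code.
REQUEST_TIMEOUT_S = 0.05  # assumed timeout applied to regular endpoints


async def health(_path):
    # Simulate a health check that answers slowly while the service is under load.
    await asyncio.sleep(0.2)
    return 200, "ok"


async def regular(_path):
    # Simulate a regular endpoint that is equally slow.
    await asyncio.sleep(0.2)
    return 200, "accepted"


def with_timeout(handler, seconds):
    """Timeout middleware: answer 503 if the wrapped handler takes too long."""
    async def wrapped(path):
        try:
            return await asyncio.wait_for(handler(path), timeout=seconds)
        except asyncio.TimeoutError:
            return 503, "timeout"
    return wrapped


def build_app(health_paths=("/health",)):
    # Only the regular handler is wrapped in the timeout middleware.
    timed_regular = with_timeout(regular, REQUEST_TIMEOUT_S)

    async def app(path):
        # Health routes are matched first, so they bypass the timeout entirely.
        if path in health_paths:
            return await health(path)
        return await timed_regular(path)

    return app


if __name__ == "__main__":
    app = build_app()
    print(asyncio.run(app("/health")))
    print(asyncio.run(app("/events")))
```

With this layout the regular timeout can be tightened or loosened without affecting whether the health endpoint responds.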

We also update the reference defaults for health checks so that they are less strict and match the settings we typically define upstream.


Part of PDP-1408

@peel peel marked this pull request as draft August 13, 2024 13:55
@peel peel force-pushed the bug/healthchecks-timeout branch 2 times, most recently from 88d119b to 6a68947 Compare August 13, 2024 14:00
@peel peel marked this pull request as ready for review August 13, 2024 14:10
@peel peel force-pushed the bug/healthchecks-timeout branch 3 times, most recently from c404837 to 689a96f Compare August 21, 2024 15:47
@peel peel added this to the 3.3.0 milestone Aug 22, 2024
peel added 2 commits September 4, 2024 18:21
Currently, healthchecks reside behind the same timeout settings as any other
endpoint. We observed that when autoscaling under massive load, the collector
can be taken down because of long health check responses, which previously did
not happen. We therefore move healthchecks above the timeout middleware to
return to the previous behavior.
Additionally, this allows us to set arbitrarily short (or long) response
timeouts for the regular endpoints when necessary.

---

Part of [PDP-1408].
The reference defaults should be less strict and match the settings we define
upstream.
@peel peel force-pushed the bug/healthchecks-timeout branch from 2149ca5 to 088873a Compare September 4, 2024 16:28
@peel peel merged commit 3be22e4 into develop Sep 4, 2024
3 checks passed
@peel peel deleted the bug/healthchecks-timeout branch November 21, 2024 13:49