KrakenD Version: 2.4.3
Go Version: 1.20.6
Glibc Version: MUSL-1.2.4_(alpine-3.18.2)
Is your feature request related to a problem? Please describe.
Restarting krakend always causes a short downtime on that machine: the old process shuts down and closes its HTTP listen socket, and only then does a new process start, initialize, and begin listening. An API gateway is usually expected to be highly available.
Describe the solution you'd like
Implement graceful restart via cloudflare/tableflip. The restart works like so:
1. Send a signal to the current (old) krakend process, or notify it to restart in some other way.
2. The old krakend process spawns a new krakend process and passes its HTTP listen socket to it as a file descriptor. The old process keeps running and serving HTTP requests.
3. The new process starts up and initializes. Finally, it uses the listen socket it received as a file descriptor to start serving HTTP requests.
4. For a very short period of time, both processes serve requests.
5. The new process signals the old process that it has finished initialization and is ready to serve requests.
6. The old process shuts down.
If the new process fails during initialization, for example by panicking on an invalid config file or by exceeding a configurable startup timeout, the old process does not shut down and keeps serving requests. This ensures that a usable krakend process is running at all times.
This graceful restart strategy is in fact inspired by nginx reloads, see Cloudflare's blog post.
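To make the steps above concrete, here is a rough sketch of the socket hand-off using only the Go standard library. It is not KrakenD code and it is not how tableflip is implemented internally; it only illustrates the idea. The `GRACEFUL_INHERITED` environment variable, the helper names, the use of SIGHUP as the restart trigger, and the use of SIGTERM to the parent as the "ready" notification are all assumptions made for this example. It is Unix-only and leaves out the startup timeout and PID file handling.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// inheritEnv is a hypothetical marker telling a child that fd 3 is a listen socket.
const inheritEnv = "GRACEFUL_INHERITED"

// listenerFile returns a duplicate of the listener's descriptor to pass to a child.
func listenerFile(ln net.Listener) (*os.File, error) {
	tcp, ok := ln.(*net.TCPListener)
	if !ok {
		return nil, fmt.Errorf("unsupported listener type %T", ln)
	}
	return tcp.File()
}

// inheritListener rebuilds a listener from an inherited descriptor.
func inheritListener(f *os.File) (net.Listener, error) {
	defer f.Close() // net.FileListener works on its own duplicate
	return net.FileListener(f)
}

// openListener reuses the inherited socket if there is one, else binds a new one.
func openListener(addr string) (net.Listener, error) {
	if os.Getenv(inheritEnv) == "1" {
		return inheritListener(os.NewFile(3, "inherited-listener"))
	}
	return net.Listen("tcp", addr)
}

// spawnSuccessor starts a new copy of this binary that inherits ln as fd 3.
func spawnSuccessor(ln net.Listener) error {
	f, err := listenerFile(ln)
	if err != nil {
		return err
	}
	defer f.Close()
	cmd := exec.Command(os.Args[0], os.Args[1:]...)
	cmd.Env = append(os.Environ(), inheritEnv+"=1")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	cmd.ExtraFiles = []*os.File{f} // ExtraFiles[0] becomes fd 3 in the child
	return cmd.Start()
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGHUP, syscall.SIGTERM)

	ln, err := openListener("127.0.0.1:8080")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // a failed successor exits; the old process keeps serving
	}
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintf(w, "served by pid %d\n", os.Getpid())
	})}
	go srv.Serve(ln)

	// Initialization is done: a successor tells its parent to drain and exit.
	if os.Getenv(inheritEnv) == "1" {
		syscall.Kill(os.Getppid(), syscall.SIGTERM)
	}

	for sig := range sigs {
		if sig == syscall.SIGHUP {
			if err := spawnSuccessor(ln); err != nil {
				fmt.Fprintln(os.Stderr, "restart failed, still serving:", err)
			}
			continue
		}
		break // SIGTERM: stop accepting and finish in-flight requests
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	srv.Shutdown(ctx)
}
```

With this running, `kill -HUP <pid>` would start a successor on the same socket, and the old process only exits after the successor reports that it is serving. If the successor dies during startup, nothing signals the old process, so it simply continues. A library like tableflip would presumably be preferable in practice, since it covers the timeout and failure cases this sketch skips.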
Describe alternatives you've considered
The documentation recommends using blue/green deployments. While this can be straightforward in a Kubernetes or cloud setup, it is not usable in every situation. A simple built-in graceful restart, just like nginx has, makes it possible to update the configuration with zero downtime and without changing anything in the server infrastructure. I would treat this as an additional restart option, so that there are several options that suit different setups more or less well.
This issue is marked as stale because it has been open over 90 days with no activity. Remove the stale label or comment or this will be closed in 15 days.