
Reloading Nginx on schedule with entrypoint script (think LetsEncrypt certs) #509

Open · wants to merge 5 commits into master
Conversation

@needleshaped commented Feb 22, 2021

The script is a no-op by default; you enable its logic by setting the NGINX_ENTRYPOINT_RELOAD_EVERY_X_HOURS variable to an integer or float value greater than 0 (it is passed as the argument to sleep).

The script then starts an infinite loop in the background, reloading Nginx on a schedule.

Example use case:

  • Nginx with mounted Let's Encrypt certificates
    • e.g. a single certbot pushing certificates to an NFS share + multiple Nginx consumers that require a reload every 3 months.

This PR solves a set of recurring questions.

Alternatives (or workarounds) would be:

  • cron at host (see the sketch after this list)
  • certbot remote hook to host
  • certbot + nginx in the same container, standard certbot hook
  • inotify (see comment below)
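
A hedged sketch of the "cron at host" alternative (the container name nginx and the schedule are assumptions, not part of this PR):

# Host crontab entry (assumed container name "nginx"):
# reload Nginx inside the container daily at 03:00.
0 3 * * * docker exec nginx nginx -s reload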

Tested with:

    volumes:
      ...
      - ./40-reload-every-x-hours.sh:/docker-entrypoint.d/40-reload-every-x-hours.sh
    environment:
      - NGINX_ENTRYPOINT_RELOAD_EVERY_X_HOURS=0.1

Resulting in a successful reload:

...
nginx      | 40-reload-every-x-hours.sh: Reloading Nginx every 0.1 hour(s)
...
nginx      | 40-reload-every-x-hours.sh: Reloading Nginx ...
nginx      | 2021/02/22 20:55:34 [notice] 49#49: signal process started

Value validation test:

  • NGINX_ENTRYPOINT_RELOAD_EVERY_X_HOURS=0
  • NGINX_ENTRYPOINT_RELOAD_EVERY_X_HOURS=-1.0
  • NGINX_ENTRYPOINT_RELOAD_EVERY_X_HOURS=one
nginx      | /docker-entrypoint.sh: Launching /docker-entrypoint.d/40-reload-every-x-hours.sh
nginx      | 40-reload-every-x-hours.sh: Provide an integer or floating point number greater than 0. See 'man sleep'.

P.S. Kudos for the idea goes to https://dev.to/koddr/how-to-dockerize-your-static-website-with-nginx-automatic-renew-ssl-for-domain-by-certbot-and-deploy-it-to-digitalocean-4cjc
P.P.S. Followed the example of 30-tune-worker-processes.sh, commit 2b06409 by @thresheek

TODO: describe the parameter in https://github.com/docker-library/docs/tree/master/nginx

@needleshaped (Author) commented Feb 23, 2021

P.S. After taking a second look... this solution is still not solid enough. Inotifyd, however, would serve the purpose perfectly.

Should I change this PR or create a new one (if the reload functionality is still welcome)? Your opinion?

EDIT: In the NFS use case inotify will NOT help, as NFS does not support it. There are workarounds, like client-server implementations of notification brokers, but they introduce intolerable complexity.

I wrote and tested the code, but since it actually addresses different scenarios (e.g. reloading on configuration change, or even running 20-envsubst-on-templates.sh on template change), it will be a separate PR.

@thresheek (Collaborator) commented Feb 24, 2021

That looks like a giant hack, to be honest, rather than a solution :-)

The problem with that approach is that we're introducing some poor man's process management into the container; while I generally agree there are use cases (such as the one in this PR), it's a bit out of scope for this image. There is no way to check whether the configuration was applied, and the mere fact of sending a signal is not enough to figure out the process state. Also, we're supposed to follow the official guidelines and best practices, and this doesn't exactly fall under the init section - so my feeling is that we shouldn't have this code in the image.

As for the actual implementation, I think that instead of reloading every X minutes (which means more and more long-lived workers using more CPU/RAM if e.g. you have websockets), I would rather create a flag file in a certbot hook and check for that file when waking up from sleep. Also, maybe it would be enough to redefine the CMD/command instead of adding another script, to keep this while loop active during the container lifetime?
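
A minimal sketch of that flag-file idea (the flag path /tmp/nginx-reload and the interval are illustrative assumptions, not part of this PR):

#!/bin/sh
# Hypothetical sketch: a certbot deploy hook touches /tmp/nginx-reload;
# this background loop reloads Nginx only when that flag file exists.
set -e
while :; do
  sleep 1h
  if [ -f /tmp/nginx-reload ]; then
    rm -f /tmp/nginx-reload
    nginx -s reload
  fi
done &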

@needleshaped (Author) commented Feb 25, 2021

Thank you - it is indeed poor man's process management, an attempt to solve, with one standard approach (if the PR were accepted), a use case that has spawned a lot of wild workarounds.

That aside, I came up with an inotify alternative. Tested, works. Please tell me if it's a not-so-giant hack ;) and more of a solution (or not).

Using CMD or ENTRYPOINT is something I will test, as well as the flag file (note, Nginx will then run under a shell). But that would be you helping me with just another workaround for the use case. Next day, the next guy will have to search for it again.

If we decide that the "flag file" is better than the inotify approach below, we can replace inotify with a sleep loop that monitors the given paths instead, but I would argue to at least stick to /docker-entrypoint.d/ to set the recommended workaround in stone.

As a result of our conversation, I want a solution that you and the other maintainers would approve. I wonder if that is possible at all, taking into account the one-concern-per-container rule from the init section you were referring to... unless Nginx supports this natively... or we redefine what a concern is.

So, the inotify approach:

  1. Re-uses /docker-entrypoint.d.
  2. Reads a list of paths to monitor (actually 1-to-1 inotifyd arguments, including the script to run on an event).
  3. Runs inotifyd for each path.
  4. Creates a standard basic script to reload Nginx.

Upside:

  • Not as dumb as a scheduled reload
  • Flexibility. In the example below I can modify the template >> triggers 20-envsubst-on-templates.sh >> updates default.conf >> triggers a reload.
  • Offloads complexity from CI/CD pipelines. We hand new config/certificates to Nginx, and it picks them up.
  • Immediate result. Nice for testing/development.
  • Optional. It's up to the user to decide whether the benefits outweigh the risks.

Downsides:

  • Running process(es) (e.g. busybox in the case of inotifyd; the footprint is small)
  • As of now, supports only Alpine
  • Does not support NFS

/docker-entrypoint.d/50-run-inotifyd.sh

#!/bin/sh

set -e

[ "${NGINX_ENTRYPOINT_RUN_INOTIFYD_ARGS:-}" ] || exit 0

ME=$(basename "$0")

echo >&3 "$ME: Important. Do not edit monitored files with vi. By default, vi creates a *new* file with the same name, so inotify will not fire."

# Write a basic helper script that inotifyd invokes to reload Nginx.
create_inotify_default_scripts() {
  cat <<-'EOF' >/inotify-default-reload-nginx.sh
#!/bin/sh
ME=$(basename "$0")
echo >&3 "$ME: Reloading Nginx ..."
nginx -s reload
EOF
  chmod +x /inotify-*.sh
}

# Split the configured argument sets on '|' and start one background
# inotifyd per set.
start_background_inotifyds() {
  IFS="|"
  for args in $NGINX_ENTRYPOINT_RUN_INOTIFYD_ARGS
  do
    echo >&3 "$ME: Running in background: inotifyd $args"
    eval "inotifyd $args &"
  done
}

create_inotify_default_scripts
start_background_inotifyds

docker-compose.yml

    volumes:
      - ./default.conf.template:/etc/nginx/templates/default.conf.template
      - /mnt/letsencrypt:/etc/letsencrypt:ro
      - ./50-run-inotifyd.sh:/docker-entrypoint.d/50-run-inotifyd.sh

    environment:
      - NGINX_HOST=${NGINX_HOST}
      - NGINX_ENTRYPOINT_RUN_INOTIFYD_ARGS=
        /inotify-default-reload-nginx.sh /etc/nginx/conf.d/default.conf:w|
        /docker-entrypoint.d/20-envsubst-on-templates.sh /etc/nginx/templates/default.conf.template:w|
        #/inotify-default-reload-nginx.sh /etc/letsencrypt/live/some.domain/:M # inotify does not work with NFS mounts
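
For reference: each |-separated entry above is a busybox inotifyd argument set, PROG FILE[:MASK]. If I read the busybox docs correctly, the w mask fires when a writable file is closed and M when a file is moved, and inotifyd invokes PROG with the actual event letters, the watched name, and (for directory watches) the subfile as arguments.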

@thresheek (Collaborator) commented Feb 26, 2021

Yeah, I think the "init-replacement-but-not-really" approach is something we will not accept into the image in any case. However, I see no problem with good documentation (e.g. providing an entrypoint script in a gist, or in another repository) that we can link to from https://hub.docker.com/_/nginx.

As you've said, I'm not sure inotify will work on weird filesystems or subsystems such as WSL2, Docker on Mac, or virtualbox mounts (those are known to have issues with e.g. sendfile()), so my suggestion about creating files was merely a way to use the dumbest mechanism possible to communicate between containers. Maybe inotify is good enough, though; I don't really have an opinion.

@needleshaped (Author) commented

That's what I thought. I'll come up with something we can document. Don't blame me if it becomes popular! ;)

@Exagone313 (Contributor) commented

You can execute nginx -s reload in your container after updating the certificate files.

With Docker/Podman, I don't see why you couldn't run a command in the nginx container(s) after the files are updated by the ACME client in a volume. If you are accessing from a distant machine, there is a tool called ssh (and if you don't trust the machine running the ACME client, you can pull the certificates from another trusted machine).

Your use case of NFS where inotify doesn't work seems very specific. I'd rather upload copies of the secrets using scp/rsync and then ssh into the machines to reload nginx, with an automated configuration manager such as Salt, than use NFS for sharing secrets. I had several issues in production with unexpected disconnections due to network outages, which were not as easy to monitor as pushing files to several machines.

(I even considered the case of using Kubernetes while avoiding cert-manager; you can use shareProcessNamespace to communicate with nginx from a sidecar container.)
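
A hedged sketch of that approach as a certbot deploy hook on the host (the container name nginx is an assumption; certbot runs executables from this directory after each successful renewal):

#!/bin/sh
# /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh
# Assumed container name "nginx"; certbot runs this hook after each
# successful renewal, so the new certificate gets picked up.
set -e
docker exec nginx nginx -s reload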

@JonasAlfredsson commented

I noticed this thread a while ago, and since I agree that the reload task is out of scope for the main Nginx image, I just wanted to mention that I have been tinkering with the "nginx + certbot in one image" solution mentioned above.
Apologies if it derails the discussion, but I have not seen any progress for a while, so I just thought I would throw this out here :)

@iain-henderson (Contributor) commented

It is a hack, but you can use the healthcheck to accomplish this:

# Write Checksums for all the potentially changing files
find "${CERTIFICATE_DIR:-/var/certificate}" -type f -exec sha256sum "{}" \; > /tmp/certificate.checksums.now && echo Wrote new certificate checksums;

# If there are old checksums, check against those and reload if things have changed
test -r /tmp/certificate.checksums && ! diff -s /tmp/certificate.checksums /tmp/certificate.checksums.now && nginx -s reload && echo Reloading nginx;

# Create the "old" checksums
mv /tmp/certificate.checksums.now /tmp/certificate.checksums && echo Updated certificate checksums;

# The actual healthcheck
curl --fail http://localhost || exit 1

Or something similar.

@oliv3r commented Jun 19, 2023

> It is a hack, but you can use the healthcheck to accomplish this: […]
> Or something similar.

I'm doing the same hack :) However, I think having the inotify package available would already be a big boon (see #791).

With regards to the earlier init discussion by @thresheek in #509 (comment): dumb-init's readme (IIRC tini's says the same) strongly urges using at least tini/dumb-init as an init system, especially for something like nginx, with all these signals and multiple workers jumping around. And since the image by default already launches some commands (albeit before exec-ing nginx), it's only fair to add an init solution (as I proposed in #791).
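
For what it's worth, Docker can already inject its bundled tini as PID 1 without changing the image; a minimal example (docker run --init on the CLI, or init: true in Compose):

# Run the stock image under Docker's bundled init (tini):
docker run --init -d -p 80:80 nginx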

@needleshaped (Author) commented Jun 1, 2024

A small update with a working script, as help for people searching for a workaround for the given problem (e.g. various Nginx deployments that use a centralized NFS share for LE certificates):

  • Dockerfile
ARG NGINX_VERSION
FROM nginx:${NGINX_VERSION}

...

COPY ./docker-entrypoint.d/*.sh /docker-entrypoint.d/
RUN chmod +x /docker-entrypoint.d/*.sh
  • 40-reload-every-x-hours.sh
#!/bin/sh

set -e

[ "${NGINX_ENTRYPOINT_NGINX_RELOAD_EVERY_X_HOURS:-}" ] || exit 0 # exit early if necessary variable is not given

entrypoint_log() {
    if [ -z "${NGINX_ENTRYPOINT_QUIET_LOGS:-}" ]; then
        echo "$@"
    fi
}

ME=$(basename "$0")

if [ "$(echo "$NGINX_ENTRYPOINT_NGINX_RELOAD_EVERY_X_HOURS > 0" | bc)" -eq 0 ]; then
  entrypoint_log "$ME: Provide an integer or floating point number greater than 0. See 'man sleep'."
  exit 1
fi

start_background_reload() {
  entrypoint_log "$ME: Reloading Nginx every $NGINX_ENTRYPOINT_NGINX_RELOAD_EVERY_X_HOURS hour(s)"
  while :; do
    entrypoint_log "$ME: Sleeping ..."
    sleep "${NGINX_ENTRYPOINT_NGINX_RELOAD_EVERY_X_HOURS}h"
    entrypoint_log "$ME: Reloading Nginx ..."
    nginx -s reload
  done &
}

start_background_reload

A quick test:

source .env
docker build --build-arg NGINX_VERSION=$NGINX_VERSION --tag mynginxtest . && docker run -e NGINX_ENTRYPOINT_NGINX_RELOAD_EVERY_X_HOURS=0.002 mynginxtest # 0.002h ~=7.2sec

@oliv3r commented Jun 14, 2024

I have the following in my compose file:

healthcheck:
  test: reload_conf_healthcheck.sh
  interval: 30s

where the reload check is:

#!/bin/sh
# SPDX-License-Identifier: AGPL-3.0-or-later

# Copyright (C) 2023 Olliver Schinagl <[email protected]>

set -eu
if [ -n "${DEBUG_TRACE_SH:-}" ] && \
   [ "${DEBUG_TRACE_SH:-}" != "${DEBUG_TRACE_SH#*"$(basename "${0}")"*}" ] || \
   [ "${DEBUG_TRACE_SH:-}" = 'all' ]; then
        set -x
fi

chkfile='/run/nginx_conf_reload.md5'
if [ -f "${chkfile:-}" ]; then
        chksumfile="$(cat "${chkfile}")"
fi

calculate_chksum()
{
        chksum=''
        for _conf in '/etc/nginx/nginx.conf' '/etc/nginx/conf.d/'* '/etc/nginx/certs/'*; do
                if [ -f "${_conf:-}" ]; then
                        chksum="${chksum:-}$(md5sum "${_conf}" | \
                                             cut -c '1-32')"
                fi
        done

        chksum="$(echo "${chksum:-}" | \
                  md5sum | \
                  cut -c '1-32')"
}

run_entrypoints()
{
        find '/docker-entrypoint.d/' -follow -type f -print | sort -V | \
        while read -r _ep; do
                if [ ! -x "${_ep:-}" ]; then
                        continue
                fi

                case "${_ep:-}" in
                *'.envsh')
                        . "${_ep}"
                        ;;
                *'.sh') 
                        "${_ep}"
                        ;;
                *)
                        ;;
                esac
        done
}

calculate_chksum
if [ "${chksum:-}" != "${chksumfile:-x}" ]; then
        run_entrypoints

        if ! nginx -t; then
                echo 'Invalid configuration files! Not reloading.'
                exit 1
        fi

        calculate_chksum
        if [ "${chksum:-}" != "${chksumfile:-y}" ]; then
                nginx -s reload
                echo "${chksum}" > "${chkfile}"
        fi
fi

if ! wget -q -O '/dev/null' 'http://127.0.0.1:80'; then
        exit 1
fi

exit 0

which, in short, goes over all the config files, calculates the checksum of the whole shebang, and stores it for later reference. If there's a diff and we have a valid nginx config, we reload.

with

server {
        listen 127.0.0.1:80;
        listen [::1]:80;
        server_name  localhost;

        access_log /var/log/nginx/access.log main;

        location / {
            return 200;
        }
}

to ensure I have the endpoint needed.

@compilenix commented Jun 15, 2024

@oliv3r I have a suggestion for the checksum part. You can use the output of nginx -T to not only test but also receive the complete nginx runtime config; this also covers all nginx conf includes. One thing it does not cover is the loaded certificates. For those, you could think about parsing the actually included certificates out of the generated runtime config and checking the timestamps on them individually, so that you only perform a reload when a certificate is due for one. Might also be a little complex from a reliability point of view.

I came across this since I'm also looking for a solid solution for my own nginx container image.
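
A hedged sketch of that nginx -T idea, capturing the output first so the exit status isn't lost in a pipe (plain POSIX sh has no pipefail); the checksum file path follows the script above:

#!/bin/sh
# Sketch: checksum the complete runtime config that 'nginx -T' dumps.
# If -T fails (invalid config), skip the reload entirely.
set -eu
chkfile='/run/nginx_conf_reload.md5'
if conf="$(nginx -T 2>/dev/null)"; then
        chksum="$(printf '%s' "${conf}" | md5sum | cut -c '1-32')"
        if [ "${chksum}" != "$(cat "${chkfile}" 2>/dev/null || :)" ]; then
                nginx -s reload
                echo "${chksum}" > "${chkfile}"
        fi
fi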

@oliv3r commented Jun 16, 2024

> @oliv3r I have a suggestion for the checksum part. You can use the output of nginx -T to not only test but also receive the complete nginx runtime config; this also covers all nginx conf includes.

When I first wrote this script, nginx -T didn't exist :) But what's more, it won't cover all situations nicely, if my assumption is correct that nginx -T only works when you have a valid config. If it produces proper results on invalid configs, I'd be curious, at the very least.

To illustrate my concern: we have a valid nginx config, whose crc value we store. Great. But now, behind the scenes, the config is changed (expected behavior), but it is invalid. If nginx -T now fails to produce a proper config (and thus a crc value), that's fine. However, if we then change the config again, still invalid, we need a different crc value for this ruse to work.

But I'm definitely going to check this out and use the output of nginx -T | crc instead of going over all the various combinations of configs. Far better, I'd think.

> One thing it does not cover is the loaded certificates. For those, you could think about parsing the actually included certificates out of the generated runtime config and checking the timestamps on them individually, so that you only perform a reload when a certificate is due for one. Might also be a little complex from a reliability point of view.

Yeah, in the end we only care about whether the certificate has changed. To go more complex, I'd hope that nginx would fix this internally, e.g. by having an inotify watch on the certificate and parsing the expiry date or something. Doing this with scripts (like this one) is already a kludge, IMO. So the only failure case here would be replacing an up-to-date working certificate with an older, expired one. Operator error for sure, but it would be nice to catch; in the end the operator would just be confused about why his certificate isn't loading. Besides, if everything is automated (Let's Encrypt), this rare use case shouldn't even be possible. KISS :p

> I came across this since I'm also looking for a solid solution for my own nginx container image.

I've been using this for 2 years now and am quite happy with it. Your suggestion about nginx -T is worth exploring, and I will post an update if it works.

@oliv3r commented Jun 16, 2024

> I've been using this for 2 years now and am quite happy with it. Your suggestion about nginx -T is worth exploring, and I will post an update if it works.

My first exploration is not positive. nginx -T will tell you there's an error, but subtle differences in that error produce the exact same output. E.g. I added bla and bla; to the container's default.conf, and nginx does complain about bla being unknown, but it leaves out the semicolon, thus generating identical results. Maybe that's not so bad after all: nginx -T produces an error, but then we can't pipe it (unless we accept a bashism and set pipefail).
