Health Check #384

Closed
vbaidak opened this issue Mar 8, 2024 · 8 comments
Labels
support Support Requests

Comments


vbaidak commented Mar 8, 2024

Hi Team,

Is there any health check command?

Member

m90 commented Mar 8, 2024

What's the failure scenario you would like to check for with such a health check? Right now, as long as the container did not exit, things are assumed running as configured (which obviously might not be true).

However, if this image were to implement a health check, I wouldn't even know which conditions to check for. Do you have something specific in mind?

m90 added the support (Support Requests) label Mar 8, 2024
Author

vbaidak commented Mar 8, 2024

Yeah, that's why I'm asking. I'm OK with just having the unless-stopped restart policy. I'm only asking in case there is a process inside the container that could fail without the container itself failing, so that backups silently stop and we don't know about it.

Member

m90 commented Mar 9, 2024

The process in the container works kind of like crond: it is designed to never fail and instead just runs the scheduled jobs, whether or not those invocations succeed. This means the container being up does not tell you anything about whether backups have been performed successfully. The only guarantee you get is that they have been attempted. For example, you could have a container that has been up for over a year but hasn't performed a single backup because you accidentally provided bad credentials for your storage backend.

If you want job-level visibility instead, you can either monitor your logs or use the notification feature, which can notify you about either all runs or just failed ones. Docs are here https://offen.github.io/docker-volume-backup/how-tos/set-up-notifications.html
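For the log-monitoring route, a minimal sketch in Python could look like the following. It assumes log lines use the `key=value` layout of the error quoted further down in this thread (`time=... level=ERROR msg="..."`). The names `parse_line` and `error_messages` are made up for illustration and are not part of docker-volume-backup.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# (with backslash escapes) or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return the key=value fields of one log line as a dict."""
    fields = {}
    for key, value in PAIR.findall(line):
        # Strip the surrounding quotes and unescape embedded quotes.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields


def error_messages(lines):
    """Collect the msg field of every line logged at level=ERROR."""
    return [
        fields.get("msg", "")
        for fields in map(parse_line, lines)
        if fields.get("level") == "ERROR"
    ]
```

Feeding it the output of `docker logs <container>` and alerting whenever the result is non-empty would be one way to get job-level visibility without changing the image.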

There are also ideas about exposing metrics via HTTP (e.g. Prometheus), but I don't know how / if this will ever happen: #64

Author

vbaidak commented Mar 9, 2024

Thanks for the explanation!
Well, I've already configured email notifications and they work well, so my backup failures will be covered.
What about the process itself? Maybe we could at least check via ps whether it is running, or whether it has PID 1?

Member

m90 commented Mar 9, 2024

> What about the process itself?

This should be the responsibility of Docker or whatever tool runs the container. If the process exits, the container exits as well, and in that case whoever is supervising the container should decide whether to restart it.

Author

vbaidak commented Mar 9, 2024

I see

vbaidak closed this as completed Mar 9, 2024

huyz commented Dec 14, 2024

Maybe a healthcheck would be useful for this scenario:

Occasionally, when I trigger a backup, I get this:

time=2024-12-14T07:12:24.192Z level=ERROR msg="Fatal error running command: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?" error="main.(*command).runAsCommand: error running script: main.runScript.func4: error running script: main.runScript.func4.1.(*script).withLabeledCommands.2: error running archive-pre commands: main.(*script).runLabeledCommands: error querying for containers: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"

I don't know why this is happening on my Docker Desktop for Windows, when my other services are running fine.

Once I restart the container, all is good again.

I do use NOTIFICATION_URLS in conjunction with healthchecks.io, so I am notified when backups don't work.
But if there were a healthcheck, then my cron job, which restarts unhealthy containers, could automatically resolve the issue without a manual intervention from me.

Member

m90 commented Dec 14, 2024

Currently there's a whole class of errors that aren't caught by the notification feature (basically everything that goes wrong before the notification feature is initialized). I would think rectifying this would be the proper way to solve your issue.
