How do you healthcheck your containers?

Sips'@slrpnk.net · 25 days ago

How do you healthcheck your containers?

Zelaf@sopuli.xyz · edited 25 days ago

I’m also using Beszel and Ntfy to track my systems because they’re lightweight and very easy to set up. Having tried Grafana and Prometheus with different TSDBs before, I feel way better off with this setup.

I’ve been following Beszel’s development closely because it was previously missing features like container monitoring and systemd monitoring. I’m very thankful those were added recently, since I use containers as my primary way of hosting all my applications. The “Healthy” or “Unhealthy” status is reported by Docker itself and is not something Beszel checks on its own, so the healthcheck has to be configured, either through the HEALTHCHECK instruction in the Dockerfile of the container image or afterwards using the healthcheck options when running the container.

As some other comments mentioned, some images come with a healthcheck built in, in which case Docker picks it up and runs it automatically. Others don’t have a healthcheck in the container build file, and some of those document how to add one to the docker run command or compose file. Some examples are Beszel and Ntfy themselves.

For containers that do not have a healthcheck built into the build file, either it is documented how to add one to the compose file or you have to figure out a way to do it yourself. Images built on a more standard base like Alpine, Debian or others often have something like curl or wget available. If the service you are running serves a web page, you can check that. Some programs also have a healthcheck command built in that you can use.

As an example, PostgreSQL ships with a built-in command, pg_isready, that checks whether the database is ready to accept connections. The easiest way to add it would be

    healthcheck:
      test: ["CMD", "pg_isready", "-U", "root", "-d", "db_name"]
      interval: 30s
      retries: 5
      start_period: 60s

That’ll run pg_isready -U root -d db_name inside the container every 30 seconds. The start_period gives the container 60 seconds to get up and running: checks that fail during that window don’t count toward the 5 retries. Options can be changed depending on the speed of the system.
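To show where that fragment sits in a compose file, here is a rough sketch of a full service definition. The image tag, the password, and the app service are placeholders I made up for illustration. It also uses the CMD-SHELL form so the user and database name come from the container’s environment instead of being typed twice, and it lets a second service wait for the database to be healthy.

```yaml
services:
  db:
    image: postgres:16               # assumed tag, use whatever you run
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: change-me   # placeholder
      POSTGRES_DB: db_name
    healthcheck:
      # CMD-SHELL runs the string through the container's shell, so the
      # variables are expanded inside the container. The doubled "$$" stops
      # Compose from substituting them on the host first.
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 60s

  app:
    image: example/app:latest        # hypothetical application image
    depends_on:
      db:
        # Only start app once Docker reports db as healthy.
        condition: service_healthy
```

The depends_on condition is optional. It is just one place where the health status gets used beyond showing up in Beszel.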

Another example: for a container that has curl available inside it, you can add something like

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/"]
      interval: 1m
      retries: 3

This will run curl -f http://localhost:3000/ every minute. If the command in either of the above examples exits with a non-zero exit code for the configured number of retries in a row, Docker reports the container as unhealthy. Beszel then reads that status and reports that the container is not healthy. Some web apps also have something along the lines of a /health endpoint that you can point curl at.
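If the image has no curl but is based on Alpine or BusyBox, wget is usually there instead. A minimal sketch, assuming the app listens on port 3000 and exposes a /health path (both are assumptions, check what your app actually provides):

```yaml
services:
  web:
    image: example/web:latest   # hypothetical image
    healthcheck:
      # --spider only checks that the URL responds and does not save anything,
      # -q keeps the output quiet. wget exits non-zero on an error response.
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 1m
      timeout: 10s
      retries: 3
```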

Unless the developer has spent some extra time on the healthchecks, a healthcheck is often just a basic way to see that the program inside the container is running. However, the container itself usually exits if the program it is running crashes or quits, so a healthcheck isn’t always necessary: the abruptly stopped container is the signal. This is why something like Uptime Kuma is worth running alongside Beszel. It can monitor whether a web address or similar is down, including when a container has exited, which Beszel sadly still lacks as of now.

I would recommend reading up on the Docker Compose spec for healthchecks, since the other options let you set things like timeouts. Combining those with whatever program you run as the check, you can get very creative with it if you must.
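As a quick reference, this is roughly what the main options look like together. The values, image names and service names are only illustrative, not recommendations:

```yaml
services:
  web:
    image: example/web:latest   # hypothetical image
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://localhost:3000/ || exit 1"]
      interval: 30s       # time between checks
      timeout: 5s         # a check running longer than this counts as failed
      retries: 3          # consecutive failures before "unhealthy"
      start_period: 20s   # failures in this window don't count toward retries

  noisy:
    image: example/noisy:latest   # hypothetical image with a built-in healthcheck
    healthcheck:
      disable: true       # turns off the healthcheck defined by the image
```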

My personal recommendation would be to stick with Uptime Kuma for proper service availability checks, since it’s easier to configure and gives an overview of things like slow page load times and stopped containers, while using Beszel to monitor performance and resource usage.

Sips'@slrpnk.net · 25 days ago

Thanks for this very in depth answer, learned a lot from this 🫶