I’ve recently been spending time configuring notifications for my self-hosted services using ntfy. I’ve set up ntfy to report the status of my containers and my system through Beszel. However, only 12 of my 44 containers seem to have a healthcheck “enabled” or built in as a feature, so I’m now wondering what is considered best practice for monitoring the uptime/health of my containers. I am already using Uptime Kuma, with the “docker container” option for each container I deem necessary to monitor; I do not monitor all 44 of them 😅

So I’m left with these questions:

  1. How do you notify yourself about the status of a container?
  2. Is there a “quick” way to know if a container has a healthcheck as a feature?
  3. Does the healthcheck feature simply depend on the developer of each app, or on the person building the container?
  4. Is it better to simply monitor the http(s) request to each service? (I believe this in my case would make Caddy a single point of failure for this kind of monitor).

Thanks for any input!

  • realitaetsverlust@piefed.zip
    21 hours ago

    How do you notify yourself about the status of a container?

    I usually notice when a container or application is down because something in my house stops working. Sounds stupid, but I’m not hosting a highly available cluster at home.

    Is there a “quick” way to know if a container has a healthcheck as a feature?

    Check the documentation.
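
    Beyond the docs, the container metadata itself can tell you. A rough Python sketch, assuming the docker CLI is installed and that a defined healthcheck shows up under Config.Healthcheck in the `docker inspect` output (the function names are made up for illustration):

```python
import json
import subprocess


def healthcheck_from_inspect(inspect_output: str):
    """Return the healthcheck test command from `docker inspect` JSON, or None."""
    data = json.loads(inspect_output)
    # `docker inspect <name>` prints a JSON array with one object per container.
    config = data[0].get("Config") or {}
    healthcheck = config.get("Healthcheck") or {}
    test = healthcheck.get("Test") or []
    # A test of ["NONE"] means the healthcheck was explicitly disabled.
    if not test or test[0] == "NONE":
        return None
    return test


def container_healthcheck(name: str):
    """Run `docker inspect` for one container (needs the docker CLI on PATH)."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return healthcheck_from_inspect(result.stdout)
```

    Calling `container_healthcheck("caddy")` should return the test command if one is defined and None otherwise; "caddy" is only an example container name.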

    Does the healthcheck feature simply depend on the developer of each app, or on the person building the container?

    If the developer adds a healthcheck feature, you should use that. If there is none, you can always build one yourself. If it’s a web app, a simple HTTP request does the trick: validate the response, and if the status code is 200 and the body contains a certain string, the app is most likely up. If it’s not a web app, e.g. a database, a simple SELECT 1 can tell you whether it’s reachable.
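
    Both home-made checks fit in a few lines of Python. This is only a sketch: the function names are invented, and `db_is_reachable` assumes you pass it a DB-API 2.0 connection from whatever driver your database uses.

```python
import http.client
import urllib.error
import urllib.request


def http_is_healthy(url: str, expected_text: str, timeout: float = 5.0) -> bool:
    """True if the URL answers 200 and the body contains expected_text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected_text in body
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False


def db_is_reachable(connection) -> bool:
    """True if `SELECT 1` succeeds on a DB-API 2.0 connection."""
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT 1")
        return cursor.fetchone()[0] == 1
    except Exception:
        # Any driver error is treated as "not reachable".
        return False
```

    To try the database check without a server, `db_is_reachable(sqlite3.connect(":memory:"))` works with the standard library; for Postgres or MariaDB you would pass a connection from the matching driver instead.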

    Is it better to simply monitor the http(s) request to each service? (I believe this in my case would make Caddy a single point of failure for this kind of monitor).

    If you only run a bunch of web services that you use on demand, monitoring the HTTP requests to each service is more than enough. Caddy being a single point of failure is not a problem, because a dead Caddy still leaves the service unusable. And you will immediately know whether Caddy or the service behind it died, because the errors look different: if the upstream is dead, Caddy returns a 502; if Caddy itself is dead, you’ll get a connection error such as “Connection timed out” or “Connection refused”.
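
    That distinction is easy to automate. A small sketch (the function name and the labels it returns are made up): a 502 means the proxy answered but could not reach the backend, while no HTTP answer at all points at the proxy or the host.

```python
import urllib.error
import urllib.request


def classify(url: str, timeout: float = 5.0) -> str:
    """Return 'up', 'upstream_down', 'proxy_down' or 'http_<code>'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "up" if response.status == 200 else f"http_{response.status}"
    except urllib.error.HTTPError as error:
        # The proxy answered, so it is alive; 502 means the backend was unreachable.
        return "upstream_down" if error.code == 502 else f"http_{error.code}"
    except (urllib.error.URLError, OSError):
        # No HTTP answer at all: refused, timed out, DNS failure and so on.
        return "proxy_down"
```

    Other 5xx codes are reported as-is, so depending on how the proxy is configured you may want to treat more than just 502 as an upstream problem.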

    • Sips'@slrpnk.net (OP)
      12 hours ago

      Yeah, fair enough. Personally I want to monitor backend services too, just for good measure. Also to prove to my friends and family that I can maintain a higher uptime % than Cloudflare 🤣

      • mmmac@lemmy.zip
        5 hours ago

        If that’s what you’re looking for, you can use something like Uptime Kuma, which pings each service and looks for a specific response, and notifies you if it doesn’t get one.

        I doubled down recently and now have Grafana dashboards + alerts for all of my Proxmox hosts, their containers etc.

        Alerts mainly fire when mean CPU, memory or disk utilization stays above 80% over 5 minutes.
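
        The rule itself is just a rolling mean compared against a threshold. A plain Python sketch of the idea (this is not how Grafana evaluates alerts internally, and the class name is invented):

```python
from collections import deque


class MeanThresholdAlert:
    """Fire when the mean of the last `window_seconds` of samples exceeds `threshold`."""

    def __init__(self, threshold: float = 80.0, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first
        self.started = None  # timestamp of the very first sample

    def add(self, timestamp: float, value: float) -> bool:
        """Record a sample; return True if the alert should fire."""
        if self.started is None:
            self.started = timestamp
        self.samples.append((timestamp, value))
        # Drop samples that have fallen out of the window.
        while self.samples[0][0] <= timestamp - self.window_seconds:
            self.samples.popleft()
        # Stay quiet until a full window of data has been observed.
        window_full = timestamp - self.started >= self.window_seconds
        mean = sum(v for _, v in self.samples) / len(self.samples)
        return window_full and mean > self.threshold
```

        Feeding it one utilization sample per minute, it stays quiet for the first five minutes and only fires once the whole window averages above 80.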

        I also get all of my notifications via a self-hosted ntfy instance :~)
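
        Publishing to ntfy is just an HTTP POST to the topic URL, so hooking a check up to it takes a few lines. A sketch, assuming an instance without access control; the function names are made up, and Title and Priority are ntfy’s optional headers:

```python
import urllib.request


def build_ntfy_request(base_url, topic, message, title=None, priority=None):
    """Build the POST request that publishes `message` to an ntfy topic."""
    headers = {}
    if title:
        headers["Title"] = title  # keep header values ASCII
    if priority:
        headers["Priority"] = priority  # e.g. "high" or "urgent"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_ntfy(base_url, topic, message, title=None, priority=None, timeout=5.0):
    """Publish the message; return True if the server answered 200."""
    request = build_ntfy_request(base_url, topic, message, title, priority)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200
```

        For example, `send_ntfy("https://ntfy.example.com", "homelab", "jellyfin is down", title="Container alert", priority="high")`, where the URL and topic are placeholders. An instance with access control would also need an Authorization header.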

        • Sips'@slrpnk.net (OP)
          4 hours ago

          As I wrote in my post, I’m already using Uptime Kuma to monitor my services. However, if I choose the “docker container” mode in Uptime Kuma, it can’t actually do that, as most containers have no health feature, so this results in 100% downtime 🙃 The other way to do it would be to just check the URL of the service, which of course works too, but it’s not a “true” health check.

    • lps2@lemmy.ml
      21 hours ago

      For databases, many (like Postgres) have a ping/ready command you can use to check that it’s up without the overhead of an actual query! Redis works the same way (I feel like Postgres and Redis health checks cover a lot of the common stack patterns).
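
      For reference, those are `pg_isready` for Postgres (exit code 0 means it is accepting connections) and `PING` for Redis (it answers `+PONG`). A Python sketch of both, with invented function names; it assumes `pg_isready` is on the PATH and that Redis has no password set, since otherwise the reply is an error rather than `+PONG`:

```python
import socket
import subprocess


def postgres_ready(host: str = "127.0.0.1", port: int = 5432) -> bool:
    """True if `pg_isready` exits with 0 (server accepting connections)."""
    try:
        result = subprocess.run(
            ["pg_isready", "-h", host, "-p", str(port)],
            capture_output=True,
        )
    except FileNotFoundError:
        # pg_isready is not installed on this machine.
        return False
    return result.returncode == 0


def redis_ping(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """True if the server answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # inline command form
            reply = conn.recv(64)
            return reply.startswith(b"+PONG")
    except OSError:
        # Refused, timed out or otherwise unreachable.
        return False
```

      Inside a container you would more likely call `pg_isready` or `redis-cli ping` directly from the healthcheck; the Python version is handy when the check runs from a separate monitoring script.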