Move health checks into Docker

Replaces our current host-based HTTP healthchecks with Docker healthchecks, and adds a new `healthcheck.cmd` config option that can be used to define a custom health check command. Also removes Traefik's healthchecks, since they are no longer necessary. When deploying a container that has a healthcheck defined, we wait for it to report a healthy status before stopping the old container that it replaces. Containers that don't have a healthcheck defined continue to wait for `MRSK.config.readiness_delay`. There are some pros and cons to using Docker healthchecks rather than checking from the host. The main advantages are: - Supports non-HTTP checks, and app-specific check scripts provided by a container. - When booting a container, allows MRSK to wait for a container to be healthy before shutting down the old container it replaces. This should be safer than relying on a timeout. - Containers with healthchecks won't be active in Traefik until they reach a healthy state, which prevents any traffic from being routed to them before they are ready. The main _disadvantage_ is that containers are now required to provide some way to check their health. Our default check assumes that `curl` is available in the container which, while common, won't always be the case.
2023-04-07 11:14:51 +01:00
parent bc8875e020
commit df202d6ef4
13 changed files with 186 additions and 115 deletions
--- a/README.md
+++ b/README.md
@@ -677,9 +677,11 @@ That'll post a line like follows to a preconfigured chatbot in Basecamp:
 [My App] [dhh] Rolled back to version d264c4e92470ad1bd18590f04466787262f605de
 ```

-### Custom healthcheck
+### Healthcheck

-MRSK defaults to checking the health of your application again `/up` on port 3000 up to 7 times. You can tailor the behaviour with the `healthcheck` setting:
+MRSK uses Docker healtchecks to check the health of your application during deployment. Traefik uses this same healthcheck status to determine when a container is ready to receive traffic.
+
+The healthcheck defaults to testing the HTTP response to the path `/up` on port 3000, up to 7 times. You can tailor this behaviour with the `healthcheck` setting:

 ```yaml
 healthcheck:
@@ -690,7 +692,29 @@ healthcheck:

 This will ensure your application is configured with a traefik label for the healthcheck against `/healthz` and that the pre-deploy healthcheck that MRSK performs is done against the same path on port 4000.

-The healthcheck also allows for an optional `max_attempts` setting, which will attempt the healthcheck up to the specified number of times before failing the deploy. This is useful for applications that take a while to start up. The default is 7.
+You can also specify a custom healthcheck command, which is useful for non-HTTP services:
+
+```yaml
+healthcheck:
+  cmd: /bin/check_health
+```
+
+The top-level healthcheck configuration applies to all services that use
+Traefik, by default. You can also specialize the configuration at the role
+level:
+
+```yaml
+servers:
+  job:
+    hosts: ...
+    cmd: bin/jobs
+    healthcheck:
+      cmd: bin/check
+```
+
+The healthcheck allows for an optional `max_attempts` setting, which will attempt the healthcheck up to the specified number of times before failing the deploy. This is useful for applications that take a while to start up. The default is 7.
+
+Note that the HTTP health checks assume that the `curl` command is avilable inside the container. If that's not the case, use the healthcheck's `cmd` option to specify an alternative check that the container supports.

 ## Commands