Move health checks into Docker

Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
This commit is contained in:
Kevin McConnell
2023-04-07 11:14:51 +01:00
parent bc8875e020
commit df202d6ef4
13 changed files with 186 additions and 115 deletions

View File

@@ -0,0 +1,39 @@
class Mrsk::Utils::HealthcheckPoller
TRAEFIK_HEALTHY_DELAY = 1
class HealthcheckError < StandardError; end
class << self
def wait_for_healthy(pause_after_ready: false, &block)
attempt = 1
max_attempts = MRSK.config.healthcheck["max_attempts"]
begin
case status = block.call
when "healthy"
sleep TRAEFIK_HEALTHY_DELAY if pause_after_ready
when "running" # No health check configured
sleep MRSK.config.readiness_delay if pause_after_ready
else
raise HealthcheckError, "container not ready (#{status})"
end
rescue HealthcheckError => e
if attempt <= max_attempts
info "#{e.message}, retrying in #{attempt}s (attempt #{attempt}/#{max_attempts})..."
sleep attempt
attempt += 1
retry
else
raise
end
end
info "Container is healthy!"
end
private
def info(message)
SSHKit.config.output.info(message)
end
end
end