If we don't specify any service properties when labelling containers,
the generated service will be named according to the container. However,
we change the container name on every deployment (as it is versioned),
which means that the auto-generated service name will be different in
each container.
That is a problem for two reasons:
- Multiple containers share a common router while a deployment is
happening. At this point, the router configuration will be different
between the containers; Traefik flags this as an error, and stops
routing to the containers until it's resolved.
- We allow custom labels to be set in an app's config. In order to
define custom configuration on the service, we'll need to know what
it will be called.
This change forces a stable service name by explicitly setting one of
the service's properties.
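As a rough illustration of the idea, the sketch below builds Traefik labels that pin the service name instead of leaving it to be derived from the container name. The helper `traefik_labels`, the service name `app-web`, and the choice of the `loadbalancer.server.port` property are assumptions made for illustration, not necessarily what MRSK emits.

```python
def traefik_labels(service: str, port: int = 80) -> dict:
    """Build Docker labels that give the Traefik service an explicit name.

    Setting a property under traefik.http.services.<name> makes Traefik
    define the service as <name> rather than deriving the name from the
    container. The port property is an illustrative choice.
    """
    return {
        "traefik.enable": "true",
        f"traefik.http.services.{service}.loadbalancer.server.port": str(port),
    }


# Two differently named (versioned) containers of the same app would
# carry identical service labels, so their configuration agrees.
old_labels = traefik_labels("app-web")
new_labels = traefik_labels("app-web")
```

Because the name no longer depends on the container, custom labels in an app's config can also refer to the service by a name that is known in advance.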
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.
When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.
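The waiting step can be sketched as a simple polling loop. This is a minimal sketch, not MRSK's implementation: `wait_until_healthy`, its parameters, and the decision to give up as soon as the status is `unhealthy` are assumptions. In practice the status could come from something like `docker inspect --format '{{.State.Health.Status}}' <container>`.

```python
import time
from typing import Callable


def wait_until_healthy(
    get_status: Callable[[], str],
    max_attempts: int = 7,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a health status until it is "healthy" or attempts run out.

    get_status should return Docker's health status string:
    "starting", "healthy" or "unhealthy".
    """
    for _ in range(max_attempts):
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            # Assumption: treat an unhealthy report as a failed boot.
            return False
        sleep(delay)
    return False


# Example with a fake status source instead of a real container.
statuses = iter(["starting", "starting", "healthy"])
ready = wait_until_healthy(lambda: next(statuses), sleep=lambda _: None)
```

Only when this returns true would the old container be stopped.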
There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:
- Supports non-HTTP checks, and app-specific check scripts provided by a
container.
- When booting a container, allows MRSK to wait for a container to be
healthy before shutting down the old container it replaces. This
should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
reach a healthy state, which prevents any traffic from being routed to
them before they are ready.
The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
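To make the trade-off concrete, here is a hedged sketch of how a custom command could override a `curl`-based default when building `docker run` arguments. The helper `health_options`, the port `3000`, the path `/up`, and the image name `app:1.0` are hypothetical; only the `--health-cmd` and `--health-interval` flags are standard Docker options.

```python
import shlex
from typing import List, Optional


def health_options(cmd: Optional[str] = None, port: int = 3000,
                   path: str = "/up", interval: str = "1s") -> List[str]:
    """Return `docker run` arguments that define a container healthcheck."""
    # Fallback check: only works if curl exists inside the image.
    default_cmd = f"curl -f http://localhost:{port}{path} || exit 1"
    return ["--health-cmd", cmd or default_cmd, "--health-interval", interval]


# Default check versus an app-specific, non-HTTP check.
default_run = shlex.join(["docker", "run", *health_options(), "app:1.0"])
custom_run = shlex.join(["docker", "run", *health_options("pg_isready"), "app:1.0"])
```

An image without `curl` would set `healthcheck.cmd` to whatever check command it does provide.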
If a deploy fails, we now keep holding the deploy lock only if the error
occurred while switching the running containers. Earlier errors release
the lock.
We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
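The intended ordering can be sketched as follows. This is an illustration under assumptions, not the actual deploy code: the `deploy` function and its four callables are hypothetical, and treating the tagging step as part of the switch phase is a guess.

```python
from typing import Callable


def deploy(prepare: Callable[[], None], tag_latest: Callable[[], None],
           switch: Callable[[], None], release_lock: Callable[[], None]) -> None:
    """Run deploy steps, keeping the lock only if the switch phase fails."""
    try:
        prepare()  # e.g. pull the image; hosts still run the old version
    except Exception:
        release_lock()  # nothing has changed on the hosts, so free the lock
        raise
    # From here on a failure may leave hosts in a mixed state,
    # so an exception propagates with the lock still held.
    tag_latest()  # tag only after the earlier steps have succeeded
    switch()
    release_lock()
```

With this ordering, a failed pull leaves both the latest tag and the lock untouched for the next deploy attempt.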
* main: (24 commits)
Bump version for 0.11.0
Labels can be added to Traefik
Make rollbacks role-aware
fix typo role to roles
Explained the latest modifications of Traefik container labels
Remove .idea folder
Updated README.md with new healthcheck.max_attempts option
Fix test case: console output message was not updated to display the current/total attempts
Require net-ssh ~> 7.0 for SHA-2 support
Improved deploy lock acquisition
Excess CR
Style
Simpler
Make it explicit, focus on Ubuntu
More explicit
Note that --bundle is a Rails 7+ option
Update README.md
Update README.md
Improved: configurable max_attempts for healthcheck
Traefik service name to be derived from role and destination
...
When deploying, check whether a container with the same name already
exists. If there is one, rename it to "<version>_<random_hex_string>" to
remove the name clash with the new container we want to boot.
We can then do the normal zero-downtime run/wait/stop.
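A minimal sketch of the rename step, assuming the new name is built exactly as described above. The helpers `clash_free_name` and `rename_command` and the suffix length are hypothetical; `docker rename` is the standard Docker CLI command.

```python
import secrets
from typing import List


def clash_free_name(version: str, hex_bytes: int = 8) -> str:
    """Return "<version>_<random_hex_string>" for the old container."""
    return f"{version}_{secrets.token_hex(hex_bytes)}"


def rename_command(existing_name: str, version: str) -> List[str]:
    """Build the `docker rename OLD NEW` command for the clashing container."""
    return ["docker", "rename", existing_name, clash_free_name(version)]


cmd = rename_command("app-web-1.2.3", "1.2.3")
```

Once the old container has been renamed, the new one can boot under the original name.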
While implementing this, I discovered that `--filter name=foo` does a
substring match for foo, so I've updated those filters to do an exact
match instead.
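The difference can be illustrated with a small sketch. It assumes the exact match is achieved by anchoring the name with `^` and `$`, which is how this is commonly done with Docker's name filter; the helper `name_filter` and the container names are hypothetical, and Python's `re` only stands in for Docker's matching here.

```python
import re
from typing import List


def name_filter(name: str) -> List[str]:
    """Return filter arguments intended to match `name` exactly."""
    return ["--filter", f"name=^{re.escape(name)}$"]


names = ["app-web-123", "app-web-1234"]

# Unanchored pattern: behaves like a substring match and hits both names.
substring_hits = [n for n in names if re.search("app-web-123", n)]

# Anchored pattern taken from the filter: hits only the exact name.
pattern = name_filter("app-web-123")[1].split("=", 1)[1]
exact_hits = [n for n in names if re.search(pattern, n)]
```

Without the anchors, a filter for one version could also select containers whose names merely start with it.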
* main:
Wording
Remove accessory images using tags rather than labels
Update readme to point to ghcr.io/mrsked/mrsk
Validate that all roles have hosts
Commander needn't accumulate configuration
Pull latest image tag, so we can identify it
Default to deploying the config version
Remove unneeded Dockerfile.dind, update Readme
add D-in-D dockerfile, update Readme
`docker image ls` doesn't tell us what the latest deployed image is (e.g.
if we've rolled back). Pull the latest image tag through to the server
so we can use it instead.
Because the container name is generated, it isn't possible to
determine it from inside the container.
This adds the MRSK_CONTAINER_NAME env var when running the
container so it can be read by the service running inside the
container.
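For example, a service running inside the container could read the variable like this. The helper `container_name` and its fallback value are illustrative assumptions; only the `MRSK_CONTAINER_NAME` variable comes from this change.

```python
import os


def container_name(default: str = "unknown") -> str:
    """Return the container name injected via MRSK_CONTAINER_NAME, if set."""
    return os.environ.get("MRSK_CONTAINER_NAME", default)
```

The fallback keeps the service usable when it runs outside an MRSK deployment, for instance in local development.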