The go template was concatenating all the mounts into one line. It
happened to work because the mount we are interested was always first.
Fix it to output one mount per line instead.
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container
Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.
To get round that the new boot process is:
1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container
The extra steps ensure that Traefik stops sending requests before the
old container is shutdown.
The version extraction assumed that the version is everything after the
last `-` in the container name. This doesn't work if you deploy a
non-MRSK generated version that contains a `-`.
To fix we'll generate the non version prefix and strip it off. In some
places for this to work we need to make sure to pass the role through.
Fixes: https://github.com/mrsked/mrsk/issues/402
To make it easier to identity where a docker container is running,
prefix its hostname with the underlying one from the host.
Docker chooses a 12 character random hex string by default, so we'll
keep that as the suffix.
Adds hooks to MRSK. Currently just two hooks, pre-build and post-push.
We could break the build and push into two separate commands if we
found the need for post-build and/or pre-push hooks.
Hooks are stored in `.mrsk/hooks`. Running `mrsk init` will now create
that folder and add sample hook scripts.
Hooks returning non-zero exit codes will abort the current command.
Further potential work here:
- We could replace the audit broadcast command with a
post-deploy/post-rollback hook or similar
- Maybe provide pre-command/post-command hooks that run after every
mrsk invocation
- Also look for hooks in `~/.mrsk/hooks`
The code in Mrsk::Cli::Main#rollback was very similar to
Mrsk::Cli::App#boot.
Modify Mrsk::Cli::App#boot so it can handle rollbacks by:
1. Only renaming running containers
2. Trying first to start then run the new container
If there are uncommitted changes in the app repository when building,
then append `_uncommitted_<random>` to it to distinguish the image
from one built from a clean checkout.
Also change the version used when renaming a container on redeploy to
distinguish and explain the version suffixes.
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.
When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.
There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:
- Supports non-HTTP checks, and app-specific check scripts provided by a
container.
- When booting a container, allows MRSK to wait for a container to be
healthy before shutting down the old container it replaces. This
should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
reach a healthy state, which prevents any traffic from being routed to
them before they are ready.
The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
Adds top-level configuration options for `group_limit` and `group_wait`.
When a `group_limit` is present, we'll perform app boot & start
operations on no more than `group_limit` hosts at a time, optionally
sleeping for `group_wait` seconds after each batch.
We currently only do this batching on boot & start operations (including
when they are part of a deployment). Other commands, like `app stop` or
`app details` still work on all hosts in parallel.
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.
We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
When deploying check if there is already a container with the existing
name. If there is rename it to "<version>_<random_hex_string>" to remove
the name clash with the new container we want to boot.
We can then do the normal zero downtime run/wait/stop.
While implementing this I discovered the --filter name=foo does a
substring match for foo, so I've updated those filters to do an exact
match instead.
* main:
Wording
Remove accessory images using tags rather than labels
Update readme to point to ghcr.io/mrsked/mrsk
Validate that all roles have hosts
Commander needn't accumulate configuration
Pull latest image tag, so we can identity it
Default to deploying the config version
Remove unneeded Dockerfile.dind, update Readme
add D-in-D dockerfile, update Readme
`docker image ls` doesn't tell us what the latest deployed image is (e.g
if we've rolled back). Pull the latest image tag through to the server
so we can use it instead.
* main:
Add another assertion for `escape_shell_value`
Add tests for `Mrsk::Utils`
Fix indentation
Don't report exception here too
Don't report exception
Add CLI tests for remaining commands that are not tested yet
Minor: Properly require active_support