Time-based container and image retention can have variable space
requirements depending on how often we deploy.
- Only prune stopped containers, retaining the 5 newest
- Then prune dangling images so we only keep images for the retained
containers.
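The retention rule above can be sketched as follows. This is only an illustration: the `Container` struct and `prunable_containers` helper are hypothetical names, and it assumes "the 5 newest" means the 5 most recently created stopped containers.

```ruby
# Hypothetical container record; in practice this information would come from `docker ps -a`.
Container = Struct.new(:id, :created_at, :running)

# Returns the stopped containers that fall outside the newest `keep` stopped ones.
# Running containers are never candidates for pruning.
def prunable_containers(containers, keep: 5)
  stopped = containers.reject(&:running)
  stopped.sort_by(&:created_at).reverse.drop(keep)
end

# Usage: eight containers, the newest of which is still running.
containers = (1..8).map { |i| Container.new("c#{i}", Time.at(i), i == 8) }
puts prunable_containers(containers).map(&:id).inspect
```

Because the count is fixed, the space used per host no longer depends on how often we deploy.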
Hosts could end up out of sync with each other if prune commands are run
manually or when new hosts are added.
Before rolling back, confirm that the required container is available on
all hosts and roles.
Add checks for:
* Docker installed locally
* Docker buildx plugin installed locally
* Dockerfile exists
If any check fails, deployment halts with a more specific error message.
Also adds a CLI subcommand:
`mrsk build dependencies`
Fixes: #109 and #237
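A minimal sketch of what such checks might look like. It is not the actual implementation: `missing_build_dependencies`, `SHELL_RUN` and the error strings are assumptions made for illustration. The command runner is injectable so the logic can be exercised on a machine without Docker.

```ruby
# Default runner: true when the command exits successfully, nil/false otherwise.
SHELL_RUN = ->(cmd) { system(cmd, out: File::NULL, err: File::NULL) }

# Runs each check and returns a list of error messages (empty when all pass).
def missing_build_dependencies(dockerfile: "Dockerfile", run: SHELL_RUN)
  errors = []
  errors << "Docker is not installed locally" unless run.call("docker --version")
  errors << "Docker buildx plugin is not installed locally" unless run.call("docker buildx version")
  errors << "Missing #{dockerfile}" unless File.exist?(dockerfile)
  errors
end

# Usage: report every failed check rather than stopping at the first one.
errors = missing_build_dependencies
warn errors.join("\n") unless errors.empty?
```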
Getting the lock status with invoke passes through any options from the
original command, which raises an exception if they are not also
valid for the lock status command.
Fixes https://github.com/mrsked/mrsk/issues/239
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.
When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.
There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:
- Supports non-HTTP checks, and app-specific check scripts provided by a
container.
- When booting a container, allows MRSK to wait for a container to be
healthy before shutting down the old container it replaces. This
should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
reach a healthy state, which prevents any traffic from being routed to
them before they are ready.
The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
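The wait-for-healthy behaviour described above could be sketched like this. The `wait_until_healthy` helper and its parameters are hypothetical; `status` stands in for whatever reads the container's Docker health status (for example via `docker inspect --format '{{.State.Health.Status}}' <container>`).

```ruby
# Polls `status` (a callable returning a Docker health status string such as
# "starting", "healthy" or "unhealthy") until it reports "healthy".
# Raises if the container is still not healthy after `max_attempts` polls.
def wait_until_healthy(status:, max_attempts: 7, delay: 1, sleeper: ->(s) { sleep(s) })
  max_attempts.times do |attempt|
    return true if status.call == "healthy"
    # Wait between polls, but not after the final attempt.
    sleeper.call(delay) if attempt < max_attempts - 1
  end
  raise "container not healthy after #{max_attempts} attempts"
end

# Usage with a canned sequence of statuses instead of a real container.
statuses = %w[starting starting healthy]
wait_until_healthy(status: -> { statuses.shift }, delay: 0)
# Only once this returns would the old container be stopped.
```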
Adds top-level configuration options for `group_limit` and `group_wait`.
When a `group_limit` is present, we'll perform app boot & start
operations on no more than `group_limit` hosts at a time, optionally
sleeping for `group_wait` seconds after each batch.
We currently only do this batching on boot & start operations (including
when they are part of a deployment). Other commands, like `app stop` or
`app details` still work on all hosts in parallel.
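As a rough illustration of the batching, assuming a hypothetical `in_groups` helper (not the actual implementation):

```ruby
# Yields hosts in batches of at most `group_limit`, pausing `group_wait`
# seconds between batches. With no limit, all hosts form a single batch.
# Skipping the pause after the final batch is a choice made in this sketch.
def in_groups(hosts, group_limit: nil, group_wait: nil, sleeper: ->(s) { sleep(s) })
  groups = group_limit ? hosts.each_slice(group_limit).to_a : [hosts]
  groups.each_with_index do |group, index|
    yield group
    sleeper.call(group_wait) if group_wait && index < groups.size - 1
  end
end

# Usage: boot on two hosts at a time.
in_groups(%w[1.1.1.1 1.1.1.2 1.1.1.3], group_limit: 2) do |group|
  puts "booting on #{group.join(', ')}"
end
```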
If an error occurs, we'll hold the deploy lock only if it happened while
switching the running containers.
We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
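One way to picture the resulting control flow, using a hypothetical in-memory `DeployLock` and `deploy` helper (the real lock lives on the hosts; this is only a sketch of the ordering):

```ruby
# Hypothetical stand-in for the deploy lock.
class DeployLock
  def initialize
    @held = false
  end

  def acquire
    @held = true
  end

  def release
    @held = false
  end

  def held?
    @held
  end
end

def deploy(lock:, prepare:, switch:)
  lock.acquire
  begin
    prepare.call  # e.g. pull the image; hosts still run the old version
  rescue StandardError
    lock.release  # nothing has changed on the hosts, so release before re-raising
    raise
  end
  switch.call     # tag latest, then switch containers; an error here leaves the lock held
  lock.release
end
```

Holding the lock after a failed switch signals that the hosts may be in a mixed state and need attention before the next deploy.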
* main: (24 commits)
Bump version for 0.11.0
Labels can be added to Traefik
Make rollbacks role-aware
fix typo role to roles
Explained the latest modifications of Traefik container labels
Remove .idea folder
Updated README.md with new healthcheck.max_attempts option
Fix test case: console output message was not updated to display the current/total attempts
Require net-ssh ~> 7.0 for SHA-2 support
Improved deploy lock acquisition
Excess CR
Style
Simpler
Make it explicit, focus on Ubuntu
More explicit
Not that --bundle is a Rails 7+ option
Update README.md
Update README.md
Improved: configurable max_attempts for healthcheck
Traefik service name to be derived from role and destination
...
Rollbacks stopped working after https://github.com/mrsked/mrsk/pull/99.
We'll confirm that a container is available for the first role on the
primary host before attempting to rollback.
1. Don't raise lock error for non-lock issues during lock acquire
(see https://github.com/mrsked/mrsk/pull/181)
2. If there is an error while the lock is held, don't release the lock
and send a warning to stderr
Accounts for the 2.9.10 security release and allows testing Traefik 3 betas.
* Use `image` to configure a specific Traefik Docker image.
* Default to `traefik:v2.9` to track future 2.9.x minor releases rather
than tightly pinning to `v2.9.9`.
* Support images from the configured registry.
References #165
* `-e [REDACTED]` → `-e SOME_SECRET=[REDACTED]`
* Replaces `Utils.redact` with `Utils.sensitive` to clarify that we're
indicating redactability, not actually performing redaction.
* Redacts from YAML output, including `mrsk config` (fixes #96)
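The idea of marking a value as redactable, rather than redacting it up front, can be sketched as below. The `Sensitive` class and `sensitive_env_arg` helper are hypothetical and are not the actual `Utils.sensitive` implementation.

```ruby
# Wraps a value so that it prints in redacted form while the real value
# stays available for the command that is actually executed.
class Sensitive
  attr_reader :unredacted, :redaction

  def initialize(unredacted, redaction: "[REDACTED]")
    @unredacted = unredacted
    @redaction = redaction
  end

  def to_s
    redaction
  end

  def inspect
    redaction.inspect
  end
end

# Keeps the variable name visible and hides only the value.
def sensitive_env_arg(name, value)
  Sensitive.new("#{name}=#{value}", redaction: "#{name}=[REDACTED]")
end

arg = sensitive_env_arg("SOME_SECRET", "hunter2")
puts "-e #{arg}"     # logged form
puts arg.unredacted  # form passed to the real command
```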
Allow the hosts for accessories to be specified by host or role, or on
all app hosts by setting `daemon: true`.
```
# Single host
mysql:
  host: 1.1.1.1
# Multiple hosts
redis:
  hosts:
    - 1.1.1.1
    - 1.1.1.2
# By role
monitoring:
  roles:
    - web
    - jobs
```
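A sketch of how these options might resolve to a list of hosts. The `accessory_hosts` helper, the string keys, the order of precedence and the `role_hosts` mapping are assumptions for illustration only.

```ruby
# Resolves an accessory's config to the hosts it should run on.
# `role_hosts` maps each app role to its hosts.
def accessory_hosts(config, role_hosts:)
  if config["daemon"]
    role_hosts.values.flatten.uniq  # every app host
  elsif config["host"]
    [config["host"]]
  elsif config["hosts"]
    config["hosts"]
  elsif config["roles"]
    config["roles"].flat_map { |role| role_hosts.fetch(role) }.uniq
  else
    raise ArgumentError, "accessory needs host, hosts, roles or daemon"
  end
end

role_hosts = { "web" => %w[1.1.1.1 1.1.1.2], "jobs" => %w[1.1.1.2 1.1.1.3] }
puts accessory_hosts({ "roles" => %w[web jobs] }, role_hosts: role_hosts).inspect
```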