Commit Graph

250 Commits

Author SHA1 Message Date
Donal McBreen
cd668066ff Get lock status by executing directly
Getting the lock status with invoke passes through any options from the
original command which will raise an exception if they are not also
valid for the lock status command.

Fixes https://github.com/mrsked/mrsk/issues/239
2023-04-25 16:57:02 +01:00
Donal McBreen
52ca5b846a Wait for healthy containers in integration test
Rather than waiting 5 seconds and hoping for the best after we boot
docker compose, add docker healthchecks and wait for all the containers
to be healthy.
2023-04-25 15:41:25 +01:00
Kevin McConnell
f055766918 Allow percentage-based rolling deployments 2023-04-14 12:46:14 +01:00
Kevin McConnell
df202d6ef4 Move health checks into Docker
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
2023-04-13 16:08:43 +01:00
Kevin McConnell
f530009a6e Allow performing boot & start operations in groups
Adds top-level configuration options for `group_limit` and `group_wait`.
When a `group_limit` is present, we'll perform app boot & start
operations on no more than `group_limit` hosts at a time, optionally
sleeping for `group_wait` seconds after each batch.

We currently only do this batching on boot & start operations (including
when they are part of a deployment). Other commands, like `app stop` or
`app details` still work on all hosts in parallel.
2023-04-13 15:58:27 +01:00
Jacopo
9ddb181f50 Merge branch 'main' into cleanup-excessive-containers-running
* main:
  Pull the primary host from the role
  Minimise holding the deploy lock
2023-04-12 15:19:19 +02:00
Donal McBreen
051556674f Minimise holding the deploy lock
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.

We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
2023-04-12 12:09:56 +01:00
Jacopo
5ed431b807 Merge branch 'main' into cleanup-excessive-containers-running
* main: (24 commits)
  Bump version for 0.11.0
  Labels can be added to Traefik
  Make rollbacks role-aware
  fix typo role to roles
  Explained the latest modifications of Traefik container labels
  Remove .idea folder
  Updated README.md with new healthcheck.max_attempts option
  Fix test case: console output message was not updated to display the current/total attempts
  Require net-ssh ~> 7.0 for SHA-2 support
  Improved deploy lock acquisition
  Excess CR
  Style
  Simpler
  Make it explicit, focus on Ubuntu
  More explicit
  Not that --bundle is a Rails 7+ option
  Update README.md
  Update README.md
  Improved: configurable max_attempts for healthcheck
  Traefik service name to be derived from role and destination
  ...
2023-04-12 11:52:47 +02:00
Donal McBreen
43f7409de0 Make rollbacks role-aware
Rollbacks stopped working after https://github.com/mrsked/mrsk/pull/99.

We'll confirm that a container is available for the first role on the
primary host before attempting to rollback.
2023-04-12 09:59:39 +01:00
David Heinemeier Hansson
daa0c9b5be Merge pull request #196 from handy-la/main
Configurable max_attempts for healthcheck
2023-04-11 14:17:17 +02:00
Jacopo
03d933d10b Add Role to the message 2023-04-11 10:59:25 +02:00
Jacopo
579b4cd9aa Simplify
By using and ad-hoc command to detect and stop stale containers.
By default stale containers are only detected.
2023-04-11 10:22:03 +02:00
Jacopo
8ae5331d97 Boot stop all the old containers 2023-04-11 08:53:33 +02:00
Jacopo
4d47fbdf41 Merge stop and stop_stale_containers 2023-04-11 08:53:33 +02:00
Jacopo
e980f1164e Avoid using GNU-only Perl Regepx Grep 2023-04-11 08:53:33 +02:00
Jacopo
e2f6db5cae Clear stale containers
By stopping all the older containers with matching /#{service}-#{role}-#{dest}-.*/ running on the same host.
2023-04-11 08:53:33 +02:00
Arturo Ojeda
514b2aa243 Fix test case: console output message was not updated to display the current/total attempts 2023-04-10 09:29:19 -06:00
Donal McBreen
c4df440c79 Improved deploy lock acquisition
1. Don't raise lock error for non-lock issues during lock acquire
  (see https://github.com/mrsked/mrsk/pull/181)
2. If there is an error while the lock is held, don't release the lock
  and send a warning to stderr
2023-04-10 15:23:00 +01:00
Jeremy Daer
bd8f13dd5e Traefik image config for version pinning, upgrades, and custom images
Accounts for the 2.9.10 security release and allows testing Traefik 3 betas.

* Use `image` to configure a specific Traefik Docker image.
* Default to `traefik:v2.9` to track future 2.9.x minor releases rather
  than tightly pinning to `v2.9.9`.
* Support images from the configured registry.

References #165
2023-04-07 14:15:25 -07:00
David Heinemeier Hansson
1f83b5f6be Fix failure to pass on class options to subcommands 2023-03-28 18:04:16 +02:00
Kevin McConnell
2957388bf6 Pin Traefik to v2.9.9 2023-03-28 14:59:03 +01:00
David Heinemeier Hansson
7f178101f7 Merge pull request #164 from basecamp/accessory-hosts-or-roles
Run accessories on multiple hosts or roles
2023-03-28 14:31:24 +02:00
Donal McBreen
c06585fef4 Daemon/host/role accessories
Allow the hosts for accessories to be specified by host or role, or on
all app hosts by setting `daemon: true`.

```
  # Single host
  mysql:
    host: 1.1.1.1
  # Multiple hosts
  redis:
    hosts:
      - 1.1.1.1
      - 1.1.1.2
  # By role
  monitoring:
    roles:
      - web
      - jobs
```
2023-03-28 13:26:27 +01:00
milk1000cc
03614bfb79 Rolify app logs cli/command 2023-03-27 23:08:46 +09:00
Donal McBreen
05488e4c1e Zero downtime redeploys
When deploying check if there is already a container with the existing
name. If there is rename it to "<version>_<random_hex_string>" to remove
the name clash with the new container we want to boot.

We can then do the normal zero downtime run/wait/stop.

While implementing this I discovered the --filter name=foo does a
substring match for foo, so I've updated those filters to do an exact
match instead.
2023-03-24 17:09:20 +00:00
David Heinemeier Hansson
84540cee7b Merge branch 'main' into pr/154
* main: (32 commits)
  Inline default as with other options
  Symbols!
  Fix tests
  test stop with custom stop wait time
  No need to replicate Docker default
  Describe purpose rather than elements
  Style and ordering
  Customizable stop wait time
  Fix tests
  Ensure it also works when configuring just log options without setting a driver
  Add accessory test
  Undo change
  Improve test
  Update README
  Ensure default log option `max-size=10m`
  #142 Allow to customize container options in accessories
  Fix flaky test
  Fix tests
  More resilient tests
  Fix other tests
  ...
2023-03-24 15:43:17 +01:00
Samuel Sieg
bc64a07a95 Merge branch 'main' into global-logging-config 2023-03-24 15:24:06 +01:00
Samuel Sieg
86e99fb079 Merge branch 'main' into global-logging-config 2023-03-24 14:40:27 +01:00
David Heinemeier Hansson
494e29d672 Fix tests 2023-03-24 14:35:17 +01:00
David Heinemeier Hansson
93423f2f20 Merge branch 'main' into pr/99
* main:
  Wording
  Remove accessory images using tags rather than labels
  Update readme to point to ghcr.io/mrsked/mrsk
  Validate that all roles have hosts
  Commander needn't accumulate configuration
  Pull latest image tag, so we can identity it
  Default to deploying the config version
  Remove unneeded Dockerfile.dind, update Readme
  add D-in-D dockerfile, update Readme
2023-03-24 14:26:31 +01:00
Donal McBreen
8d8f9f6ada Deploy locks
Add a deploy lock for commands that are unsafe to run concurrently.

The lock is taken by creating a `mrsk_lock` directory on the primary
host. Details of who took the lock are added to a details file in that
directory.

Additional CLI commands have been added to manual release and acquire
the lock and to check its status.

```
Commands:
  mrsk lock acquire -m, --message=MESSAGE  # Acquire the deploy lock
  mrsk lock help [COMMAND]                 # Describe subcommands or one specific subcommand
  mrsk lock release                        # Release the deploy lock
  mrsk lock status                         # Report lock status

Options:
  -v, [--verbose], [--no-verbose]                # Detailed logging
  -q, [--quiet], [--no-quiet]                    # Minimal logging
      [--version=VERSION]                        # Run commands against a specific app version
  -p, [--primary], [--no-primary]                # Run commands only on primary host instead of all
  -h, [--hosts=HOSTS]                            # Run commands on these hosts instead of all (separate by comma)
  -r, [--roles=ROLES]                            # Run commands on these roles instead of all (separate by comma)
  -c, [--config-file=CONFIG_FILE]                # Path to config file
                                                 # Default: config/deploy.yml
  -d, [--destination=DESTINATION]                # Specify destination to be used for config file (staging -> deploy.staging.yml)
  -B, [--skip-broadcast], [--no-skip-broadcast]  # Skip audit broadcasts
```

If we add support for running multiple deployments on a single server
we'll need to extend the locking to lock per deployment.
2023-03-24 12:28:08 +00:00
David Heinemeier Hansson
17e74910e4 Merge pull request #150 from basecamp/remove-accessory-image
Remove accessory images using tags rather than labels
2023-03-24 13:21:15 +01:00
David Heinemeier Hansson
c89b77127b Merge pull request #143 from djmb/default-to-deploying-config-version
Default to deploying the config version
2023-03-24 12:36:20 +01:00
Samuel Sieg
7369be48ff Ensure default log option max-size=10m 2023-03-24 09:10:36 +01:00
Samuel Sieg
4670db7f6d Merge branch 'main' into global-logging-config 2023-03-24 08:35:43 +01:00
Jeremy Daer
e859a581ab Remove accessory images using tags rather than labels 2023-03-23 15:59:28 -07:00
Jeremy Daer
1887a6518e Commander needn't accumulate configuration
Commander had version/destination solely to incrementally accumulate CLI
options. Simpler to configure in one shot.

Clarifies responsibility and lets us introduce things like
`abbreviated_version` in one spot - Configuration.
2023-03-23 08:57:32 -07:00
Donal McBreen
1ed4a37da2 Pull latest image tag, so we can identity it
`docker image ls` doesn't tell us what the latest deployed image is (e.g
if we've rolled back). Pull the latest image tag through to the server
so we can use it instead.
2023-03-23 14:39:32 +00:00
David Heinemeier Hansson
7e1596e722 Fix flaky test 2023-03-23 15:36:02 +01:00
David Heinemeier Hansson
a1fc00347b Merge branch 'main' into pr/99
* main:
  Ask for access token
  Style
  Style
  config.traefik is already nil safe
  Update README.md
  Bump dev deps and consolidate platform matches
  Deploys mention the released service@version
  Accessories aren't required to publish a port
  Accessories may be pulled from authenticated registries
  Polish destination config loading
  Allow arbitrary docker options for traefik
  Fixed typos
  Fixed readme
  Rebased on main
  Added volume configuration in response to issue coments
  Modified in response to PR comments
  Added the additional_ports configuration
2023-03-23 14:48:13 +01:00
David Heinemeier Hansson
65b90dd5c8 Merge branch 'main' into default-to-deploying-config-version 2023-03-23 14:42:31 +01:00
David Heinemeier Hansson
9648721ce7 Merge pull request #146 from basecamp/tell-me-more
Deploys mention the service and version
2023-03-23 14:38:31 +01:00
Jeremy Daer
53d7f9d528 Deploys mention the released service@version
Less work for broadcast commands to take on.

Also fixes a bug where rollback on hosts without a running container
would stop the container they had just started.
2023-03-23 01:09:25 -07:00
Jeremy Daer
04b1d5e49e Accessories may be pulled from authenticated registries 2023-03-22 23:48:22 -07:00
Donal McBreen
fb3353084f Default to deploying the config version
If we don't supply a version when deploying we'll use the result of
docker image ls to decide which image to boot. But that doesn't
necessarily correspond to the one we have just built.

E.g. if you do something like:

```
mrsk deploy        # deploys git sha AAAAAAAAAAAAAA
git commit --amend # update the commit message
mrsk deploy        # deploys git sha BBBBBBBBBBBBBB
```

In this case running `docker image ls` will give you the same image
twice (because the contents are identical) with tags for both SHAs but
the image we have just built will not be returned first. Maybe the order
is random, but it always seems to come second as far as I have seen.

i.e you'll get something like:

```
REPOSITORY    TAG              IMAGE ID       CREATED          SIZE
foo/bar       AAAAAAAAAAAAAA   6272349a9619   31 minutes ago   791MB
foo/bar       BBBBBBBBBBBBBB   6272349a9619   31 minutes ago   791MB
```

Since we already know what version we want to deploy from the config,
let's just pass that through.
2023-03-22 16:14:50 +00:00
David Heinemeier Hansson
60faf27a05 More resilient tests 2023-03-20 17:40:36 +01:00
David Heinemeier Hansson
43d1ecc94b Fix other tests 2023-03-20 17:33:13 +01:00
David Heinemeier Hansson
00b970323b Merge branch 'main' into pr/99
* main:
  Add another assertion for `escape_shell_value`
  Add tests for `Mrsk::Utils`
  Fix indentation
  Don't report exception here too
  Don't report exception
  Add CLI tests for remaining commands that are not tested yet
  Minor: Properly require active_support
2023-03-20 17:31:50 +01:00
Samuel Sieg
b5372988f7 Add global logging configuration 2023-03-19 09:21:08 +01:00
Samuel Sieg
dae8b14469 Fix indentation 2023-03-16 08:35:12 +01:00