134 Commits

Author SHA1 Message Date
David Heinemeier Hansson
35b5b317af Merge branch 'main' into pr/205
* main:
  Simplify domain language to just "boot" and unscoped config keys
  Retain a fixed number of containers when pruning
  Don't assume rolling back in message
  Check all hosts before rolling back
  Ensure Traefik service name is consistent
  Extend traefik delay by 1 second
  Include traefik access logs
  Check if we are still getting a 404
  Also dump load balancer logs
  Dump traefik logs when app not booted
  Fix missing for apt-get
  Report on container health after failure
  Fix the integration test healthcheck
  Allow percentage-based rolling deployments
  Move `group_limit` & `group_wait` under `boot`
  Limit rolling deployment to boot operation
  Allow performing boot & start operations in groups
2023-05-02 14:29:06 +02:00
David Heinemeier Hansson
8854bb63a1 Merge pull request #254 from basecamp/retain-last-5-containers
Retain a fixed number of containers when pruning
2023-05-02 13:16:49 +02:00
Donal McBreen
971a91da15 Retain a fixed number of containers when pruning
Time-based container and image retention can have variable space
requirements depending on how often we deploy.

- Only prune stopped containers, retaining the 5 newest
- Then prune dangling images so we only keep images for the retained
containers.
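
A minimal sketch of the retention rule above, not the actual MRSK implementation: given the stopped containers and their creation times, pick everything except the newest five for pruning. `Container` and `containers_to_prune` are hypothetical names introduced for illustration.

```ruby
# Hypothetical value object: MRSK itself works from `docker` output, not structs.
Container = Struct.new(:id, :created_at)

# Return the ids of stopped containers to prune, keeping the `retain` newest.
def containers_to_prune(stopped, retain: 5)
  stopped
    .sort_by(&:created_at) # oldest first
    .reverse               # newest first
    .drop(retain)          # skip the ones we keep
    .map(&:id)
end

stopped = (1..7).map { |n| Container.new("c#{n}", Time.utc(2023, 5, n)) }
containers_to_prune(stopped) # => ["c2", "c1"]
```

Because the count is fixed, disk usage no longer depends on how many deploys happened within a time window.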
2023-05-02 10:15:08 +01:00
Kevin McConnell
a72f95f44d Ensure Traefik service name is consistent
If we don't specify any service properties when labelling containers,
the generated service will be named according to the container. However,
we change the container name on every deployment (as it is versioned),
which means that the auto-generated service name will be different in
each container.

That is a problem for two reasons:

- Multiple containers share a common router while a deployment is
  happening. At this point, the router configuration will be different
  between the containers; Traefik flags this as an error, and stops
  routing to the containers until it's resolved.
- We allow custom labels to be set in an app's config. In order to
  define custom configuration on the service, we'll need to know what
  it will be called.

Changed to force the service name by setting one of its properties.
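
The idea can be sketched as follows, assuming the property used is the load balancer port and that the stable name is derived from service and role (both are assumptions; the commit does not say which property or naming scheme it uses). `container_name` and `traefik_labels` are hypothetical helpers.

```ruby
# The container name changes on every deployment because it embeds the version.
def container_name(service:, role:, version:)
  [service, role, version].join("-")
end

# The labels never mention the version, so every container of a role
# declares the same router and the same explicitly named service.
def traefik_labels(service:, role:, port: 3000)
  name = "#{service}-#{role}"
  {
    "traefik.http.routers.#{name}.rule" => "PathPrefix(`/`)",
    # Setting any service property forces the service name instead of
    # letting Traefik derive it from the container name.
    "traefik.http.services.#{name}.loadbalancer.server.port" => port.to_s
  }
end

old_name = container_name(service: "app", role: "web", version: "abc123")
new_name = container_name(service: "app", role: "web", version: "def456")
old_name == new_name # => false
traefik_labels(service: "app", role: "web") # identical for both containers
```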
2023-05-02 09:43:04 +01:00
Jeremy Daer
e85bd5ff63 Bootstrap: use multi-platform installer
* Limit auto-install to root users; otherwise, give manual install guidance
* Support non-Debian/Ubuntu with the multi-OS get.docker.com installer
2023-05-01 13:26:00 -07:00
Kevin McConnell
df202d6ef4 Move health checks into Docker
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.
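
A rough sketch of the waiting step, not MRSK's actual code: poll a status source until it reports `healthy` or the attempts run out. `wait_until_healthy`, its defaults, and the injectable `sleeper` are hypothetical; the block is expected to return the Docker health status string.

```ruby
# Poll the given block until it returns "healthy" or attempts are exhausted.
# The sleeper is injectable so the loop can be exercised without real delays.
def wait_until_healthy(max_attempts: 7, delay: 1, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do
    return true if yield == "healthy"
    sleeper.call(delay)
  end
  false
end

# Against a real container the block could shell out, for example:
#   wait_until_healthy { `docker inspect --format '{{.State.Health.Status}}' #{name}`.strip }
statuses = ["starting", "starting", "healthy"]
wait_until_healthy(sleeper: ->(_) {}) { statuses.shift } # => true
```

Only once this returns true would the old container be stopped.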

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
2023-04-13 16:08:43 +01:00
Jacopo
9ddb181f50 Merge branch 'main' into cleanup-excessive-containers-running
* main:
  Pull the primary host from the role
  Minimise holding the deploy lock
2023-04-12 15:19:19 +02:00
Donal McBreen
051556674f Minimise holding the deploy lock
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.

We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
2023-04-12 12:09:56 +01:00
Jacopo
5ed431b807 Merge branch 'main' into cleanup-excessive-containers-running
* main: (24 commits)
  Bump version for 0.11.0
  Labels can be added to Traefik
  Make rollbacks role-aware
  fix typo role to roles
  Explained the latest modifications of Traefik container labels
  Remove .idea folder
  Updated README.md with new healthcheck.max_attempts option
  Fix test case: console output message was not updated to display the current/total attempts
  Require net-ssh ~> 7.0 for SHA-2 support
  Improved deploy lock acquisition
  Excess CR
  Style
  Simpler
  Make it explicit, focus on Ubuntu
  More explicit
  Note that --bundle is a Rails 7+ option
  Update README.md
  Update README.md
  Improved: configurable max_attempts for healthcheck
  Traefik service name to be derived from role and destination
  ...
2023-04-12 11:52:47 +02:00
Kartikey Tanna
c59eb00dd0 Labels can be added to Traefik 2023-04-12 14:53:48 +05:30
Jacopo
e980f1164e Avoid using GNU-only Perl Regexp Grep 2023-04-11 08:53:33 +02:00
Jacopo
e2f6db5cae Clear stale containers
By stopping all the older containers with matching /#{service}-#{role}-#{dest}-.*/ running on the same host.
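
A small sketch of the matching rule, assuming container names follow `service-role-dest-version` and that the container for the current version must be left running. `stale_containers` is a hypothetical helper, not part of MRSK.

```ruby
# Select running containers of the same service/role/destination
# that do not belong to the version being deployed.
def stale_containers(running, service:, role:, dest:, current_version:)
  prefix = [service, role, dest].join("-")
  pattern = /\A#{Regexp.escape(prefix)}-.*\z/
  running.select do |name|
    name.match?(pattern) && name != "#{prefix}-#{current_version}"
  end
end

running = ["app-web-staging-abc", "app-web-staging-def", "app-worker-staging-abc"]
stale_containers(running, service: "app", role: "web", dest: "staging", current_version: "def")
# => ["app-web-staging-abc"]
```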
2023-04-11 08:53:33 +02:00
David Heinemeier Hansson
fb1718ca6d Merge pull request #197 from tannakartikey/traefik_rules_with_destination
Traefik service name to be derived from role and destination
2023-04-10 15:11:07 +02:00
Kartikey Tanna
c60cc92dfe Traefik service name to be derived from role and destination 2023-04-09 13:44:57 +05:30
Jeremy Daer
bd8f13dd5e Traefik image config for version pinning, upgrades, and custom images
Accounts for the 2.9.10 security release and allows testing Traefik 3 betas.

* Use `image` to configure a specific Traefik Docker image.
* Default to `traefik:v2.9` to track future 2.9.x patch releases rather
  than tightly pinning to `v2.9.9`.
* Support images from the configured registry.

References #165
2023-04-07 14:15:25 -07:00
Kevin McConnell
2957388bf6 Pin Traefik to v2.9.9 2023-03-28 14:59:03 +01:00
Tobias Bühlmann
078d68b170 Push <image>:latest in addition to <image>:<git-ref> 2023-03-27 12:52:11 +02:00
Donal McBreen
05488e4c1e Zero downtime redeploys
When deploying, check if there is already a container with the existing
name. If there is, rename it to "<version>_<random_hex_string>" to remove
the name clash with the new container we want to boot.

We can then do the normal zero downtime run/wait/stop.

While implementing this, I discovered that `--filter name=foo` does a
substring match for foo, so I've updated those filters to do an exact
match instead.
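
The difference between the two filter behaviours, and the rename scheme, can be illustrated in plain Ruby. This is a sketch under assumptions: the container names are made up, `renamed_container` is a hypothetical helper, and the hex length is arbitrary. With Docker itself, anchoring the pattern (for example `--filter name=^foo$`) is one way to get an exact match.

```ruby
require "securerandom"

names = ["app-web-123", "app-web-123_a1b2c3d4e5f60718"]

# Substring semantics: the renamed old container matches too.
names.select { |n| n.include?("app-web-123") }.size # => 2

# Anchored semantics: only the exact name matches.
names.select { |n| n.match?(/\Aapp-web-123\z/) }    # => ["app-web-123"]

# Rename target for a clashing container: "<version>_<random_hex_string>".
def renamed_container(version)
  "#{version}_#{SecureRandom.hex(8)}"
end
```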
2023-03-24 17:09:20 +00:00
David Heinemeier Hansson
84540cee7b Merge branch 'main' into pr/154
* main: (32 commits)
  Inline default as with other options
  Symbols!
  Fix tests
  test stop with custom stop wait time
  No need to replicate Docker default
  Describe purpose rather than elements
  Style and ordering
  Customizable stop wait time
  Fix tests
  Ensure it also works when configuring just log options without setting a driver
  Add accessory test
  Undo change
  Improve test
  Update README
  Ensure default log option `max-size=10m`
  #142 Allow to customize container options in accessories
  Fix flaky test
  Fix tests
  More resilient tests
  Fix other tests
  ...
2023-03-24 15:43:17 +01:00
David Heinemeier Hansson
b5ccc1fa5d Merge branch 'main' into global-logging-config 2023-03-24 15:32:41 +01:00
David Heinemeier Hansson
4fa71834ad Symbols! 2023-03-24 15:27:11 +01:00
Samuel Sieg
4044abdde1 Fix tests 2023-03-24 15:25:29 +01:00
Samuel Sieg
bc64a07a95 Merge branch 'main' into global-logging-config 2023-03-24 15:24:06 +01:00
David Heinemeier Hansson
fdb2502216 test stop with custom stop wait time 2023-03-24 15:22:34 +01:00
Jacopo
9b43a6b23b Customizable stop wait time
Configurable via a global `stop_wait_time` option.
The default is `10` which matches Docker defaults.
2023-03-24 15:04:45 +01:00
Samuel Sieg
86e99fb079 Merge branch 'main' into global-logging-config 2023-03-24 14:40:27 +01:00
David Heinemeier Hansson
93423f2f20 Merge branch 'main' into pr/99
* main:
  Wording
  Remove accessory images using tags rather than labels
  Update readme to point to ghcr.io/mrsked/mrsk
  Validate that all roles have hosts
  Commander needn't accumulate configuration
  Pull latest image tag, so we can identify it
  Default to deploying the config version
  Remove unneeded Dockerfile.dind, update Readme
  add D-in-D dockerfile, update Readme
2023-03-24 14:26:31 +01:00
Donal McBreen
8d8f9f6ada Deploy locks
Add a deploy lock for commands that are unsafe to run concurrently.

The lock is taken by creating a `mrsk_lock` directory on the primary
host. Details of who took the lock are added to a details file in that
directory.

Additional CLI commands have been added to manually release and acquire
the lock and to check its status.

```
Commands:
  mrsk lock acquire -m, --message=MESSAGE  # Acquire the deploy lock
  mrsk lock help [COMMAND]                 # Describe subcommands or one specific subcommand
  mrsk lock release                        # Release the deploy lock
  mrsk lock status                         # Report lock status

Options:
  -v, [--verbose], [--no-verbose]                # Detailed logging
  -q, [--quiet], [--no-quiet]                    # Minimal logging
      [--version=VERSION]                        # Run commands against a specific app version
  -p, [--primary], [--no-primary]                # Run commands only on primary host instead of all
  -h, [--hosts=HOSTS]                            # Run commands on these hosts instead of all (separate by comma)
  -r, [--roles=ROLES]                            # Run commands on these roles instead of all (separate by comma)
  -c, [--config-file=CONFIG_FILE]                # Path to config file
                                                 # Default: config/deploy.yml
  -d, [--destination=DESTINATION]                # Specify destination to be used for config file (staging -> deploy.staging.yml)
  -B, [--skip-broadcast], [--no-skip-broadcast]  # Skip audit broadcasts
```

If we add support for running multiple deployments on a single server
we'll need to extend the locking to lock per deployment.
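
The directory-based lock works because creating a directory either succeeds or fails atomically. A local sketch of that idea follows; it runs against a temporary directory rather than a remote host, and `acquire_lock`, `release_lock`, and the `details` file name and contents are assumptions for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Try to take the lock. Dir.mkdir raises Errno::EEXIST if the directory
# already exists, so only one caller can succeed.
def acquire_lock(base, message)
  lock_dir = File.join(base, "mrsk_lock")
  Dir.mkdir(lock_dir)
  File.write(File.join(lock_dir, "details"), "Message: #{message}\nLocked at: #{Time.now.utc}\n")
  true
rescue Errno::EEXIST
  false
end

# Release the lock by removing the directory and its details file.
def release_lock(base)
  FileUtils.rm_rf(File.join(base, "mrsk_lock"))
end

Dir.mktmpdir do |base|
  acquire_lock(base, "deploying v1") # => true
  acquire_lock(base, "deploying v2") # => false, already held
  release_lock(base)
  acquire_lock(base, "deploying v2") # => true
end
```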
2023-03-24 12:28:08 +00:00
David Heinemeier Hansson
17e74910e4 Merge pull request #150 from basecamp/remove-accessory-image
Remove accessory images using tags rather than labels
2023-03-24 13:21:15 +01:00
Samuel Sieg
9c27ead21f Ensure it also works when configuring just log options without setting a driver 2023-03-24 09:38:02 +01:00
Samuel Sieg
c3de89bb59 Add accessory test 2023-03-24 09:19:13 +01:00
Samuel Sieg
7369be48ff Ensure default log option max-size=10m 2023-03-24 09:10:36 +01:00
Samuel Sieg
4670db7f6d Merge branch 'main' into global-logging-config 2023-03-24 08:35:43 +01:00
Jeremy Daer
e859a581ab Remove accessory images using tags rather than labels 2023-03-23 15:59:28 -07:00
Donal McBreen
1ed4a37da2 Pull latest image tag, so we can identify it
`docker image ls` doesn't tell us what the latest deployed image is (e.g.,
if we've rolled back). Pull the latest image tag through to the server
so we can use it instead.
2023-03-23 14:39:32 +00:00
David Heinemeier Hansson
e7e3cd98eb Fix tests 2023-03-23 15:16:10 +01:00
David Heinemeier Hansson
a1fc00347b Merge branch 'main' into pr/99
* main:
  Ask for access token
  Style
  Style
  config.traefik is already nil safe
  Update README.md
  Bump dev deps and consolidate platform matches
  Deploys mention the released service@version
  Accessories aren't required to publish a port
  Accessories may be pulled from authenticated registries
  Polish destination config loading
  Allow arbitrary docker options for traefik
  Fixed typos
  Fixed readme
  Rebased on main
  Added volume configuration in response to issue comments
  Modified in response to PR comments
  Added the additional_ports configuration
2023-03-23 14:48:13 +01:00
David Heinemeier Hansson
bab8e42965 Merge pull request #151 from basecamp/portless-accessories
Accessories aren't required to publish a port
2023-03-23 14:32:58 +01:00
David Heinemeier Hansson
01d684746e Merge pull request #100 from stepbeekio/feature/multiple-traefik-entrypoints
Added the docker options override configuration for traefik
2023-03-23 14:28:40 +01:00
Jeremy Daer
c870e560c1 Accessories aren't required to publish a port
Allows for background accessories like schedulers that don't act
as typical network service dependencies and have no port to expose.
2023-03-23 00:10:30 -07:00
Jeremy Daer
04b1d5e49e Accessories may be pulled from authenticated registries 2023-03-22 23:48:22 -07:00
David Heinemeier Hansson
19104cafb4 Merge branch 'main' into role-awareness 2023-03-21 08:20:26 -04:00
Samuel Sieg
1bdfc217c4 Merge branch 'main' into global-logging-config 2023-03-21 13:20:12 +01:00
Samuel Sieg
b5372988f7 Add global logging configuration 2023-03-19 09:21:08 +01:00
Samuel Sieg
491777221f Fix destination label filter 2023-03-16 16:15:31 +01:00
Stephen van Beek
4c542930c5 Allow arbitrary docker options for traefik 2023-03-15 15:37:10 +00:00
David Heinemeier Hansson
cb824bdc42 Merge branch 'main' into role-awareness 2023-03-14 19:11:10 -04:00
Stephen van Beek
53046efad4 Rebased on main 2023-03-14 20:11:09 +00:00
Stephen van Beek
2db1bfde00 Added volume configuration in response to issue comments 2023-03-14 19:59:19 +00:00
Stephen van Beek
2cea12c56b Modified in response to PR comments 2023-03-14 19:59:19 +00:00