Commit Graph

60 Commits

Author SHA1 Message Date
Kevin McConnell
a72f95f44d Ensure Traefik service name is consistent
If we don't specify any service properties when labelling containers,
the generated service will be named according to the container. However,
we change the container name on every deployment (as it is versioned),
which means that the auto-generated service name will be different in
each container.

That is a problem for two reasons:

- Multiple containers share a common router while a deployment is
  happening. At this point, the router configuration will be different
  between the containers; Traefik flags this as an error, and stops
  routing to the containers until it's resolved.
- We allow custom labels to be set in an app's config. In order to
  define custom configuration on the service, we'll need to know what
  it will be called.

Changed to force the service name by setting one of its properties.
2023-05-02 09:43:04 +01:00
Kevin McConnell
df202d6ef4 Move health checks into Docker
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
2023-04-13 16:08:43 +01:00
Jacopo
9ddb181f50 Merge branch 'main' into cleanup-excessive-containers-running
* main:
  Pull the primary host from the role
  Minimise holding the deploy lock
2023-04-12 15:19:19 +02:00
Donal McBreen
051556674f Minimise holding the deploy lock
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.

We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
2023-04-12 12:09:56 +01:00
Jacopo
5ed431b807 Merge branch 'main' into cleanup-excessive-containers-running
* main: (24 commits)
  Bump version for 0.11.0
  Labels can be added to Traefik
  Make rollbacks role-aware
  fix typo role to roles
  Explained the latest modifications of Traefik container labels
  Remove .idea folder
  Updated README.md with new healthcheck.max_attempts option
  Fix test case: console output message was not updated to display the current/total attempts
  Require net-ssh ~> 7.0 for SHA-2 support
  Improved deploy lock acquisition
  Excess CR
  Style
  Simpler
  Make it explicit, focus on Ubuntu
  More explicit
  Not that --bundle is a Rails 7+ option
  Update README.md
  Update README.md
  Improved: configurable max_attempts for healthcheck
  Traefik service name to be derived from role and destination
  ...
2023-04-12 11:52:47 +02:00
Jacopo
e980f1164e Avoid using GNU-only Perl Regepx Grep 2023-04-11 08:53:33 +02:00
Jacopo
e2f6db5cae Clear stale containers
By stopping all the older containers with matching /#{service}-#{role}-#{dest}-.*/ running on the same host.
2023-04-11 08:53:33 +02:00
Kartikey Tanna
c60cc92dfe Traefik service name to be derived from role and destination 2023-04-09 13:44:57 +05:30
Donal McBreen
05488e4c1e Zero downtime redeploys
When deploying check if there is already a container with the existing
name. If there is rename it to "<version>_<random_hex_string>" to remove
the name clash with the new container we want to boot.

We can then do the normal zero downtime run/wait/stop.

While implementing this I discovered the --filter name=foo does a
substring match for foo, so I've updated those filters to do an exact
match instead.
2023-03-24 17:09:20 +00:00
David Heinemeier Hansson
b5ccc1fa5d Merge branch 'main' into global-logging-config 2023-03-24 15:32:41 +01:00
David Heinemeier Hansson
4fa71834ad Symbols! 2023-03-24 15:27:11 +01:00
Samuel Sieg
4044abdde1 Fix tests 2023-03-24 15:25:29 +01:00
Samuel Sieg
bc64a07a95 Merge branch 'main' into global-logging-config 2023-03-24 15:24:06 +01:00
David Heinemeier Hansson
fdb2502216 test stop with custom stop wait time 2023-03-24 15:22:34 +01:00
Jacopo
9b43a6b23b Customizable stop wait time
Configurable via a global `stop_wait_time` option.
The default is `10` which matches Docker defaults.
2023-03-24 15:04:45 +01:00
Samuel Sieg
86e99fb079 Merge branch 'main' into global-logging-config 2023-03-24 14:40:27 +01:00
David Heinemeier Hansson
93423f2f20 Merge branch 'main' into pr/99
* main:
  Wording
  Remove accessory images using tags rather than labels
  Update readme to point to ghcr.io/mrsked/mrsk
  Validate that all roles have hosts
  Commander needn't accumulate configuration
  Pull latest image tag, so we can identity it
  Default to deploying the config version
  Remove unneeded Dockerfile.dind, update Readme
  add D-in-D dockerfile, update Readme
2023-03-24 14:26:31 +01:00
Samuel Sieg
9c27ead21f Ensure it also works when configuring just log options without setting a driver 2023-03-24 09:38:02 +01:00
Samuel Sieg
7369be48ff Ensure default log option max-size=10m 2023-03-24 09:10:36 +01:00
Donal McBreen
1ed4a37da2 Pull latest image tag, so we can identity it
`docker image ls` doesn't tell us what the latest deployed image is (e.g
if we've rolled back). Pull the latest image tag through to the server
so we can use it instead.
2023-03-23 14:39:32 +00:00
David Heinemeier Hansson
e7e3cd98eb Fix tests 2023-03-23 15:16:10 +01:00
David Heinemeier Hansson
19104cafb4 Merge branch 'main' into role-awareness 2023-03-21 08:20:26 -04:00
Samuel Sieg
1bdfc217c4 Merge branch 'main' into global-logging-config 2023-03-21 13:20:12 +01:00
Samuel Sieg
b5372988f7 Add global logging configuration 2023-03-19 09:21:08 +01:00
Samuel Sieg
491777221f Fix destination label filter 2023-03-16 16:15:31 +01:00
David Heinemeier Hansson
cb824bdc42 Merge branch 'main' into role-awareness 2023-03-14 19:11:10 -04:00
Jacopo
50ee954ca9 Fix Traefik retry middleware
As per [Traefik docs](https://doc.traefik.io/traefik/middlewares/overview/#configuration-example)
a middleware to be activated needs to be applied to a route. Change the default settings
to apply the `retry` middleware on every role with Traefik enabled.
2023-03-14 12:15:00 +01:00
Tobias Bühlmann
72e0184e9f Fix failing tests 2023-03-13 17:36:02 +01:00
David Heinemeier Hansson
d2f76dac6b Merge branch 'main' into role-awareness 2023-03-13 15:16:44 +01:00
Richard Taylor
bb241dea43 Add container name env var for containers
Because the container name is generated it isn't possible to
determine this inside the container.

This adds the MRSK_CONTAINER_NAME env var when running the
container so it can be read by the service running inside the
container.
2023-03-11 10:14:41 +00:00
Tobias Bühlmann
418bc13ae7 Apply filters correctly 2023-03-10 10:33:55 +01:00
Tobias Bühlmann
fdb0c8ee91 Rolify app cli/command 2023-03-10 08:50:26 +01:00
David Heinemeier Hansson
1f784176b7 Allow value-less options with true 2023-03-09 11:17:28 +01:00
David Heinemeier Hansson
d3f07d6313 Allow custom options per role 2023-03-09 11:09:19 +01:00
David Heinemeier Hansson
98a14f6173 Add cmd args for roles 2023-03-09 11:01:06 +01:00
David Heinemeier Hansson
371f98d67f Start before stopping and longer timeouts 2023-02-22 19:04:23 +01:00
Paul Gabriel
25e8b91569 fix(escape-cli-args): Always use quotes to escape CLI arguments 2023-02-20 15:02:34 +01:00
David Heinemeier Hansson
5898fdd8f4 Expand arguments to be more self-explanatory in logs 2023-02-19 18:11:06 +01:00
David Heinemeier Hansson
933ece35ab Add healthcheck before deploy 2023-02-18 16:22:08 +01:00
David Heinemeier Hansson
f371cda8d8 Stick with json logger for filebeat compatibility but cap at 10mb 2023-02-09 19:56:17 +01:00
David Heinemeier Hansson
a80289d046 Use local log driver for everything
Auto rotation, max is 100mb
2023-02-09 17:02:15 +01:00
David Heinemeier Hansson
0a293ae4d6 Fix and expand testing 2023-02-04 15:43:45 +01:00
David Heinemeier Hansson
9ec6f9d74f Merge branch 'main' into allow-bastion-server 2023-02-04 15:33:25 +01:00
David Heinemeier Hansson
cf9a402ad8 Stop treating RAILS_MASTER_KEY as special 2023-02-04 15:26:59 +01:00
David Heinemeier Hansson
78d4e1e1e9 Easier to read 2023-02-04 15:12:06 +01:00
David Heinemeier Hansson
74c7a6d5de Expand app command testing 2023-02-04 10:31:04 +01:00
David Heinemeier Hansson
340929e7e7 Use a version 2023-02-04 10:20:51 +01:00
David Heinemeier Hansson
e7ac73be5a Join in run_over_ssh instead of all over 2023-02-04 10:14:31 +01:00
David Heinemeier Hansson
dfca9d8c48 Merge branch 'main' into allow-bastion-server 2023-02-04 10:06:15 +01:00
Xavier Noria
539752e9bd Load with Zeitwerk 2023-02-03 22:45:12 +01:00