Commit Graph

273 Commits

Author SHA1 Message Date
David Heinemeier Hansson
71bc9bcf54 Merge pull request #222 from basecamp/deploy-groups
Allow booting containers in groups for rolling restarts
2023-05-02 13:14:32 +02:00
David Heinemeier Hansson
c83b74dcb7 Simplify domain language to just "boot" and unscoped config keys 2023-05-02 13:11:31 +02:00
Donal McBreen
650f9b1fbf Include traefik access logs 2023-05-01 18:55:10 +01:00
Donal McBreen
1170e2311e Check if we are still getting a 404 2023-05-01 18:32:07 +01:00
Donal McBreen
94f87edded Also dump load balancer logs 2023-05-01 18:27:08 +01:00
Donal McBreen
548a1019c1 Dump traefik logs when app not booted 2023-05-01 18:21:22 +01:00
Donal McBreen
ca2e2bac2e Fix missing for apt-get 2023-05-01 12:50:45 +01:00
Donal McBreen
494a1ae089 Report on container health after failure 2023-05-01 12:13:12 +01:00
Donal McBreen
a77428143f Fix the integration test healthcheck
The alpine nginx container doesn't contain curl, so let's override the
healthcheck command to use wget.
2023-05-01 12:11:24 +01:00
David Heinemeier Hansson
4fa6a6c06d Merge pull request #219 from basecamp/docker-health-checks 2023-04-28 11:43:33 +02:00
Donal McBreen
cd668066ff Get lock status by executing directly
Getting the lock status with invoke passes through any options from the
original command which will raise an exception if they are not also
valid for the lock status command.

Fixes https://github.com/mrsked/mrsk/issues/239
2023-04-25 16:57:02 +01:00
Donal McBreen
52ca5b846a Wait for healthy containers in integration test
Rather than waiting 5 seconds and hoping for the best after we boot
docker compose, add docker healthchecks and wait for all the containers
to be healthy.
2023-04-25 15:41:25 +01:00
Donal McBreen
bcf8a927f5 Run a mrsk deploy integration test
Adds a simple integration test to ensure that `mrsk deploy` works.

Everything required is spun up with docker compose:
- shared: a container that contains an ssh key and a self signed cert to
be shared between the images
- deployer: the image we will deploy from
- registry: a docker registry
- two vm images to deploy into
- load_balancer: an nginx load balancer to use between our images

The other images are in privileged mode so that we can run
docker-in-docker. We need to run docker inside the images - mapping in
the docker socket doesn't work because both VMs would share the host
daemon.

The docker registry requires a self signed cert as you cannot use basic
auth over HTTP except on localhost. It runs on port 4443 rather than 443
because docker refused to accept that "registry" is a docker host and
tries to push images to docker.io/registry. "registry:4443" works fine.

The shared container contains the ssh keys for the deployer and vms, and
the self signed cert for the registry. When the shared container boots,
it copies them into a shared volume.

The other deployer and vm images are built with soft links from the
shared volume to the require locations. Their boot scripts wait for the
files to be copied in before continuing.

The root mrsk folder is mapped into the deployer container. On boot it
builds the gem and installs it.

Right now there's just a single test. We confirm that the load balancer
is returning a 502, run `mrsk deploy` and then confirm it returns 200.
2023-04-14 15:49:43 +01:00
Kevin McConnell
f055766918 Allow percentage-based rolling deployments 2023-04-14 12:46:14 +01:00
Kevin McConnell
a8726be20e Move group_limit & group_wait under boot
Also make formatting the group strategy the responsibility of the
commander.
2023-04-14 11:31:51 +01:00
Kevin McConnell
df202d6ef4 Move health checks into Docker
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
2023-04-13 16:08:43 +01:00
Kevin McConnell
f530009a6e Allow performing boot & start operations in groups
Adds top-level configuration options for `group_limit` and `group_wait`.
When a `group_limit` is present, we'll perform app boot & start
operations on no more than `group_limit` hosts at a time, optionally
sleeping for `group_wait` seconds after each batch.

We currently only do this batching on boot & start operations (including
when they are part of a deployment). Other commands, like `app stop` or
`app details` still work on all hosts in parallel.
2023-04-13 15:58:27 +01:00
David Heinemeier Hansson
bc8875e020 Merge pull request #183 from basecamp/cleanup-excessive-containers-running
Clear stale containers
2023-04-12 15:58:59 +02:00
David Heinemeier Hansson
b6f7d94ac3 Merge pull request #144 from monorkin/shell-escape-dollar-signs
Shell escape dollar signs
2023-04-12 15:57:37 +02:00
Stanko K.R
3ab16c8994 Shell escape dollar signs
But allow for shell expansion using curly braces e.g. ${PWD}
2023-04-12 15:55:54 +02:00
Jacopo
9ddb181f50 Merge branch 'main' into cleanup-excessive-containers-running
* main:
  Pull the primary host from the role
  Minimise holding the deploy lock
2023-04-12 15:19:19 +02:00
David Heinemeier Hansson
2f1393cd92 Merge pull request #212 from basecamp/role-primary-hosts
Pull the primary host from the role
2023-04-12 14:09:38 +02:00
Donal McBreen
fb62f2e6e1 Pull the primary host from the role
So commands like this run on a host with the specified role:
```
mrsk app exec -r=console -i "/bin/bash`
mrsk app logs -f -r=workers
```
2023-04-12 13:03:02 +01:00
Donal McBreen
051556674f Minimise holding the deploy lock
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.

We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
2023-04-12 12:09:56 +01:00
Jacopo
5ed431b807 Merge branch 'main' into cleanup-excessive-containers-running
* main: (24 commits)
  Bump version for 0.11.0
  Labels can be added to Traefik
  Make rollbacks role-aware
  fix typo role to roles
  Explained the latest modifications of Traefik container labels
  Remove .idea folder
  Updated README.md with new healthcheck.max_attempts option
  Fix test case: console output message was not updated to display the current/total attempts
  Require net-ssh ~> 7.0 for SHA-2 support
  Improved deploy lock acquisition
  Excess CR
  Style
  Simpler
  Make it explicit, focus on Ubuntu
  More explicit
  Not that --bundle is a Rails 7+ option
  Update README.md
  Update README.md
  Improved: configurable max_attempts for healthcheck
  Traefik service name to be derived from role and destination
  ...
2023-04-12 11:52:47 +02:00
David Heinemeier Hansson
2d0a7e1b67 Merge pull request #208 from tannakartikey/add_labels_to_traefik
Labels can be added to Traefik
2023-04-12 11:35:28 +02:00
Kartikey Tanna
c59eb00dd0 Labels can be added to Traefik 2023-04-12 14:53:48 +05:30
Donal McBreen
43f7409de0 Make rollbacks role-aware
Rollbacks stopped working after https://github.com/mrsked/mrsk/pull/99.

We'll confirm that a container is available for the first role on the
primary host before attempting to rollback.
2023-04-12 09:59:39 +01:00
David Heinemeier Hansson
daa0c9b5be Merge pull request #196 from handy-la/main
Configurable max_attempts for healthcheck
2023-04-11 14:17:17 +02:00
Jacopo
03d933d10b Add Role to the message 2023-04-11 10:59:25 +02:00
Jacopo
579b4cd9aa Simplify
By using and ad-hoc command to detect and stop stale containers.
By default stale containers are only detected.
2023-04-11 10:22:03 +02:00
Jacopo
8ae5331d97 Boot stop all the old containers 2023-04-11 08:53:33 +02:00
Jacopo
4d47fbdf41 Merge stop and stop_stale_containers 2023-04-11 08:53:33 +02:00
Jacopo
e980f1164e Avoid using GNU-only Perl Regepx Grep 2023-04-11 08:53:33 +02:00
Jacopo
e2f6db5cae Clear stale containers
By stopping all the older containers with matching /#{service}-#{role}-#{dest}-.*/ running on the same host.
2023-04-11 08:53:33 +02:00
Arturo Ojeda
514b2aa243 Fix test case: console output message was not updated to display the current/total attempts 2023-04-10 09:29:19 -06:00
Donal McBreen
c4df440c79 Improved deploy lock acquisition
1. Don't raise lock error for non-lock issues during lock acquire
  (see https://github.com/mrsked/mrsk/pull/181)
2. If there is an error while the lock is held, don't release the lock
  and send a warning to stderr
2023-04-10 15:23:00 +01:00
David Heinemeier Hansson
fb1718ca6d Merge pull request #197 from tannakartikey/traefik_rules_with_destination
Traefik service name to be derived from role and destination
2023-04-10 15:11:07 +02:00
David Heinemeier Hansson
7d17a6c3b5 Excess CR 2023-04-10 15:10:08 +02:00
Arturo Ojeda
3969f56fa6 Improved: configurable max_attempts for healthcheck 2023-04-09 12:07:27 -06:00
Kartikey Tanna
c60cc92dfe Traefik service name to be derived from role and destination 2023-04-09 13:44:57 +05:30
Jeremy Daer
bd8f13dd5e Traefik image config for version pinning, upgrades, and custom images
Accounts for the 2.9.10 security release and allows testing Traefik 3 betas.

* Use `image` to configure a specific Traefik Docker image.
* Default to `traefik:v2.9` to track future 2.9.x minor releases rather
  than tightly pinning to `v2.9.9`.
* Support images from the configured registry.

References #165
2023-04-07 14:15:25 -07:00
Jeremy Daer
c137b38c87 Only redact the non-sensitive bits of build args and env vars.
* `-e [REDACTED]` → `-e SOME_SECRET=[REDACTED]`
* Replaces `Utils.redact` with `Utils.sensitive` to clarify that we're
  indicating redactability, not actually performing redaction.
* Redacts from YAML output, including `mrsk config` (fixes #96)
2023-04-05 09:45:28 -07:00
David Heinemeier Hansson
1f83b5f6be Fix failure to pass on class options to subcommands 2023-03-28 18:04:16 +02:00
Kevin McConnell
2957388bf6 Pin Traefik to v2.9.9 2023-03-28 14:59:03 +01:00
David Heinemeier Hansson
7f178101f7 Merge pull request #164 from basecamp/accessory-hosts-or-roles
Run accessories on multiple hosts or roles
2023-03-28 14:31:24 +02:00
Donal McBreen
c06585fef4 Daemon/host/role accessories
Allow the hosts for accessories to be specified by host or role, or on
all app hosts by setting `daemon: true`.

```
  # Single host
  mysql:
    host: 1.1.1.1
  # Multiple hosts
  redis:
    hosts:
      - 1.1.1.1
      - 1.1.1.2
  # By role
  monitoring:
    roles:
      - web
      - jobs
```
2023-03-28 13:26:27 +01:00
David Heinemeier Hansson
fd5313ec3e Merge pull request #163 from milk1000cc/rolify-app-logs
Rolify app logs cli/command
2023-03-28 14:13:02 +02:00
milk1000cc
03614bfb79 Rolify app logs cli/command 2023-03-27 23:08:46 +09:00
Tobias Bühlmann
078d68b170 Push <image>:latest in addition to <image>:<git-ref> 2023-03-27 12:52:11 +02:00