Commit Graph

90 Commits

Author SHA1 Message Date
Donal McBreen
ee25f200d7 Call app:boot to rollback
The code in Mrsk::Cli::Main#rollback was very similar to
Mrsk::Cli::App#boot.

Modify Mrsk::Cli::App#boot so it can handle rollbacks by:
1. Only renaming running containers
2. Trying first to start then run the new container
2023-05-16 08:59:07 +01:00
Donal McBreen
a5ef1f254f Highlight uncommitted changes in version
If there are uncommitted changes in the app repository when building,
then append `_uncommitted_<random>` to it to distinguish the image
from one built from a clean checkout.

Also change the version used when renaming a container on redeploy to
distinguish and explain the version suffixes.
2023-05-12 11:08:48 +01:00
Jeremy Daer
048aecf352 Audit details (#1)
Audit details

* Audit logs and broadcasts accept `details` whose values are included as log tags and MRSK_* env vars passed to the broadcast command
* Commands may return execution options to the CLI in their args list
* Introduce `mrsk broadcast` helper for sending audit broadcasts
* Report UTC time, not local time, in audit logs. Standardize on ISO 8601 format
2023-05-02 11:42:05 -07:00
David Heinemeier Hansson
71bc9bcf54 Merge pull request #222 from basecamp/deploy-groups
Allow booting containers in groups for rolling restarts
2023-05-02 13:14:32 +02:00
David Heinemeier Hansson
c83b74dcb7 Simplify domain language to just "boot" and unscoped config keys 2023-05-02 13:11:31 +02:00
Kevin McConnell
a8726be20e Move group_limit & group_wait under boot
Also make formatting the group strategy the responsibility of the
commander.
2023-04-14 11:31:51 +01:00
Kevin McConnell
100b72e4b4 Limit rolling deployment to boot operation 2023-04-14 10:41:07 +01:00
Kevin McConnell
df202d6ef4 Move health checks into Docker
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
2023-04-13 16:08:43 +01:00
Kevin McConnell
f530009a6e Allow performing boot & start operations in groups
Adds top-level configuration options for `group_limit` and `group_wait`.
When a `group_limit` is present, we'll perform app boot & start
operations on no more than `group_limit` hosts at a time, optionally
sleeping for `group_wait` seconds after each batch.

We currently only do this batching on boot & start operations (including
when they are part of a deployment). Other commands, like `app stop` or
`app details` still work on all hosts in parallel.
2023-04-13 15:58:27 +01:00
Jacopo
9ddb181f50 Merge branch 'main' into cleanup-excessive-containers-running
* main:
  Pull the primary host from the role
  Minimise holding the deploy lock
2023-04-12 15:19:19 +02:00
Donal McBreen
051556674f Minimise holding the deploy lock
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.

We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
2023-04-12 12:09:56 +01:00
Jacopo
3cbf4aea46 Make method private method and use :send 2023-04-12 11:53:49 +02:00
Jacopo
72b70e3e9e More compact 2023-04-11 16:22:47 +02:00
Jacopo
e8697327fa Use no_commands block 2023-04-11 16:20:16 +02:00
Jacopo
0bfd4ca780 Use cli = self approach 2023-04-11 16:04:46 +02:00
Jacopo
12e3a562c4 Extract helper 2023-04-11 15:26:55 +02:00
Jacopo
c3393c8213 Remove dot 2023-04-11 11:03:11 +02:00
Jacopo
03d933d10b Add Role to the message 2023-04-11 10:59:25 +02:00
Jacopo
579b4cd9aa Simplify
By using and ad-hoc command to detect and stop stale containers.
By default stale containers are only detected.
2023-04-11 10:22:03 +02:00
Jacopo
f9436d5673 Style 2023-04-11 08:53:33 +02:00
Jacopo
8ae5331d97 Boot stop all the old containers 2023-04-11 08:53:33 +02:00
Jacopo
4d47fbdf41 Merge stop and stop_stale_containers 2023-04-11 08:53:33 +02:00
Jacopo
e980f1164e Avoid using GNU-only Perl Regepx Grep 2023-04-11 08:53:33 +02:00
Jacopo
e2f6db5cae Clear stale containers
By stopping all the older containers with matching /#{service}-#{role}-#{dest}-.*/ running on the same host.
2023-04-11 08:53:33 +02:00
milk1000cc
15a41d3fd8 Follow web role logs when no roles are specified 2023-03-28 09:02:42 +09:00
milk1000cc
03614bfb79 Rolify app logs cli/command 2023-03-27 23:08:46 +09:00
Donal McBreen
05488e4c1e Zero downtime redeploys
When deploying check if there is already a container with the existing
name. If there is rename it to "<version>_<random_hex_string>" to remove
the name clash with the new container we want to boot.

We can then do the normal zero downtime run/wait/stop.

While implementing this I discovered the --filter name=foo does a
substring match for foo, so I've updated those filters to do an exact
match instead.
2023-03-24 17:09:20 +00:00
David Heinemeier Hansson
84540cee7b Merge branch 'main' into pr/154
* main: (32 commits)
  Inline default as with other options
  Symbols!
  Fix tests
  test stop with custom stop wait time
  No need to replicate Docker default
  Describe purpose rather than elements
  Style and ordering
  Customizable stop wait time
  Fix tests
  Ensure it also works when configuring just log options without setting a driver
  Add accessory test
  Undo change
  Improve test
  Update README
  Ensure default log option `max-size=10m`
  #142 Allow to customize container options in accessories
  Fix flaky test
  Fix tests
  More resilient tests
  Fix other tests
  ...
2023-03-24 15:43:17 +01:00
David Heinemeier Hansson
93423f2f20 Merge branch 'main' into pr/99
* main:
  Wording
  Remove accessory images using tags rather than labels
  Update readme to point to ghcr.io/mrsked/mrsk
  Validate that all roles have hosts
  Commander needn't accumulate configuration
  Pull latest image tag, so we can identity it
  Default to deploying the config version
  Remove unneeded Dockerfile.dind, update Readme
  add D-in-D dockerfile, update Readme
2023-03-24 14:26:31 +01:00
Donal McBreen
8d8f9f6ada Deploy locks
Add a deploy lock for commands that are unsafe to run concurrently.

The lock is taken by creating a `mrsk_lock` directory on the primary
host. Details of who took the lock are added to a details file in that
directory.

Additional CLI commands have been added to manual release and acquire
the lock and to check its status.

```
Commands:
  mrsk lock acquire -m, --message=MESSAGE  # Acquire the deploy lock
  mrsk lock help [COMMAND]                 # Describe subcommands or one specific subcommand
  mrsk lock release                        # Release the deploy lock
  mrsk lock status                         # Report lock status

Options:
  -v, [--verbose], [--no-verbose]                # Detailed logging
  -q, [--quiet], [--no-quiet]                    # Minimal logging
      [--version=VERSION]                        # Run commands against a specific app version
  -p, [--primary], [--no-primary]                # Run commands only on primary host instead of all
  -h, [--hosts=HOSTS]                            # Run commands on these hosts instead of all (separate by comma)
  -r, [--roles=ROLES]                            # Run commands on these roles instead of all (separate by comma)
  -c, [--config-file=CONFIG_FILE]                # Path to config file
                                                 # Default: config/deploy.yml
  -d, [--destination=DESTINATION]                # Specify destination to be used for config file (staging -> deploy.staging.yml)
  -B, [--skip-broadcast], [--no-skip-broadcast]  # Skip audit broadcasts
```

If we add support for running multiple deployments on a single server
we'll need to extend the locking to lock per deployment.
2023-03-24 12:28:08 +00:00
David Heinemeier Hansson
c89b77127b Merge pull request #143 from djmb/default-to-deploying-config-version
Default to deploying the config version
2023-03-24 12:36:20 +01:00
Jeremy Daer
1887a6518e Commander needn't accumulate configuration
Commander had version/destination solely to incrementally accumulate CLI
options. Simpler to configure in one shot.

Clarifies responsibility and lets us introduce things like
`abbreviated_version` in one spot - Configuration.
2023-03-23 08:57:32 -07:00
Donal McBreen
1ed4a37da2 Pull latest image tag, so we can identity it
`docker image ls` doesn't tell us what the latest deployed image is (e.g
if we've rolled back). Pull the latest image tag through to the server
so we can use it instead.
2023-03-23 14:39:32 +00:00
Tobias Bühlmann
ccf8762c98 Reuse web container per default 2023-03-10 10:50:26 +01:00
Tobias Bühlmann
7d4dfc4c86 Pass role names for simplicity 2023-03-10 09:18:47 +01:00
Tobias Bühlmann
fdb0c8ee91 Rolify app cli/command 2023-03-10 08:50:26 +01:00
Tobias Bühlmann
8b913068de Add destination to healthcheck containers names 2023-03-06 16:54:13 +01:00
David Heinemeier Hansson
701f6ff237 Move sleep note out of host loop, so we only see it once 2023-02-26 11:27:19 +01:00
David Heinemeier Hansson
bc6963e6bf Note that rebooting may cause air gap 2023-02-23 12:16:58 +01:00
David Heinemeier Hansson
f4f2b5cb17 Communicate the readiness delay 2023-02-23 12:04:57 +01:00
David Heinemeier Hansson
e12436a1db Extract readiness_delay to config 2023-02-23 12:02:49 +01:00
David Heinemeier Hansson
371f98d67f Start before stopping and longer timeouts 2023-02-22 19:04:23 +01:00
David Heinemeier Hansson
f3a5845501 Remember this 2023-02-19 17:16:14 +01:00
David Heinemeier Hansson
0fa70f4688 Stop app before removing it 2023-02-19 17:15:57 +01:00
David Heinemeier Hansson
42bc691758 CLI doc updates
Match word

Language

Suggest what accessories are

There are also accessories

Default already shown

Better example

Warn about secrets being shown

Now also accessories

Wording

Clarifications

Clarify how to see options

General option for all

Options important here too

Hide subcommands

Implied

Simpler as just version

Be concise

Missing word

Wordsmith

Simpler and uniform words are better

Clarify what exactly we're manipulating

Wordsmithing

Implicit

Simpler language

Hide subcommands

Clarify its container management

Just one per server

Simpler
2023-02-19 17:15:44 +01:00
David Heinemeier Hansson
a0d71f3fe4 Protect against missing current version 2023-02-19 09:48:35 +01:00
David Heinemeier Hansson
4fe7fb705a Use same sentence style as broadcasts for audit log lines 2023-02-18 12:00:15 +01:00
David Heinemeier Hansson
18bdb33de2 Fix issue with removing containers triggering twice, then ensure app stop runs closer to app run on each host 2023-02-07 15:05:58 +01:00
David Heinemeier Hansson
6f1a3f5524 Don't need this, just use containers 2023-02-04 10:16:24 +01:00
Xavier Noria
539752e9bd Load with Zeitwerk 2023-02-03 22:45:12 +01:00