Commit Graph

134 Commits

Author SHA1 Message Date
Donal McBreen
cedb8d900f Stop containers with restarting status
When stopping the old container we need to also look for ones with a
restarting status.
2023-05-25 12:10:26 +01:00
Donal McBreen
19f0f40adf Add skip_hooks option 2023-05-23 15:56:47 +01:00
Donal McBreen
f9cb87e55a Fixup rebase issues 2023-05-23 14:10:38 +01:00
Donal McBreen
cc2b321d93 Combine post-deploy and post-rollback 2023-05-23 13:57:24 +01:00
Donal McBreen
004f1b04e6 Remove the skip_broadcast option 2023-05-23 13:57:00 +01:00
Donal McBreen
9fd184dc32 Add post-deploy and post-rollback hooks
These replace the custom audit_broadcast_cmd code. An additional env
variable MRSK_RUNTIME is passed to them.

The audit broadcast after booting an accessory has been removed.
2023-05-23 13:56:16 +01:00
Donal McBreen
38023fe538 Remove post push hook 2023-05-23 13:55:05 +01:00
Donal McBreen
58c1096a90 MRSK hooks
Adds hooks to MRSK. Currently just two hooks, pre-build and post-push.

We could break the build and push into two separate commands if we
found the need for post-build and/or pre-push hooks.

Hooks are stored in `.mrsk/hooks`. Running `mrsk init` will now create
that folder and add sample hook scripts.

Hooks returning non-zero exit codes will abort the current command.

Further potential work here:
- We could replace the audit broadcast command with a
post-deploy/post-rollback hook or similar
- Maybe provide pre-command/post-command hooks that run after every
mrsk invocation
- Also look for hooks in `~/.mrsk/hooks`
2023-05-23 13:55:04 +01:00
Donal McBreen
340ed94fa9 Make verify_local_dependencies private
We don't need to what it returns, it raises if there is a problem.

Move it out of the run_locally block to make it easier to add hooks.
2023-05-23 13:55:04 +01:00
David Heinemeier Hansson
4e9c39f26d Merge pull request #271 from basecamp/app-boot-for-rollback
Call app:boot to rollback
2023-05-23 13:17:30 +02:00
Donal McBreen
7cd25fd163 Add more integration tests
Add tests for main, app, accessory, traefik and lock commands.
Other commands are generally covered by the main tests.

Also adds some changes to speed up the integration specs:
- Use a persistent volume for the registry so we can push images to to
reuse between runs (also gets around docker hub rate limits)
- Use persistent volume for mrsk gem install, to avoid re-installing
between tests
- Shorter stop wait time
- Shorter connection timeouts on the load balancer

Takes just over 2 minutes to run all tests locally on an M1 Mac
after docker caches are primed.
2023-05-16 10:35:35 +01:00
Donal McBreen
ee25f200d7 Call app:boot to rollback
The code in Mrsk::Cli::Main#rollback was very similar to
Mrsk::Cli::App#boot.

Modify Mrsk::Cli::App#boot so it can handle rollbacks by:
1. Only renaming running containers
2. Trying first to start then run the new container
2023-05-16 08:59:07 +01:00
Donal McBreen
a5ef1f254f Highlight uncommitted changes in version
If there are uncommitted changes in the app repository when building,
then append `_uncommitted_<random>` to it to distinguish the image
from one built from a clean checkout.

Also change the version used when renaming a container on redeploy to
distinguish and explain the version suffixes.
2023-05-12 11:08:48 +01:00
Donal McBreen
5d33fb6c33 Better lock messages
- Debug verbosity commands
- Show lock status when we fail to acquire it
- Include lock acquire/release in runtime
2023-05-09 14:17:58 +01:00
David Heinemeier Hansson
aafaee7ac8 Merge pull request #223 from basecamp/customizable-audit-broadcast
Allow customizing audit broadcast with env
2023-05-05 14:30:04 +02:00
Donal McBreen
326711a3e0 Fix aggressive prune breaking rollback
In the image prune command --all overrides --dangling=true. This removes
the image git sha image tag for the latest image which prevented
us from rolling back to it.

I've updated the integration test to now test deploy, redeploy and
rollback.
2023-05-05 12:13:14 +01:00
Kevin McConnell
82be521e66 Merge branch 'main' into customizable-audit-broadcast
* main:
  Fix staging label bug
  Fix typo
  Capture container health log when unhealthy
  Bump version for 0.12.0
2023-05-05 11:40:29 +01:00
Jberczel
0e19ead37c Capture container health log when unhealthy 2023-05-03 15:03:05 -04:00
Jeremy Daer
048aecf352 Audit details (#1)
Audit details

* Audit logs and broadcasts accept `details` whose values are included as log tags and MRSK_* env vars passed to the broadcast command
* Commands may return execution options to the CLI in their args list
* Introduce `mrsk broadcast` helper for sending audit broadcasts
* Report UTC time, not local time, in audit logs. Standardize on ISO 8601 format
2023-05-02 11:42:05 -07:00
David Heinemeier Hansson
b7877c59b4 Merge branch 'main' into docker-readiness 2023-05-02 14:30:35 +02:00
David Heinemeier Hansson
35b5b317af Merge branch 'main' into pr/205
* main:
  Simplify domain language to just "boot" and unscoped config keys
  Retain a fixed number of containers when pruning
  Don't assume rolling back in message
  Check all hosts before rolling back
  Ensure Traefik service name is consistent
  Extend traefik delay by 1 second
  Include traefik access logs
  Check if we are still getting a 404
  Also dump load balancer logs
  Dump traefik logs when app not booted
  Fix missing for apt-get
  Report on container health after failure
  Fix the integration test healthcheck
  Allow percentage-based rolling deployments
  Move `group_limit` & `group_wait` under `boot`
  Limit rolling deployment to boot operation
  Allow performing boot & start operations in groups
2023-05-02 14:29:06 +02:00
David Heinemeier Hansson
4c448f7eb1 Merge pull request #256 from Jberczel/check-local-dependencies
Add local dependencies check
2023-05-02 14:13:23 +02:00
David Heinemeier Hansson
263a24afe3 Further distinguish dependency verification 2023-05-02 14:09:10 +02:00
David Heinemeier Hansson
a2d99e48bf Naming 2023-05-02 14:08:29 +02:00
David Heinemeier Hansson
ae2effb80c Improve clarity and intent 2023-05-02 14:04:23 +02:00
David Heinemeier Hansson
8854bb63a1 Merge pull request #254 from basecamp/retain-last-5-containers
Retain a fixed number of containers when pruning
2023-05-02 13:16:49 +02:00
David Heinemeier Hansson
35ea9f3c81 Merge pull request #255 from basecamp/check-all-hosts-for-rollback-container
Check all hosts before rolling back
2023-05-02 13:16:03 +02:00
David Heinemeier Hansson
71bc9bcf54 Merge pull request #222 from basecamp/deploy-groups
Allow booting containers in groups for rolling restarts
2023-05-02 13:14:32 +02:00
David Heinemeier Hansson
c83b74dcb7 Simplify domain language to just "boot" and unscoped config keys 2023-05-02 13:11:31 +02:00
Donal McBreen
971a91da15 Retain a fixed number of containers when pruning
Time based container and image retention can have variable space
requirements depending on how often we deploy.

- Only prune stopped containers, retaining the 5 newest
- Then prune dangling images so we only keep images for the retained
containers.
2023-05-02 10:15:08 +01:00
Donal McBreen
7fe24d5048 Check all hosts before rolling back
Hosts could end up out of sync with each other if prune commands are run
manually or when new hosts are added.

Before rolling back confirm that the required container is available on
all hosts and roles.
2023-05-02 10:14:50 +01:00
Jberczel
bfb70b2118 Add local dependencies check
Add checks for:

* Docker installed locally
* Docker buildx plugin installed locally
* Dockerfile exists

If checks fail, it will halt deployment and provide more specific error messages.

Also adds a cli subcommand:
`mrsk build dependencies`

Fixes: #109 and #237
2023-05-01 16:32:41 -04:00
Jeremy Daer
e85bd5ff63 Bootstrap: use multi-platform installer
* Limit auto-install to root users; otherwise, give manual install guidance
* Support non-Debian/Ubuntu with the multi-OS get.docker.com installer
2023-05-01 13:26:00 -07:00
David Heinemeier Hansson
4fa6a6c06d Merge pull request #219 from basecamp/docker-health-checks 2023-04-28 11:43:33 +02:00
Donal McBreen
cd668066ff Get lock status by executing directly
Getting the lock status with invoke passes through any options from the
original command which will raise an exception if they are not also
valid for the lock status command.

Fixes https://github.com/mrsked/mrsk/issues/239
2023-04-25 16:57:02 +01:00
Donal McBreen
52ca5b846a Wait for healthy containers in integration test
Rather than waiting 5 seconds and hoping for the best after we boot
docker compose, add docker healthchecks and wait for all the containers
to be healthy.
2023-04-25 15:41:25 +01:00
Kevin McConnell
f055766918 Allow percentage-based rolling deployments 2023-04-14 12:46:14 +01:00
Kevin McConnell
df202d6ef4 Move health checks into Docker
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
2023-04-13 16:08:43 +01:00
Kevin McConnell
f530009a6e Allow performing boot & start operations in groups
Adds top-level configuration options for `group_limit` and `group_wait`.
When a `group_limit` is present, we'll perform app boot & start
operations on no more than `group_limit` hosts at a time, optionally
sleeping for `group_wait` seconds after each batch.

We currently only do this batching on boot & start operations (including
when they are part of a deployment). Other commands, like `app stop` or
`app details` still work on all hosts in parallel.
2023-04-13 15:58:27 +01:00
Jacopo
9ddb181f50 Merge branch 'main' into cleanup-excessive-containers-running
* main:
  Pull the primary host from the role
  Minimise holding the deploy lock
2023-04-12 15:19:19 +02:00
Donal McBreen
051556674f Minimise holding the deploy lock
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.

We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
2023-04-12 12:09:56 +01:00
Jacopo
5ed431b807 Merge branch 'main' into cleanup-excessive-containers-running
* main: (24 commits)
  Bump version for 0.11.0
  Labels can be added to Traefik
  Make rollbacks role-aware
  fix typo role to roles
  Explained the latest modifications of Traefik container labels
  Remove .idea folder
  Updated README.md with new healthcheck.max_attempts option
  Fix test case: console output message was not updated to display the current/total attempts
  Require net-ssh ~> 7.0 for SHA-2 support
  Improved deploy lock acquisition
  Excess CR
  Style
  Simpler
  Make it explicit, focus on Ubuntu
  More explicit
  Not that --bundle is a Rails 7+ option
  Update README.md
  Update README.md
  Improved: configurable max_attempts for healthcheck
  Traefik service name to be derived from role and destination
  ...
2023-04-12 11:52:47 +02:00
Donal McBreen
43f7409de0 Make rollbacks role-aware
Rollbacks stopped working after https://github.com/mrsked/mrsk/pull/99.

We'll confirm that a container is available for the first role on the
primary host before attempting to rollback.
2023-04-12 09:59:39 +01:00
David Heinemeier Hansson
daa0c9b5be Merge pull request #196 from handy-la/main
Configurable max_attempts for healthcheck
2023-04-11 14:17:17 +02:00
Jacopo
03d933d10b Add Role to the message 2023-04-11 10:59:25 +02:00
Jacopo
579b4cd9aa Simplify
By using and ad-hoc command to detect and stop stale containers.
By default stale containers are only detected.
2023-04-11 10:22:03 +02:00
Jacopo
8ae5331d97 Boot stop all the old containers 2023-04-11 08:53:33 +02:00
Jacopo
4d47fbdf41 Merge stop and stop_stale_containers 2023-04-11 08:53:33 +02:00
Jacopo
e980f1164e Avoid using GNU-only Perl Regepx Grep 2023-04-11 08:53:33 +02:00
Jacopo
e2f6db5cae Clear stale containers
By stopping all the older containers with matching /#{service}-#{role}-#{dest}-.*/ running on the same host.
2023-04-11 08:53:33 +02:00