Commit Graph

174 Commits

Author SHA1 Message Date
Donal McBreen
8a41d15b69 Zero downtime deployment with cord file
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container

Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.

To get round that the new boot process is:

1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container

The extra steps ensure that Traefik stops sending requests before the
old container is shutdown.
2023-09-06 14:35:30 +01:00
Donal McBreen
94bf090657 Copy env files to remote hosts
Setting env variables in the docker arguments requires having them on
the deploy host.

Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.

Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.

Running `kamal env push` will create env files for each role and
accessory, and traefik if required.

`kamal envify` has been updated to also push the env files.

By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.

One thing to note - the Docker doesn't parse the environment variables
in the env file, one result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.

We maybe need to look docker config or docker secrets longer term to get
around this.

Hattip to @kevinmcconnell - this was all his idea.
2023-09-06 14:33:13 +01:00
Donal McBreen
adc7173cf2 Merge pull request #437 from basecamp/kamal-run-directory
Configurable Kamal directory
2023-09-06 14:31:07 +01:00
Krzysztof Adamski
bbcc90e4d1 Configurable Healthcheck Expose Port 2023-09-05 10:53:32 +02:00
Donal McBreen
787688ea08 kamal -> .kamal 2023-08-28 17:13:52 +01:00
Donal McBreen
bcfa1d83e8 Configurable Kamal directory
To avoid polluting the default SSH directory with lots of Kamal config,
we'll default to putting them in a `kamal` sub directory.

But also make the directory configurable with the `run_directory` key,
so for example you can set it as `/var/run/kamal/`

The directory is created during bootstrap or before any command that
will need to access a file.
2023-08-28 16:32:18 +01:00
Trevor Vallender
c2ec04f8c1 Allow Traefik to run without publishing port
Adds the `publish` option which, if set to false, does not pass `--publish` to
`docker run` when starting Traefik. This is useful when running Traefik
behind a reverse proxy, for example.
2023-08-24 10:52:10 +01:00
David Heinemeier Hansson
c4a203e648 Rename to Kamal 2023-08-22 08:24:31 -07:00
Donal McBreen
4dd8208290 Extract versions that contains dashes
The version extraction assumed that the version is everything after the
last `-` in the container name. This doesn't work if you deploy a
non-MRSK generated version that contains a `-`.

To fix we'll generate the non version prefix and strip it off. In some
places for this to work we need to make sure to pass the role through.

Fixes: https://github.com/mrsked/mrsk/issues/402
2023-08-08 14:16:32 +01:00
Bruno Prieto
cbd180205d Include role options when executing commands 2023-07-24 17:45:24 +02:00
Igor Alexandrov
e6ca270537 Include service name to lock details 2023-07-15 21:50:39 +04:00
David Heinemeier Hansson
08d8790851 Merge pull request #337 from igor-alexandrov/feature/cache
Support for Docker multistage build cache
2023-06-20 11:38:46 +02:00
Igor Alexandrov
aa28ee0f3e Inroduce Native::Cached builder 2023-06-18 22:45:04 +04:00
Matt Robinson
21b13bf8d3 Add support for proxy_command to run_over_ssh 2023-06-16 08:22:10 -03:00
Igor Alexandrov
d9b3fac17a Added ability to override default Traefik command line arguments 2023-06-15 15:41:20 +04:00
David Heinemeier Hansson
5a25f073f7 Merge pull request #320 from jsoref/spelling
Spelling
2023-05-31 17:59:18 +02:00
David Heinemeier Hansson
c8f521c0e8 Merge pull request #323 from basecamp/prefix-docker-host-with-real-host
Prefix container hostname with the underlying one
2023-05-31 17:58:55 +02:00
Donal McBreen
28d6a131a9 Prefix container hostname with the underlying one
To make it easier to identity where a docker container is running,
prefix its hostname with the underlying one from the host.

Docker chooses a 12 character random hex string by default, so we'll
keep that as the suffix.
2023-05-31 16:22:25 +01:00
Donal McBreen
079d9538bb Improve image pruning robustness
If you different images with the same git SHA, on the second deploy the
tag is moved and the first image becomes untagged. It may however still
be attached to an existing container.

To handle this:
1. Initially prune dangling images - this will remove any untagged
images that are not attached to an existing image
2. Then filter out the untagged images when deleting tagged images - any
that remain will be attached to a container.

The second issue is that `docker container ls -a --format '{{.Image}}`
will sometimes return the image id rather than a tag. This means that
the image doesn't get filtered out when we grep to remove the active
images.

To fix that we'll grep against both the image id and repo:tag.
2023-05-31 10:17:52 +01:00
Josh Soref
8e94c21729 spelling: with
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2023-05-29 20:46:34 -04:00
Donal McBreen
ff7a1e6726 Prune unused images correctly
dangling=true doesn't prune any images, as we are not creating dangling
images.

Using --all should remove unused images, but it considers the Git SHA
tag on the latest image to be unused (presumably because there are two
tags, the SHA and latest and the running container is only considered to
be using "latest"). As a result it deletes the tag, which means that we
can't rollback to that SHA later.

Its a bit more complicated to only remove images that are not referenced
by any containers.

First we find the tags we want to keep from the containers (running and
stopped).

Then we append the latest tag to that list.

Then we get a full list of image tags and remove those tags from that
list (using `grep -v -w`).

Finally we pass the tags to `docker rmi`. That either deletes the tag if
there are other references to the image or both the tag and the image if
it is the only one.
2023-05-25 17:16:46 +01:00
Donal McBreen
cedb8d900f Stop containers with restarting status
When stopping the old container we need to also look for ones with a
restarting status.
2023-05-25 12:10:26 +01:00
Donal McBreen
3b695ae127 Add service_version and add running hook message 2023-05-23 13:56:19 +01:00
Donal McBreen
9fd184dc32 Add post-deploy and post-rollback hooks
These replace the custom audit_broadcast_cmd code. An additional env
variable MRSK_RUNTIME is passed to them.

The audit broadcast after booting an accessory has been removed.
2023-05-23 13:56:16 +01:00
Donal McBreen
910f14e9c0 Add configuration for hooks_path 2023-05-23 13:55:04 +01:00
Donal McBreen
58c1096a90 MRSK hooks
Adds hooks to MRSK. Currently just two hooks, pre-build and post-push.

We could break the build and push into two separate commands if we
found the need for post-build and/or pre-push hooks.

Hooks are stored in `.mrsk/hooks`. Running `mrsk init` will now create
that folder and add sample hook scripts.

Hooks returning non-zero exit codes will abort the current command.

Further potential work here:
- We could replace the audit broadcast command with a
post-deploy/post-rollback hook or similar
- Maybe provide pre-command/post-command hooks that run after every
mrsk invocation
- Also look for hooks in `~/.mrsk/hooks`
2023-05-23 13:55:04 +01:00
River He
44b83151e3 Allow to inject environment variables to traefik 2023-05-10 03:18:26 +00:00
David Heinemeier Hansson
aafaee7ac8 Merge pull request #223 from basecamp/customizable-audit-broadcast
Allow customizing audit broadcast with env
2023-05-05 14:30:04 +02:00
Donal McBreen
326711a3e0 Fix aggressive prune breaking rollback
In the image prune command --all overrides --dangling=true. This removes
the image git sha image tag for the latest image which prevented
us from rolling back to it.

I've updated the integration test to now test deploy, redeploy and
rollback.
2023-05-05 12:13:14 +01:00
Kevin McConnell
82be521e66 Merge branch 'main' into customizable-audit-broadcast
* main:
  Fix staging label bug
  Fix typo
  Capture container health log when unhealthy
  Bump version for 0.12.0
2023-05-05 11:40:29 +01:00
Jberczel
0e19ead37c Capture container health log when unhealthy 2023-05-03 15:03:05 -04:00
Jeremy Daer
048aecf352 Audit details (#1)
Audit details

* Audit logs and broadcasts accept `details` whose values are included as log tags and MRSK_* env vars passed to the broadcast command
* Commands may return execution options to the CLI in their args list
* Introduce `mrsk broadcast` helper for sending audit broadcasts
* Report UTC time, not local time, in audit logs. Standardize on ISO 8601 format
2023-05-02 11:42:05 -07:00
David Heinemeier Hansson
88a7413b3e Merge branch 'main' into pr/223
* main:
  Don't run actions twice on PRs
  Further distinguish dependency verification
  Naming
  Reveal configured dockerfile path
  Style
  Distinguish from server dependencies
  Distinguish from local dependency verification
  Improve clarity and intent
  Style
  Style
  Style
  Add local dependencies check
  Bootstrap: use multi-platform installer
2023-05-02 14:44:16 +02:00
David Heinemeier Hansson
9cc73fed9a Merge branch 'main' into pr/223
* main:
  Simplify domain language to just "boot" and unscoped config keys
  Retain a fixed number of containers when pruning
  Don't assume rolling back in message
  Check all hosts before rolling back
  Ensure Traefik service name is consistent
  Extend traefik delay by 1 second
  Include traefik access logs
  Check if we are still getting a 404
  Also dump load balancer logs
  Dump traefik logs when app not booted
  Fix missing for apt-get
  Report on container health after failure
  Fix the integration test healthcheck
  Allow percentage-based rolling deployments
  Move `group_limit` & `group_wait` under `boot`
  Limit rolling deployment to boot operation
  Allow performing boot & start operations in groups
2023-05-02 14:43:17 +02:00
David Heinemeier Hansson
b7877c59b4 Merge branch 'main' into docker-readiness 2023-05-02 14:30:35 +02:00
David Heinemeier Hansson
35b5b317af Merge branch 'main' into pr/205
* main:
  Simplify domain language to just "boot" and unscoped config keys
  Retain a fixed number of containers when pruning
  Don't assume rolling back in message
  Check all hosts before rolling back
  Ensure Traefik service name is consistent
  Extend traefik delay by 1 second
  Include traefik access logs
  Check if we are still getting a 404
  Also dump load balancer logs
  Dump traefik logs when app not booted
  Fix missing for apt-get
  Report on container health after failure
  Fix the integration test healthcheck
  Allow percentage-based rolling deployments
  Move `group_limit` & `group_wait` under `boot`
  Limit rolling deployment to boot operation
  Allow performing boot & start operations in groups
2023-05-02 14:29:06 +02:00
David Heinemeier Hansson
4c448f7eb1 Merge pull request #256 from Jberczel/check-local-dependencies
Add local dependencies check
2023-05-02 14:13:23 +02:00
David Heinemeier Hansson
8854bb63a1 Merge pull request #254 from basecamp/retain-last-5-containers
Retain a fixed number of containers when pruning
2023-05-02 13:16:49 +02:00
Donal McBreen
971a91da15 Retain a fixed number of containers when pruning
Time based container and image retention can have variable space
requirements depending on how often we deploy.

- Only prune stopped containers, retaining the 5 newest
- Then prune dangling images so we only keep images for the retained
containers.
2023-05-02 10:15:08 +01:00
Kevin McConnell
a72f95f44d Ensure Traefik service name is consistent
If we don't specify any service properties when labelling containers,
the generated service will be named according to the container. However,
we change the container name on every deployment (as it is versioned),
which means that the auto-generated service name will be different in
each container.

That is a problem for two reasons:

- Multiple containers share a common router while a deployment is
  happening. At this point, the router configuration will be different
  between the containers; Traefik flags this as an error, and stops
  routing to the containers until it's resolved.
- We allow custom labels to be set in an app's config. In order to
  define custom configuration on the service, we'll need to know what
  it will be called.

Changed to force the service name by setting one of its properties.
2023-05-02 09:43:04 +01:00
David Heinemeier Hansson
19527b4f65 Merge branch 'main' into customizable-audit-broadcast 2023-05-02 10:25:25 +02:00
Jberczel
bfb70b2118 Add local dependencies check
Add checks for:

* Docker installed locally
* Docker buildx plugin installed locally
* Dockerfile exists

If checks fail, it will halt deployment and provide more specific error messages.

Also adds a cli subcommand:
`mrsk build dependencies`

Fixes: #109 and #237
2023-05-01 16:32:41 -04:00
Jeremy Daer
e85bd5ff63 Bootstrap: use multi-platform installer
* Limit auto-install to root users; otherwise, give manual install guidance
* Support non-Debian/Ubuntu with the multi-OS get.docker.com installer
2023-05-01 13:26:00 -07:00
Kevin McConnell
99fe31d4b4 Rename MRSK_EVENT -> MRSK_MESSAGE
It's a better name, and frees up `MRSK_EVENT` to be used later.
2023-04-14 16:11:42 +01:00
Kevin McConnell
828e56912e Allow customizing audit broadcast with env
When invoking the audit broadcast command, provide a few environment
variables so that people can customize the format of the message if they
want.

We currently provide `MRSK_PERFORMER`, `MRSK_ROLE`, `MRSK_DESTINATION` and
`MRSK_EVENT`.

Also adds the destination to the default message, which we continue to
send as the first argument as before.
2023-04-13 17:54:25 +01:00
Kevin McConnell
df202d6ef4 Move health checks into Docker
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
2023-04-13 16:08:43 +01:00
Jacopo
9ddb181f50 Merge branch 'main' into cleanup-excessive-containers-running
* main:
  Pull the primary host from the role
  Minimise holding the deploy lock
2023-04-12 15:19:19 +02:00
Donal McBreen
051556674f Minimise holding the deploy lock
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.

We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
2023-04-12 12:09:56 +01:00
Jacopo
5ed431b807 Merge branch 'main' into cleanup-excessive-containers-running
* main: (24 commits)
  Bump version for 0.11.0
  Labels can be added to Traefik
  Make rollbacks role-aware
  fix typo role to roles
  Explained the latest modifications of Traefik container labels
  Remove .idea folder
  Updated README.md with new healthcheck.max_attempts option
  Fix test case: console output message was not updated to display the current/total attempts
  Require net-ssh ~> 7.0 for SHA-2 support
  Improved deploy lock acquisition
  Excess CR
  Style
  Simpler
  Make it explicit, focus on Ubuntu
  More explicit
  Not that --bundle is a Rails 7+ option
  Update README.md
  Update README.md
  Improved: configurable max_attempts for healthcheck
  Traefik service name to be derived from role and destination
  ...
2023-04-12 11:52:47 +02:00
Kartikey Tanna
c59eb00dd0 Labels can be added to Traefik 2023-04-12 14:53:48 +05:30