Commit Graph

269 Commits

Author SHA1 Message Date
Donal McBreen
fb0aeec27e Escape the newline in the inspect query 2023-09-12 19:10:39 +01:00
Donal McBreen
7b42daa9fb Merge pull request #457 from basecamp/remove-dangling-image-filter
Remove the `dangling=true` filter
2023-09-12 11:21:50 +01:00
Donal McBreen
2c5ab054db Remove the dangling=true filter
This has been removed from Docker Engine 24 and `docker image prune`
only deletes dangling images anyway.

Fixes https://github.com/basecamp/kamal/issues/410
2023-09-12 11:09:26 +01:00
Donal McBreen
66291a2aea Validate the build image
Kamal needs images to have the service label so it can track them for
pruning. Images built by Kamal will have the label, but externally built
ones may not.

Without it images will build up over time. The worst case is an outage
if all the hosts disks fill up at the same time.

We'll add a check for the label and halt if it is not there.
2023-09-12 10:45:01 +01:00
Donal McBreen
6a3b0249fe Connect to remote host before creating builder
Connecting to the remote host will make any SSH configuration issues
obvious and add the host to known hosts if that is how SSHKit is
configured.
2023-09-12 09:12:57 +01:00
Donal McBreen
50a4f83db6 Merge pull request #450 from basecamp/stop-stale-container-when-deploying
Stop stale containers when deploying
2023-09-12 08:26:16 +01:00
Donal McBreen
00cb7d99d8 Merge pull request #449 from basecamp/asset-path
Asset paths
2023-09-12 08:26:07 +01:00
Donal McBreen
afb9b0bbe2 Stop stale containers when deploying
An interrupted deployment can leave older containers lying around. To
ensure they are cleaned up subsequently, stop stale containers during
deployments instead of just reporting them.
2023-09-11 14:49:06 +01:00
Donal McBreen
718776eb72 Prune healthcheck containers
If a deployment is interrupted it could leave stale healthcheck
containers around that prevent dependent images from being pruned.
2023-09-11 14:36:25 +01:00
Donal McBreen
0b439362da Asset paths
During deployments both the old and new containers will be active for a
small period of time. There also may be lagging requests for older CSS
and JS after the deployment.

This can lead to 404s if a request for old assets hits a new container
or visa-versa.

This PR makes sure that both sets of assets are available throughout the
deployment from before the new version of the app is booted.

This can be configured by setting the asset path:

```yaml
asset_path: "/rails/public/assets"
```

The process is:
1. We extract the assets out of the container, with docker run, docker
cp, docker stop. Docker run sets the container command to "sleep" so
this needs to be available in the container.
2. We create an asset volume directory on the host for the new version
of the app on the host and copy the assets in there.
3. If there is a previous deployment we also copy the new assets into
its asset volume and copy the older assets into the new asset volume.
4. We start the new container mapping the asset volume over the top of
the container's asset path.

This means the both the old and new versions have replaced the asset
path with a volume containing both sets of assets and should be able
to serve any request during the deployment. The older assets will
continue to be available until the next deployment.
2023-09-11 12:18:18 +01:00
Donal McBreen
cd02510d0f Output one mount per line
The go template was concatenating all the mounts into one line. It
happened to work because the mount we are interested was always first.

Fix it to output one mount per line instead.
2023-09-07 15:20:50 +01:00
Donal McBreen
8a41d15b69 Zero downtime deployment with cord file
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container

Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.

To get round that the new boot process is:

1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container

The extra steps ensure that Traefik stops sending requests before the
old container is shutdown.
2023-09-06 14:35:30 +01:00
Donal McBreen
94bf090657 Copy env files to remote hosts
Setting env variables in the docker arguments requires having them on
the deploy host.

Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.

Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.

Running `kamal env push` will create env files for each role and
accessory, and traefik if required.

`kamal envify` has been updated to also push the env files.

By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.

One thing to note - the Docker doesn't parse the environment variables
in the env file, one result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.

We maybe need to look docker config or docker secrets longer term to get
around this.

Hattip to @kevinmcconnell - this was all his idea.
2023-09-06 14:33:13 +01:00
Donal McBreen
787688ea08 kamal -> .kamal 2023-08-28 17:13:52 +01:00
Donal McBreen
bcfa1d83e8 Configurable Kamal directory
To avoid polluting the default SSH directory with lots of Kamal config,
we'll default to putting them in a `kamal` sub directory.

But also make the directory configurable with the `run_directory` key,
so for example you can set it as `/var/run/kamal/`

The directory is created during bootstrap or before any command that
will need to access a file.
2023-08-28 16:32:18 +01:00
David Heinemeier Hansson
dc1421a1fc Correct casing 2023-08-22 09:22:32 -07:00
David Heinemeier Hansson
c4a203e648 Rename to Kamal 2023-08-22 08:24:31 -07:00
Donal McBreen
4dd8208290 Extract versions that contains dashes
The version extraction assumed that the version is everything after the
last `-` in the container name. This doesn't work if you deploy a
non-MRSK generated version that contains a `-`.

To fix we'll generate the non version prefix and strip it off. In some
places for this to work we need to make sure to pass the role through.

Fixes: https://github.com/mrsked/mrsk/issues/402
2023-08-08 14:16:32 +01:00
Lewis Buckley
313f89a108 Merge branch 'main' into rolling-traefik-restarts
* main:
  Removed not needed MRSK.traefik.run command in Traefil reboot
  Updated README with locking directory name
  Include service name to lock details
  Configurable SSH log levels
  Add registry container output to debug
  Minor tweaks to hooks section in readme
  Update README.md
  Updated README.md to make setup examples consistent
  Login to the registry proactively before stoping Accessory and Traefik
2023-07-19 14:46:16 +01:00
Lewis Buckley
9ab448e186 Support a --rolling option for traefik reboots 2023-07-19 14:39:27 +01:00
Donal McBreen
e1433f3895 Merge pull request #349 from igor-alexandrov/login-to-registry-proactively
Login to the registry proactively before stoping Accessory and Traefik
2023-07-19 13:36:00 +01:00
Igor Alexandrov
e6ca270537 Include service name to lock details 2023-07-15 21:50:39 +04:00
Igor Alexandrov
2746a48e88 Login to the registry proactively before stoping Accessory and Traefik 2023-06-22 15:13:47 +04:00
Donal McBreen
4950f61a87 Only require secrets when mutating
Rename `with_lock` to more generic `mutating` and move the env_args
check to that point. This allows read-only actions to be run without
requiring secrets.
2023-06-20 15:39:51 +01:00
Igor Alexandrov
d3f5e9efe8 Updated Traefik CLI test 2023-06-15 17:11:20 +04:00
David Heinemeier Hansson
504a09ef1d Merge pull request #318 from basecamp/pre-deploy-hook
Add a pre-deploy hook
2023-05-31 17:59:46 +02:00
David Heinemeier Hansson
5a25f073f7 Merge pull request #320 from jsoref/spelling
Spelling
2023-05-31 17:59:18 +02:00
David Heinemeier Hansson
c8f521c0e8 Merge pull request #323 from basecamp/prefix-docker-host-with-real-host
Prefix container hostname with the underlying one
2023-05-31 17:58:55 +02:00
Donal McBreen
28d6a131a9 Prefix container hostname with the underlying one
To make it easier to identity where a docker container is running,
prefix its hostname with the underlying one from the host.

Docker chooses a 12 character random hex string by default, so we'll
keep that as the suffix.
2023-05-31 16:22:25 +01:00
Donal McBreen
079d9538bb Improve image pruning robustness
If you different images with the same git SHA, on the second deploy the
tag is moved and the first image becomes untagged. It may however still
be attached to an existing container.

To handle this:
1. Initially prune dangling images - this will remove any untagged
images that are not attached to an existing image
2. Then filter out the untagged images when deleting tagged images - any
that remain will be attached to a container.

The second issue is that `docker container ls -a --format '{{.Image}}`
will sometimes return the image id rather than a tag. This means that
the image doesn't get filtered out when we grep to remove the active
images.

To fix that we'll grep against both the image id and repo:tag.
2023-05-31 10:17:52 +01:00
Josh Soref
fc00392d68 spelling: installed
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2023-05-29 20:46:34 -04:00
Donal McBreen
db0bf6bb16 Add a pre-deploy hook
Useful for checking the status of CI before deploying. Doing this at
this point in the deployment maximises the parallelisation of building
and running CI.
2023-05-29 16:06:41 +01:00
Donal McBreen
ff7a1e6726 Prune unused images correctly
dangling=true doesn't prune any images, as we are not creating dangling
images.

Using --all should remove unused images, but it considers the Git SHA
tag on the latest image to be unused (presumably because there are two
tags, the SHA and latest and the running container is only considered to
be using "latest"). As a result it deletes the tag, which means that we
can't rollback to that SHA later.

Its a bit more complicated to only remove images that are not referenced
by any containers.

First we find the tags we want to keep from the containers (running and
stopped).

Then we append the latest tag to that list.

Then we get a full list of image tags and remove those tags from that
list (using `grep -v -w`).

Finally we pass the tags to `docker rmi`. That either deletes the tag if
there are other references to the image or both the tag and the image if
it is the only one.
2023-05-25 17:16:46 +01:00
David Heinemeier Hansson
e35334e5fe Merge pull request #313 from basecamp/stop-restarting-containers
Stop containers with restarting status
2023-05-25 14:04:09 +02:00
Donal McBreen
cedb8d900f Stop containers with restarting status
When stopping the old container we need to also look for ones with a
restarting status.
2023-05-25 12:10:26 +01:00
Donal McBreen
66f9ce0e90 Add a pre-connect hook
This can be used for hooks that should run before connecting to remote
hosts. An example use case is pre-warming DNS.
2023-05-24 14:39:30 +01:00
Donal McBreen
19f0f40adf Add skip_hooks option 2023-05-23 15:56:47 +01:00
Donal McBreen
f9cb87e55a Fixup rebase issues 2023-05-23 14:10:38 +01:00
Donal McBreen
cc2b321d93 Combine post-deploy and post-rollback 2023-05-23 13:57:24 +01:00
Donal McBreen
004f1b04e6 Remove the skip_broadcast option 2023-05-23 13:57:00 +01:00
Donal McBreen
9fd184dc32 Add post-deploy and post-rollback hooks
These replace the custom audit_broadcast_cmd code. An additional env
variable MRSK_RUNTIME is passed to them.

The audit broadcast after booting an accessory has been removed.
2023-05-23 13:56:16 +01:00
Donal McBreen
38023fe538 Remove post push hook 2023-05-23 13:55:05 +01:00
Donal McBreen
58c1096a90 MRSK hooks
Adds hooks to MRSK. Currently just two hooks, pre-build and post-push.

We could break the build and push into two separate commands if we
found the need for post-build and/or pre-push hooks.

Hooks are stored in `.mrsk/hooks`. Running `mrsk init` will now create
that folder and add sample hook scripts.

Hooks returning non-zero exit codes will abort the current command.

Further potential work here:
- We could replace the audit broadcast command with a
post-deploy/post-rollback hook or similar
- Maybe provide pre-command/post-command hooks that run after every
mrsk invocation
- Also look for hooks in `~/.mrsk/hooks`
2023-05-23 13:55:04 +01:00
Donal McBreen
340ed94fa9 Make verify_local_dependencies private
We don't need to what it returns, it raises if there is a problem.

Move it out of the run_locally block to make it easier to add hooks.
2023-05-23 13:55:04 +01:00
David Heinemeier Hansson
4e9c39f26d Merge pull request #271 from basecamp/app-boot-for-rollback
Call app:boot to rollback
2023-05-23 13:17:30 +02:00
Donal McBreen
7cd25fd163 Add more integration tests
Add tests for main, app, accessory, traefik and lock commands.
Other commands are generally covered by the main tests.

Also adds some changes to speed up the integration specs:
- Use a persistent volume for the registry so we can push images to to
reuse between runs (also gets around docker hub rate limits)
- Use persistent volume for mrsk gem install, to avoid re-installing
between tests
- Shorter stop wait time
- Shorter connection timeouts on the load balancer

Takes just over 2 minutes to run all tests locally on an M1 Mac
after docker caches are primed.
2023-05-16 10:35:35 +01:00
Donal McBreen
ee25f200d7 Call app:boot to rollback
The code in Mrsk::Cli::Main#rollback was very similar to
Mrsk::Cli::App#boot.

Modify Mrsk::Cli::App#boot so it can handle rollbacks by:
1. Only renaming running containers
2. Trying first to start then run the new container
2023-05-16 08:59:07 +01:00
Donal McBreen
a5ef1f254f Highlight uncommitted changes in version
If there are uncommitted changes in the app repository when building,
then append `_uncommitted_<random>` to it to distinguish the image
from one built from a clean checkout.

Also change the version used when renaming a container on redeploy to
distinguish and explain the version suffixes.
2023-05-12 11:08:48 +01:00
Donal McBreen
5d33fb6c33 Better lock messages
- Debug verbosity commands
- Show lock status when we fail to acquire it
- Include lock acquire/release in runtime
2023-05-09 14:17:58 +01:00
David Heinemeier Hansson
aafaee7ac8 Merge pull request #223 from basecamp/customizable-audit-broadcast
Allow customizing audit broadcast with env
2023-05-05 14:30:04 +02:00