Commit Graph

598 Commits

Author SHA1 Message Date
Donal McBreen
2c5ab054db Remove the dangling=true filter
This has been removed from Docker Engine 24 and `docker image prune`
only deletes dangling images anyway.

Fixes https://github.com/basecamp/kamal/issues/410
2023-09-12 11:09:26 +01:00
Donal McBreen
66291a2aea Validate the build image
Kamal needs images to have the service label so it can track them for
pruning. Images built by Kamal will have the label, but externally built
ones may not.

Without it images will build up over time. The worst case is an outage
if all the hosts disks fill up at the same time.

We'll add a check for the label and halt if it is not there.
2023-09-12 10:45:01 +01:00
Donal McBreen
6a3b0249fe Connect to remote host before creating builder
Connecting to the remote host will make any SSH configuration issues
obvious and add the host to known hosts if that is how SSHKit is
configured.
2023-09-12 09:12:57 +01:00
Donal McBreen
ade90bc051 Use LTS version of Ubuntu for integration tests 2023-09-12 08:59:54 +01:00
Donal McBreen
daa53f5831 Merge pull request #451 from basecamp/require-destinations
Add a require_destination setting
2023-09-12 08:26:36 +01:00
Donal McBreen
50a4f83db6 Merge pull request #450 from basecamp/stop-stale-container-when-deploying
Stop stale containers when deploying
2023-09-12 08:26:16 +01:00
Donal McBreen
00cb7d99d8 Merge pull request #449 from basecamp/asset-path
Asset paths
2023-09-12 08:26:07 +01:00
Donal McBreen
26dcd75423 Add a require_destination setting
If you always want to use a destination, and have a base deploy.yml file
that doesn't specify any hosts, then if you forget to specific the
destination you will get a cryptic error.

Add a "require_destination" setting you can use to avoid this.
2023-09-11 16:57:11 +01:00
Donal McBreen
afb9b0bbe2 Stop stale containers when deploying
An interrupted deployment can leave older containers lying around. To
ensure they are cleaned up subsequently, stop stale containers during
deployments instead of just reporting them.
2023-09-11 14:49:06 +01:00
Donal McBreen
718776eb72 Prune healthcheck containers
If a deployment is interrupted it could leave stale healthcheck
containers around that prevent dependent images from being pruned.
2023-09-11 14:36:25 +01:00
Donal McBreen
9d35793287 Merge pull request #440 from gf3/fix/ssh-auth-methods
fix: do not hardcode Net::SSH auth_methods
2023-09-11 14:32:37 +01:00
Donal McBreen
0b439362da Asset paths
During deployments both the old and new containers will be active for a
small period of time. There also may be lagging requests for older CSS
and JS after the deployment.

This can lead to 404s if a request for old assets hits a new container
or visa-versa.

This PR makes sure that both sets of assets are available throughout the
deployment from before the new version of the app is booted.

This can be configured by setting the asset path:

```yaml
asset_path: "/rails/public/assets"
```

The process is:
1. We extract the assets out of the container, with docker run, docker
cp, docker stop. Docker run sets the container command to "sleep" so
this needs to be available in the container.
2. We create an asset volume directory on the host for the new version
of the app on the host and copy the assets in there.
3. If there is a previous deployment we also copy the new assets into
its asset volume and copy the older assets into the new asset volume.
4. We start the new container mapping the asset volume over the top of
the container's asset path.

This means the both the old and new versions have replaced the asset
path with a volume containing both sets of assets and should be able
to serve any request during the deployment. The older assets will
continue to be available until the next deployment.
2023-09-11 12:18:18 +01:00
Donal McBreen
cd02510d0f Output one mount per line
The go template was concatenating all the mounts into one line. It
happened to work because the mount we are interested was always first.

Fix it to output one mount per line instead.
2023-09-07 15:20:50 +01:00
Donal McBreen
cccf79ed94 Merge branch 'main' into fix/ssh-auth-methods 2023-09-07 10:21:28 +01:00
Gianni Chiappetta
9a539ffc86 chore: update tests to remove hardcoded ssh auth method 2023-09-06 10:59:17 -04:00
Donal McBreen
8a41d15b69 Zero downtime deployment with cord file
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container

Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.

To get round that the new boot process is:

1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container

The extra steps ensure that Traefik stops sending requests before the
old container is shutdown.
2023-09-06 14:35:30 +01:00
Donal McBreen
94bf090657 Copy env files to remote hosts
Setting env variables in the docker arguments requires having them on
the deploy host.

Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.

Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.

Running `kamal env push` will create env files for each role and
accessory, and traefik if required.

`kamal envify` has been updated to also push the env files.

By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.

One thing to note - the Docker doesn't parse the environment variables
in the env file, one result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.

We maybe need to look docker config or docker secrets longer term to get
around this.

Hattip to @kevinmcconnell - this was all his idea.
2023-09-06 14:33:13 +01:00
Donal McBreen
adc7173cf2 Merge pull request #437 from basecamp/kamal-run-directory
Configurable Kamal directory
2023-09-06 14:31:07 +01:00
Krzysztof Adamski
c2b2f7ea33 Fixing Tests 2023-09-06 10:16:59 +02:00
Krzysztof Adamski
bbcc90e4d1 Configurable Healthcheck Expose Port 2023-09-05 10:53:32 +02:00
Donal McBreen
787688ea08 kamal -> .kamal 2023-08-28 17:13:52 +01:00
Donal McBreen
bcfa1d83e8 Configurable Kamal directory
To avoid polluting the default SSH directory with lots of Kamal config,
we'll default to putting them in a `kamal` sub directory.

But also make the directory configurable with the `run_directory` key,
so for example you can set it as `/var/run/kamal/`

The directory is created during bootstrap or before any command that
will need to access a file.
2023-08-28 16:32:18 +01:00
Trevor Vallender
c2ec04f8c1 Allow Traefik to run without publishing port
Adds the `publish` option which, if set to false, does not pass `--publish` to
`docker run` when starting Traefik. This is useful when running Traefik
behind a reverse proxy, for example.
2023-08-24 10:52:10 +01:00
Donal McBreen
d0fbf538d3 Add integration test hooks back in 2023-08-23 07:36:48 +01:00
David Heinemeier Hansson
d981c3c968 Move hooks 2023-08-22 12:47:00 -07:00
David Heinemeier Hansson
dc1421a1fc Correct casing 2023-08-22 09:22:32 -07:00
David Heinemeier Hansson
c4a203e648 Rename to Kamal 2023-08-22 08:24:31 -07:00
Donal McBreen
e2c3709d74 Merge pull request #417 from manastyretskyi/main
Fix builder registry cache when using default registry
2023-08-17 14:08:05 +01:00
Donal McBreen
1163c3de07 Configurable log levels
Allow ssh log_level to be set - this will help to debug connection
issues.
2023-08-15 16:51:56 +01:00
Donal McBreen
715cd94bbf Merge pull request #413 from mrsked/extract-version-from-container-name-correctly
Extract versions that contains dashes
2023-08-15 15:11:03 +01:00
Donal McBreen
dda7099b2f Merge pull request #414 from mrsked/traefik-start-stop-run-errors
Don't hide Traefik errors
2023-08-15 15:10:47 +01:00
Liubomyr Manastyretskyi
6774675547 Fix builder registry cache when using default registry 2023-08-13 12:04:03 +03:00
Igor Alexandrov
0c52a1053e Removed not needed configuration test 2023-08-08 19:14:03 +04:00
Donal McBreen
c2d7fd775f Don't hide Traefik errors
When stopping or starting Traefik, don't hide important errors.

Docker doesn't return an error when starting a started container or
stopping a stopped container.

When rebooting we want to know about errors during run as we've just
stopped and removed the previous container.

When booting, we want to leave the running container if it exists,
restart a stopped container and run a new one if none exists.

We can implement this with `docker start ... || docker run ...`:
- if the container is started, `docker start` will exit with 0
- if the container is stopped, `docker start` will start it and exit with 0
- if the container doesn't exist, `docker start` will return a non zero
exit code and `docker run` will create a new container. Any errors in
`docker run` will be returned.
2023-08-08 15:41:16 +01:00
Donal McBreen
4dd8208290 Extract versions that contains dashes
The version extraction assumed that the version is everything after the
last `-` in the container name. This doesn't work if you deploy a
non-MRSK generated version that contains a `-`.

To fix we'll generate the non version prefix and strip it off. In some
places for this to work we need to make sure to pass the role through.

Fixes: https://github.com/mrsked/mrsk/issues/402
2023-08-08 14:16:32 +01:00
Donal McBreen
aa89ededde Merge pull request #399 from mrsked/manage-ssh-connection-starts
Manage SSH connection starts
2023-08-07 14:37:34 +01:00
David Heinemeier Hansson
299b166db7 Merge pull request #389 from brunoprietog/include-role-options-when-executing-commands
Include role options when executing commands
2023-07-26 14:04:28 +02:00
Donal McBreen
94d6a763a8 Extract ssh and sshkit configuration 2023-07-26 12:26:23 +01:00
Donal McBreen
752ff53458 Merge pull request #396 from igor-alexandrov/track-uncommitted-changes
Log uncommitted changes during deploy
2023-07-25 14:35:44 +01:00
Donal McBreen
f64b596907 Prevent SSH connection restarts
Set a high idle timeout on the sshkit connection pool. This will
reduce the incidence of re-connection storms when a deployment has been
idle for a while (e.g. when waiting for a docker build).

The default timeout was 30 seconds, so we'll enable keepalives at a
30s interval to match. This is to help prevent connections from being
killed during long idle periods.
2023-07-25 13:09:46 +01:00
Donal McBreen
b25cfa178b Limit SSH start concurrency
Starting many (90+) SSH connections has caused us some issues such as
failed DNS lookups and hitting process file descriptor limits.

To mitigate this, patch SSHKit::Backend::Netssh to limit concurrency of
connection starts. We'll default to 30 at a time which seems to work
without issue, but can be configured via:

```
sshkit:
  max_concurrent_starts: 10
```
2023-07-25 13:08:44 +01:00
Donal McBreen
2daaf442fa Revert "Configurable SSH log levels" 2023-07-25 12:53:45 +01:00
Bruno Prieto
cbd180205d Include role options when executing commands 2023-07-24 17:45:24 +02:00
Igor Alexandrov
9c2a1dc7cd Removed commented code in tests 2023-07-21 18:44:01 +04:00
Igor Alexandrov
0cfafd1d25 Log uncommitted changes during deploy 2023-07-21 18:37:45 +04:00
Lewis Buckley
313f89a108 Merge branch 'main' into rolling-traefik-restarts
* main:
  Removed not needed MRSK.traefik.run command in Traefil reboot
  Updated README with locking directory name
  Include service name to lock details
  Configurable SSH log levels
  Add registry container output to debug
  Minor tweaks to hooks section in readme
  Update README.md
  Updated README.md to make setup examples consistent
  Login to the registry proactively before stoping Accessory and Traefik
2023-07-19 14:46:16 +01:00
Lewis Buckley
9ab448e186 Support a --rolling option for traefik reboots 2023-07-19 14:39:27 +01:00
Donal McBreen
e1433f3895 Merge pull request #349 from igor-alexandrov/login-to-registry-proactively
Login to the registry proactively before stoping Accessory and Traefik
2023-07-19 13:36:00 +01:00
Donal McBreen
95e3915991 Merge pull request #386 from mrsked/ssh-log-levels
Configurable SSH log levels
2023-07-17 14:09:21 +01:00
Igor Alexandrov
e6ca270537 Include service name to lock details 2023-07-15 21:50:39 +04:00