Commit Graph

673 Commits

Author SHA1 Message Date
Donal McBreen
8a41d15b69 Zero downtime deployment with cord file
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container

Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.

To get round that the new boot process is:

1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container

The extra steps ensure that Traefik stops sending requests before the
old container is shutdown.
2023-09-06 14:35:30 +01:00
Donal McBreen
94bf090657 Copy env files to remote hosts
Setting env variables in the docker arguments requires having them on
the deploy host.

Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.

Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.

Running `kamal env push` will create env files for each role and
accessory, and traefik if required.

`kamal envify` has been updated to also push the env files.

By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.

One thing to note - the Docker doesn't parse the environment variables
in the env file, one result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.

We maybe need to look docker config or docker secrets longer term to get
around this.

Hattip to @kevinmcconnell - this was all his idea.
2023-09-06 14:33:13 +01:00
Donal McBreen
adc7173cf2 Merge pull request #437 from basecamp/kamal-run-directory
Configurable Kamal directory
2023-09-06 14:31:07 +01:00
Krzysztof Adamski
bbcc90e4d1 Configurable Healthcheck Expose Port 2023-09-05 10:53:32 +02:00
Donal McBreen
787688ea08 kamal -> .kamal 2023-08-28 17:13:52 +01:00
Donal McBreen
bcfa1d83e8 Configurable Kamal directory
To avoid polluting the default SSH directory with lots of Kamal config,
we'll default to putting them in a `kamal` sub directory.

But also make the directory configurable with the `run_directory` key,
so for example you can set it as `/var/run/kamal/`

The directory is created during bootstrap or before any command that
will need to access a file.
2023-08-28 16:32:18 +01:00
David Heinemeier Hansson
9363b6a464 Bump version for 0.16.1 2023-08-24 09:16:13 -07:00
David Heinemeier Hansson
338fd4e493 Merge pull request #428 from tbuehlmann/main
Fix picking the first available role on primary_host
2023-08-24 08:36:29 -07:00
Tobias Bühlmann
556f7f5a37 Fix picking the first available role on primary_host 2023-08-24 13:50:24 +02:00
Trevor Vallender
c2ec04f8c1 Allow Traefik to run without publishing port
Adds the `publish` option which, if set to false, does not pass `--publish` to
`docker run` when starting Traefik. This is useful when running Traefik
behind a reverse proxy, for example.
2023-08-24 10:52:10 +01:00
fig
1ab7405e36 require ActiveSupport module to provide String#remove
fixes #421
2023-08-23 15:17:26 +01:00
David Heinemeier Hansson
ac11089c7a Bump version for 0.16.0 2023-08-22 11:42:32 -07:00
David Heinemeier Hansson
dc1421a1fc Correct casing 2023-08-22 09:22:32 -07:00
David Heinemeier Hansson
c4a203e648 Rename to Kamal 2023-08-22 08:24:31 -07:00
Donal McBreen
e2c3709d74 Merge pull request #417 from manastyretskyi/main
Fix builder registry cache when using default registry
2023-08-17 14:08:05 +01:00
Liubomyr Manastyretskyi
f68a33465f Fix review comments 2023-08-17 11:58:14 +03:00
Donal McBreen
1163c3de07 Configurable log levels
Allow ssh log_level to be set - this will help to debug connection
issues.
2023-08-15 16:51:56 +01:00
Donal McBreen
715cd94bbf Merge pull request #413 from mrsked/extract-version-from-container-name-correctly
Extract versions that contains dashes
2023-08-15 15:11:03 +01:00
Donal McBreen
dda7099b2f Merge pull request #414 from mrsked/traefik-start-stop-run-errors
Don't hide Traefik errors
2023-08-15 15:10:47 +01:00
Liubomyr Manastyretskyi
6774675547 Fix builder registry cache when using default registry 2023-08-13 12:04:03 +03:00
Igor Alexandrov
c24c7abb79 Fix for https://github.com/mrsked/mrsk/issues/407 2023-08-08 19:04:35 +04:00
Donal McBreen
c2d7fd775f Don't hide Traefik errors
When stopping or starting Traefik, don't hide important errors.

Docker doesn't return an error when starting a started container or
stopping a stopped container.

When rebooting we want to know about errors during run as we've just
stopped and removed the previous container.

When booting, we want to leave the running container if it exists,
restart a stopped container and run a new one if none exists.

We can implement this with `docker start ... || docker run ...`:
- if the container is started, `docker start` will exit with 0
- if the container is stopped, `docker start` will start it and exit with 0
- if the container doesn't exist, `docker start` will return a non zero
exit code and `docker run` will create a new container. Any errors in
`docker run` will be returned.
2023-08-08 15:41:16 +01:00
Donal McBreen
4dd8208290 Extract versions that contains dashes
The version extraction assumed that the version is everything after the
last `-` in the container name. This doesn't work if you deploy a
non-MRSK generated version that contains a `-`.

To fix we'll generate the non version prefix and strip it off. In some
places for this to work we need to make sure to pass the role through.

Fixes: https://github.com/mrsked/mrsk/issues/402
2023-08-08 14:16:32 +01:00
Donal McBreen
aa89ededde Merge pull request #399 from mrsked/manage-ssh-connection-starts
Manage SSH connection starts
2023-08-07 14:37:34 +01:00
David Heinemeier Hansson
299b166db7 Merge pull request #389 from brunoprietog/include-role-options-when-executing-commands
Include role options when executing commands
2023-07-26 14:04:28 +02:00
Donal McBreen
94d6a763a8 Extract ssh and sshkit configuration 2023-07-26 12:26:23 +01:00
Donal McBreen
752ff53458 Merge pull request #396 from igor-alexandrov/track-uncommitted-changes
Log uncommitted changes during deploy
2023-07-25 14:35:44 +01:00
Donal McBreen
f64b596907 Prevent SSH connection restarts
Set a high idle timeout on the sshkit connection pool. This will
reduce the incidence of re-connection storms when a deployment has been
idle for a while (e.g. when waiting for a docker build).

The default timeout was 30 seconds, so we'll enable keepalives at a
30s interval to match. This is to help prevent connections from being
killed during long idle periods.
2023-07-25 13:09:46 +01:00
Donal McBreen
b25cfa178b Limit SSH start concurrency
Starting many (90+) SSH connections has caused us some issues such as
failed DNS lookups and hitting process file descriptor limits.

To mitigate this, patch SSHKit::Backend::Netssh to limit concurrency of
connection starts. We'll default to 30 at a time which seems to work
without issue, but can be configured via:

```
sshkit:
  max_concurrent_starts: 10
```
2023-07-25 13:08:44 +01:00
Donal McBreen
edcfc77d95 Bump version for 0.15.1 2023-07-25 13:07:04 +01:00
Donal McBreen
2daaf442fa Revert "Configurable SSH log levels" 2023-07-25 12:53:45 +01:00
Igor Alexandrov
d414253393 Updated uncommitted notification text 2023-07-24 20:12:22 +04:00
Bruno Prieto
cbd180205d Include role options when executing commands 2023-07-24 17:45:24 +02:00
Donal McBreen
61b7dc90f2 Bump version for 0.15.0 2023-07-24 13:43:50 +01:00
Igor Alexandrov
ea941f33f9 Moved uncommitted changes message out of run_locally block 2023-07-21 22:45:23 +04:00
Igor Alexandrov
0cfafd1d25 Log uncommitted changes during deploy 2023-07-21 18:37:45 +04:00
Lewis Buckley
9d5a6d1321 Document the rolling option for traefik reboots 2023-07-19 15:03:15 +01:00
Lewis Buckley
ecfd258093 Document the rolling reboot option 2023-07-19 14:58:46 +01:00
Lewis Buckley
313f89a108 Merge branch 'main' into rolling-traefik-restarts
* main:
  Removed not needed MRSK.traefik.run command in Traefil reboot
  Updated README with locking directory name
  Include service name to lock details
  Configurable SSH log levels
  Add registry container output to debug
  Minor tweaks to hooks section in readme
  Update README.md
  Updated README.md to make setup examples consistent
  Login to the registry proactively before stoping Accessory and Traefik
2023-07-19 14:46:16 +01:00
Lewis Buckley
9ab448e186 Support a --rolling option for traefik reboots 2023-07-19 14:39:27 +01:00
Donal McBreen
e1433f3895 Merge pull request #349 from igor-alexandrov/login-to-registry-proactively
Login to the registry proactively before stoping Accessory and Traefik
2023-07-19 13:36:00 +01:00
Igor Alexandrov
a29e188c90 Removed not needed MRSK.traefik.run command in Traefil reboot 2023-07-19 15:08:42 +04:00
Donal McBreen
95e3915991 Merge pull request #386 from mrsked/ssh-log-levels
Configurable SSH log levels
2023-07-17 14:09:21 +01:00
Igor Alexandrov
e6ca270537 Include service name to lock details 2023-07-15 21:50:39 +04:00
Donal McBreen
cd88c49c42 Configurable SSH log levels
Allow ssh log_level to be set to debug connection issues.
2023-07-14 16:08:47 +01:00
Igor Alexandrov
2746a48e88 Login to the registry proactively before stoping Accessory and Traefik 2023-06-22 15:13:47 +04:00
David Heinemeier Hansson
9a501867b4 Bump version for 0.14.0 2023-06-20 17:02:42 +02:00
Donal McBreen
4950f61a87 Only require secrets when mutating
Rename `with_lock` to more generic `mutating` and move the env_args
check to that point. This allows read-only actions to be run without
requiring secrets.
2023-06-20 15:39:51 +01:00
David Heinemeier Hansson
08d8790851 Merge pull request #337 from igor-alexandrov/feature/cache
Support for Docker multistage build cache
2023-06-20 11:38:46 +02:00
Igor Alexandrov
02256ac8fe More code style improvements 2023-06-19 18:22:07 +04:00