Commit Graph

124 Commits

Author SHA1 Message Date
Clint Miller
8dca65f48f Fix commands/app tests 2023-09-20 08:12:27 -05:00
dhh
3ae855ef28 Explain method better 2023-09-16 09:53:03 -07:00
Donal McBreen
3c12d1799c Copy all files into asset volume
Adding -T to the copy command ensures that the files are copied at the
same level into the target directory whether it exists or not.

That allows us to drop the `/*` which was not picking up hidden files.

Fixes: https://github.com/basecamp/kamal/issues/465
2023-09-15 08:07:48 +01:00
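The hidden-file problem described in 3c12d1799c can be reproduced outside of Docker. The sketch below is not Kamal's code; it only illustrates that a `*` glob skips dot-files while listing the directory itself does not. The file names are hypothetical.

```python
import glob
import os
import tempfile


def visible_by_glob(directory):
    """Names matched by `<directory>/*`, the way a shell-style glob expands them."""
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(directory, "*")))


def all_entries(directory):
    """Every entry in the directory, hidden or not."""
    return sorted(os.listdir(directory))


def demo():
    with tempfile.TemporaryDirectory() as directory:
        # Hypothetical asset names: one regular file, one hidden file.
        for name in ("app.css", ".manifest.json"):
            open(os.path.join(directory, name), "w").close()
        return visible_by_glob(directory), all_entries(directory)
```

Calling `demo()` returns the globbed names first and the full listing second; only the full listing includes the hidden file.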
Donal McBreen
fb0aeec27e Escape the newline in the inspect query 2023-09-12 19:10:39 +01:00
Donal McBreen
0b439362da Asset paths
During deployments both the old and new containers will be active for a
small period of time. There also may be lagging requests for older CSS
and JS after the deployment.

This can lead to 404s if a request for old assets hits a new container
or vice versa.

This PR makes sure that both sets of assets are available throughout the
deployment from before the new version of the app is booted.

This can be configured by setting the asset path:

```yaml
asset_path: "/rails/public/assets"
```

The process is:
1. We extract the assets out of the container, with docker run, docker
cp, docker stop. Docker run sets the container command to "sleep" so
this needs to be available in the container.
2. We create an asset volume directory on the host for the new version
of the app and copy the assets in there.
3. If there is a previous deployment we also copy the new assets into
its asset volume and copy the older assets into the new asset volume.
4. We start the new container mapping the asset volume over the top of
the container's asset path.

This means that both the old and new versions have replaced the asset
path with a volume containing both sets of assets and should be able
to serve any request during the deployment. The older assets will
continue to be available until the next deployment.
2023-09-11 12:18:18 +01:00
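Steps 2 and 3 of the asset-path process in 0b439362da can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the function names, the directory layout, and the choice never to overwrite an existing file are all hypothetical.

```python
import os
import shutil
import tempfile


def copy_missing(src, dst):
    """Recursively copy files from src into dst, leaving existing files untouched."""
    for root, _dirs, files in os.walk(src):
        target_root = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            target = os.path.join(target_root, name)
            if not os.path.exists(target):
                shutil.copy2(os.path.join(root, name), target)


def sync_asset_volumes(extracted_assets, new_volume, old_volume=None):
    """Fill the new volume, then cross-copy with the previous deployment's volume."""
    copy_missing(extracted_assets, new_volume)       # step 2
    if old_volume is not None and os.path.isdir(old_volume):
        copy_missing(extracted_assets, old_volume)   # step 3: new assets -> old volume
        copy_missing(old_volume, new_volume)         # step 3: old assets -> new volume


def demo():
    with tempfile.TemporaryDirectory() as base:
        extracted = os.path.join(base, "extracted")
        old_volume = os.path.join(base, "volumes", "v1")
        new_volume = os.path.join(base, "volumes", "v2")
        os.makedirs(extracted)
        os.makedirs(old_volume)
        open(os.path.join(extracted, "app-new.css"), "w").close()
        open(os.path.join(old_volume, "app-old.css"), "w").close()
        sync_asset_volumes(extracted, new_volume, old_volume)
        return sorted(os.listdir(old_volume)), sorted(os.listdir(new_volume))
```

After `demo()` runs, both volumes hold both stylesheets, which is the property the commit relies on to avoid 404s during the switch.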
Donal McBreen
cd02510d0f Output one mount per line
The go template was concatenating all the mounts into one line. It
happened to work because the mount we are interested in was always first.

Fix it to output one mount per line instead.
2023-09-07 15:20:50 +01:00
Donal McBreen
8a41d15b69 Zero downtime deployment with cord file
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container

Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.

To get round that the new boot process is:

1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container

The extra steps ensure that Traefik stops sending requests before the
old container is shut down.
2023-09-06 14:35:30 +01:00
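The cord-file idea from 8a41d15b69 can be modelled in a few lines. This is a minimal sketch, assuming the healthcheck is simply "cord file exists and the app's own check passes"; the function names and the `app_check` callable are hypothetical and stand in for whatever the real Docker healthcheck runs.

```python
import os
import tempfile


def is_healthy(cord_path, app_check):
    """Healthy only while the cord file exists AND the app's own check passes."""
    return os.path.exists(cord_path) and app_check()


def cut_cord(cord_path):
    """Delete the cord file so the container's healthcheck starts failing."""
    if os.path.exists(cord_path):
        os.remove(cord_path)


def demo():
    with tempfile.TemporaryDirectory() as cord_dir:
        cord_path = os.path.join(cord_dir, "cord")
        open(cord_path, "w").close()      # step 1: directory with a single file

        def app_ok():
            return True                   # pretend the app itself is fine

        before = is_healthy(cord_path, app_ok)
        cut_cord(cord_path)               # step 4: "cut the cord"
        after = is_healthy(cord_path, app_ok)
        return before, after
```

The point of the design is that the old container can be made to report unhealthy on demand, while it is still able to serve requests, so the proxy has time to stop routing to it before it is stopped.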
Donal McBreen
94bf090657 Copy env files to remote hosts
Setting env variables in the docker arguments requires having them on
the deploy host.

Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.

Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.

Running `kamal env push` will create env files for each role and
accessory, and traefik if required.

`kamal envify` has been updated to also push the env files.

By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.

One thing to note - Docker doesn't parse the environment variables
in the env file. One result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.

We may need to look at docker config or docker secrets longer term to get
around this.

Hattip to @kevinmcconnell - this was all his idea.
2023-09-06 14:33:13 +01:00
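The multi-line limitation noted in 94bf090657 suggests validating values before writing an env file. The helper below is a hypothetical sketch, not part of Kamal: it renders `KEY=value` lines and refuses values containing line breaks, since the file is read line by line.

```python
def render_env_file(env):
    """Render KEY=value lines; reject values that a line-based env file cannot carry."""
    lines = []
    for key, value in env.items():
        value = str(value)
        if "\n" in value or "\r" in value:
            raise ValueError(f"{key}: multi-line values cannot be stored in an env file")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

For example, `render_env_file({"RAILS_ENV": "production"})` yields a single `RAILS_ENV=production` line, while a value such as a PEM-encoded key raises `ValueError` instead of silently producing a broken file.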
David Heinemeier Hansson
c4a203e648 Rename to Kamal 2023-08-22 08:24:31 -07:00
Donal McBreen
4dd8208290 Extract versions that contain dashes
The version extraction assumed that the version is everything after the
last `-` in the container name. This doesn't work if you deploy a
non-MRSK generated version that contains a `-`.

To fix this, we'll generate the non-version prefix and strip it off. In some
places for this to work we need to make sure to pass the role through.

Fixes: https://github.com/mrsked/mrsk/issues/402
2023-08-08 14:16:32 +01:00
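The difference between the two extraction strategies in 4dd8208290 is easy to show. The sketch below assumes container names follow a `service-role[-destination]-version` layout; the function names are hypothetical and the real code may build the prefix differently.

```python
def naive_version(container_name):
    """Old approach: everything after the last dash."""
    return container_name.rsplit("-", 1)[-1]


def version_from(container_name, service, role, destination=None):
    """New approach: build the non-version prefix and strip it off."""
    parts = [service, role] + ([destination] if destination else [])
    prefix = "-".join(parts) + "-"
    if not container_name.startswith(prefix):
        raise ValueError(f"{container_name!r} does not start with {prefix!r}")
    return container_name[len(prefix):]
```

With a container named `app-web-feature-123`, the naive approach returns `123`, whereas stripping the `app-web-` prefix returns `feature-123`. This is also why the role has to be passed through: without it the prefix cannot be built.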
Bruno Prieto
cbd180205d Include role options when executing commands 2023-07-24 17:45:24 +02:00
Matt Robinson
21b13bf8d3 Add support for proxy_command to run_over_ssh 2023-06-16 08:22:10 -03:00
Donal McBreen
28d6a131a9 Prefix container hostname with the underlying one
To make it easier to identify where a docker container is running,
prefix its hostname with the underlying one from the host.

Docker chooses a 12 character random hex string by default, so we'll
keep that as the suffix.
2023-05-31 16:22:25 +01:00
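The hostname scheme from 28d6a131a9 amounts to "host name, dash, 12 random hex characters". A minimal sketch, assuming a dash separator and using Python's `secrets` module in place of whatever generator the real code uses:

```python
import secrets


def container_hostname(host_name):
    """Prefix a 12-character random hex suffix with the underlying host's name."""
    return f"{host_name}-{secrets.token_hex(6)}"  # 6 random bytes -> 12 hex characters
```

A call such as `container_hostname("web-01")` produces a value of the form `web-01-` followed by 12 hex characters, so the originating host is visible from inside the container.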
Donal McBreen
cedb8d900f Stop containers with restarting status
When stopping the old container we need to also look for ones with a
restarting status.
2023-05-25 12:10:26 +01:00
Kevin McConnell
a72f95f44d Ensure Traefik service name is consistent
If we don't specify any service properties when labelling containers,
the generated service will be named according to the container. However,
we change the container name on every deployment (as it is versioned),
which means that the auto-generated service name will be different in
each container.

That is a problem for two reasons:

- Multiple containers share a common router while a deployment is
  happening. At this point, the router configuration will be different
  between the containers; Traefik flags this as an error, and stops
  routing to the containers until it's resolved.
- We allow custom labels to be set in an app's config. In order to
  define custom configuration on the service, we'll need to know what
  it will be called.

Changed to force the service name by setting one of its properties.
2023-05-02 09:43:04 +01:00
Kevin McConnell
df202d6ef4 Move health checks into Docker
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
2023-04-13 16:08:43 +01:00
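The "wait for a healthy status before stopping the old container" behaviour in df202d6ef4 is a bounded polling loop. The sketch below is an assumption-laden illustration: `get_status` is a hypothetical callable standing in for a query of the container's Docker health status, and the parameter names and defaults are made up.

```python
import time


def wait_until_healthy(get_status, max_attempts=7, delay=0.0):
    """Poll get_status() until it returns "healthy"; return the attempt number.

    Raises TimeoutError if the container is still not healthy after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if get_status() == "healthy":
            return attempt
        time.sleep(delay)
    raise TimeoutError(f"container not healthy after {max_attempts} attempts")
```

Only once this returns would the old container be stopped, which is what makes the approach safer than waiting for a fixed delay.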
Jacopo
9ddb181f50 Merge branch 'main' into cleanup-excessive-containers-running
* main:
  Pull the primary host from the role
  Minimise holding the deploy lock
2023-04-12 15:19:19 +02:00
Donal McBreen
051556674f Minimise holding the deploy lock
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.

We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
2023-04-12 12:09:56 +01:00
Jacopo
5ed431b807 Merge branch 'main' into cleanup-excessive-containers-running
* main: (24 commits)
  Bump version for 0.11.0
  Labels can be added to Traefik
  Make rollbacks role-aware
  fix typo role to roles
  Explained the latest modifications of Traefik container labels
  Remove .idea folder
  Updated README.md with new healthcheck.max_attempts option
  Fix test case: console output message was not updated to display the current/total attempts
  Require net-ssh ~> 7.0 for SHA-2 support
  Improved deploy lock acquisition
  Excess CR
  Style
  Simpler
  Make it explicit, focus on Ubuntu
  More explicit
  Note that --bundle is a Rails 7+ option
  Update README.md
  Update README.md
  Improved: configurable max_attempts for healthcheck
  Traefik service name to be derived from role and destination
  ...
2023-04-12 11:52:47 +02:00
Jacopo
e980f1164e Avoid using GNU-only Perl Regexp Grep 2023-04-11 08:53:33 +02:00
Jacopo
e2f6db5cae Clear stale containers
By stopping all the older containers with matching /#{service}-#{role}-#{dest}-.*/ running on the same host.
2023-04-11 08:53:33 +02:00
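The selection rule in e2f6db5cae can be sketched with the same pattern. This is a hypothetical helper, not the project's code: it anchors the pattern at the start of the name and excludes the container that should keep running, both of which are assumptions.

```python
import re


def stale_containers(running_names, service, role, destination, current):
    """Names matching service-role-destination-* on this host, except the current one."""
    pattern = re.compile(
        rf"^{re.escape(service)}-{re.escape(role)}-{re.escape(destination)}-.*"
    )
    return [name for name in running_names if pattern.match(name) and name != current]
```

Given `app-web-staging-v1` and `app-web-staging-v2` running side by side, with `v2` current, only `app-web-staging-v1` is selected for stopping; containers for other roles or services are left alone.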
Kartikey Tanna
c60cc92dfe Traefik service name to be derived from role and destination 2023-04-09 13:44:57 +05:30
Donal McBreen
05488e4c1e Zero downtime redeploys
When deploying check if there is already a container with the existing
name. If there is, rename it to "<version>_<random_hex_string>" to remove
the name clash with the new container we want to boot.

We can then do the normal zero downtime run/wait/stop.

While implementing this I discovered that --filter name=foo does a
substring match for foo, so I've updated those filters to do an exact
match instead.
2023-03-24 17:09:20 +00:00
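The substring-versus-exact matching issue in 05488e4c1e, together with the rename step, can be illustrated in plain Python. This models the matching behaviour only and does not call Docker; the function names and the length of the random suffix are assumptions.

```python
import re
import secrets


def substring_match(names, wanted):
    """How a substring filter behaves: any name containing `wanted` matches."""
    return [name for name in names if wanted in name]


def exact_match(names, wanted):
    """Exact filter: the whole name must equal `wanted`."""
    return [name for name in names if re.fullmatch(re.escape(wanted), name)]


def renamed(version):
    """New name for a clashing container: "<version>_<random_hex_string>"."""
    return f"{version}_{secrets.token_hex(8)}"  # suffix length is an assumption
```

With containers `app-web-123` and `app-web-1234`, a substring filter for `app-web-123` returns both, while the exact filter returns only the first. That is the kind of mix-up the commit avoids.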
David Heinemeier Hansson
b5ccc1fa5d Merge branch 'main' into global-logging-config 2023-03-24 15:32:41 +01:00
David Heinemeier Hansson
4fa71834ad Symbols! 2023-03-24 15:27:11 +01:00
Samuel Sieg
4044abdde1 Fix tests 2023-03-24 15:25:29 +01:00
Samuel Sieg
bc64a07a95 Merge branch 'main' into global-logging-config 2023-03-24 15:24:06 +01:00
David Heinemeier Hansson
fdb2502216 test stop with custom stop wait time 2023-03-24 15:22:34 +01:00
Jacopo
9b43a6b23b Customizable stop wait time
Configurable via a global `stop_wait_time` option.
The default is `10` which matches Docker defaults.
2023-03-24 15:04:45 +01:00
Samuel Sieg
86e99fb079 Merge branch 'main' into global-logging-config 2023-03-24 14:40:27 +01:00
David Heinemeier Hansson
93423f2f20 Merge branch 'main' into pr/99
* main:
  Wording
  Remove accessory images using tags rather than labels
  Update readme to point to ghcr.io/mrsked/mrsk
  Validate that all roles have hosts
  Commander needn't accumulate configuration
  Pull latest image tag, so we can identify it
  Default to deploying the config version
  Remove unneeded Dockerfile.dind, update Readme
  add D-in-D dockerfile, update Readme
2023-03-24 14:26:31 +01:00
Samuel Sieg
9c27ead21f Ensure it also works when configuring just log options without setting a driver 2023-03-24 09:38:02 +01:00
Samuel Sieg
7369be48ff Ensure default log option max-size=10m 2023-03-24 09:10:36 +01:00
Donal McBreen
1ed4a37da2 Pull latest image tag, so we can identify it
`docker image ls` doesn't tell us what the latest deployed image is (e.g.
if we've rolled back). Pull the latest image tag through to the server
so we can use it instead.
2023-03-23 14:39:32 +00:00
David Heinemeier Hansson
e7e3cd98eb Fix tests 2023-03-23 15:16:10 +01:00
David Heinemeier Hansson
19104cafb4 Merge branch 'main' into role-awareness 2023-03-21 08:20:26 -04:00
Samuel Sieg
1bdfc217c4 Merge branch 'main' into global-logging-config 2023-03-21 13:20:12 +01:00
Samuel Sieg
b5372988f7 Add global logging configuration 2023-03-19 09:21:08 +01:00
Samuel Sieg
491777221f Fix destination label filter 2023-03-16 16:15:31 +01:00
David Heinemeier Hansson
cb824bdc42 Merge branch 'main' into role-awareness 2023-03-14 19:11:10 -04:00
Jacopo
50ee954ca9 Fix Traefik retry middleware
As per [Traefik docs](https://doc.traefik.io/traefik/middlewares/overview/#configuration-example)
a middleware to be activated needs to be applied to a route. Change the default settings
to apply the `retry` middleware on every role with Traefik enabled.
2023-03-14 12:15:00 +01:00
Tobias Bühlmann
72e0184e9f Fix failing tests 2023-03-13 17:36:02 +01:00
David Heinemeier Hansson
d2f76dac6b Merge branch 'main' into role-awareness 2023-03-13 15:16:44 +01:00
Richard Taylor
bb241dea43 Add container name env var for containers
Because the container name is generated it isn't possible to
determine this inside the container.

This adds the MRSK_CONTAINER_NAME env var when running the
container so it can be read by the service running inside the
container.
2023-03-11 10:14:41 +00:00
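A service running inside the container could read the variable added in bb241dea43 like this. The helper and its fallback value are hypothetical; only the `MRSK_CONTAINER_NAME` variable name comes from the commit.

```python
import os


def container_name(environ=None, default="unknown"):
    """Return the generated container name, or a fallback outside such a container."""
    environ = os.environ if environ is None else environ
    return environ.get("MRSK_CONTAINER_NAME", default)
```

Passing a mapping explicitly keeps the function easy to test; in normal use it is called with no arguments and reads the process environment.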
Tobias Bühlmann
418bc13ae7 Apply filters correctly 2023-03-10 10:33:55 +01:00
Tobias Bühlmann
fdb0c8ee91 Rolify app cli/command 2023-03-10 08:50:26 +01:00
David Heinemeier Hansson
1f784176b7 Allow value-less options with true 2023-03-09 11:17:28 +01:00
David Heinemeier Hansson
d3f07d6313 Allow custom options per role 2023-03-09 11:09:19 +01:00
David Heinemeier Hansson
98a14f6173 Add cmd args for roles 2023-03-09 11:01:06 +01:00
David Heinemeier Hansson
371f98d67f Start before stopping and longer timeouts 2023-02-22 19:04:23 +01:00