Commit Graph

34 Commits

Author SHA1 Message Date
Donal McBreen
8a41d15b69 Zero downtime deployment with cord file
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container

Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.

To get round that, the new boot process is:

1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container

The extra steps ensure that Traefik stops sending requests before the
old container is shut down.
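
The boot order above can be sketched as a small Python simulation. This is
an illustration only, not Kamal's implementation: the helper names
(`create_cord`, `is_healthy`, `cut_cord`, `replace_container`) and the
directory layout are assumptions, and in the real flow the file check runs
inside the docker healthcheck rather than in Python.

```python
import tempfile
from pathlib import Path


def create_cord(host_dir):
    """Create a directory on the 'host' containing a single cord file."""
    host_dir.mkdir(parents=True, exist_ok=True)
    cord = host_dir / "cord"
    cord.touch()
    return cord


def is_healthy(app_ok, cord):
    """Stand-in healthcheck: the app check passes AND the cord file exists."""
    return app_ok and cord.exists()


def cut_cord(cord):
    """Delete the cord file so the container starts failing its healthcheck."""
    cord.unlink(missing_ok=True)


def replace_container(root):
    """Walk through the boot order from the message, recording each step."""
    steps = []
    old_cord = create_cord(root / "old")    # cord of the container being replaced
    new_cord = create_cord(root / "new")    # step 1: directory with a single file
    steps.append("boot new")                # step 2: boot with the cord mounted
    assert is_healthy(True, new_cord)       # step 3: new container is healthy
    cut_cord(old_cord)                      # step 4: cut the cord for the old one
    assert not is_healthy(True, old_cord)   # step 5: old container is unhealthy
    steps.append("old unhealthy")
    steps.append("stop old")                # step 6: now safe to stop it
    return steps


with tempfile.TemporaryDirectory() as tmp:
    print(replace_container(Path(tmp)))
```

The point of the sketch is that the old container fails its healthcheck
while its app is still able to serve requests, so the proxy has time to
notice before the container is stopped.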
2023-09-06 14:35:30 +01:00
Donal McBreen
94bf090657 Copy env files to remote hosts
Setting env variables in the docker arguments requires having them on
the deploy host.

Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.

Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.

Running `kamal env push` will create env files for each role and
accessory, and traefik if required.

`kamal envify` has been updated to also push the env files.

By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.

One thing to note: Docker doesn't parse the environment variables in the
env file. One result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.

We may need to look at docker config or docker secrets longer term to get
around this.
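
As a rough illustration of the multi-line limitation, the sketch below
renders an env file and rejects values that could not survive being read
line by line. The function name `render_env_file` and the sample variables
are hypothetical; this is not the code Kamal uses to write its env files.

```python
def render_env_file(env):
    """Render KEY=value lines, rejecting values that contain a newline."""
    lines = []
    for key, value in env.items():
        if "\n" in value:
            # The file is read line by line and not parsed, so a newline
            # inside a value would be split across separate entries.
            raise ValueError(f"{key}: multi-line values are not supported")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


print(render_env_file({"RAILS_ENV": "production", "DB_HOST": "db.internal"}), end="")
```

Failing early when the file is written is one possible design choice; it
surfaces the problem on the deploy host instead of inside the container.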

Hattip to @kevinmcconnell - this was all his idea.
2023-09-06 14:33:13 +01:00
Krzysztof Adamski
c2b2f7ea33 Fixing Tests 2023-09-06 10:16:59 +02:00
Donal McBreen
d0fbf538d3 Add integration test hooks back in 2023-08-23 07:36:48 +01:00
David Heinemeier Hansson
d981c3c968 Move hooks 2023-08-22 12:47:00 -07:00
David Heinemeier Hansson
c4a203e648 Rename to Kamal 2023-08-22 08:24:31 -07:00
Donal McBreen
1163c3de07 Configurable log levels
Allow ssh log_level to be set - this will help to debug connection
issues.
2023-08-15 16:51:56 +01:00
Donal McBreen
c2d7fd775f Don't hide Traefik errors
When stopping or starting Traefik, don't hide important errors.

Docker doesn't return an error when starting a started container or
stopping a stopped container.

When rebooting we want to know about errors during run as we've just
stopped and removed the previous container.

When booting, we want to leave the running container if it exists,
restart a stopped container and run a new one if none exists.

We can implement this with `docker start ... || docker run ...`:
- if the container is started, `docker start` will exit with 0
- if the container is stopped, `docker start` will start it and exit with 0
- if the container doesn't exist, `docker start` will return a non-zero
exit code and `docker run` will create a new container. Any errors in
`docker run` will be returned.
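
The three cases can be checked with a small Python model of the
`start || run` fallback. `FakeDocker` is a hypothetical stand-in that only
tracks one container's state; it does not call docker, and its exit codes
are simplified to 0 and 1.

```python
def boot(start, run):
    """Mimic `docker start ... || docker run ...`: run only if start fails."""
    code = start()
    if code == 0:
        return 0
    return run()


class FakeDocker:
    """Hypothetical stand-in for the daemon, tracking a single container."""

    def __init__(self, state):
        self.state = state  # "running", "stopped" or "missing"
        self.calls = []

    def start(self):
        self.calls.append("start")
        if self.state == "missing":
            return 1  # no such container, so the fallback is triggered
        self.state = "running"
        return 0

    def run(self):
        self.calls.append("run")
        self.state = "running"
        return 0


for initial in ("running", "stopped", "missing"):
    docker = FakeDocker(initial)
    print(initial, boot(docker.start, docker.run), docker.calls)
```

Only the "missing" case reaches `run`, and whatever `run` returns is what
the caller sees, which matches the behaviour described above.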
2023-08-08 15:41:16 +01:00
Donal McBreen
f64b596907 Prevent SSH connection restarts
Set a high idle timeout on the sshkit connection pool. This will
reduce the incidence of re-connection storms when a deployment has been
idle for a while (e.g. when waiting for a docker build).

The default timeout was 30 seconds, so we'll enable keepalives at a
30s interval to match. This is to help prevent connections from being
killed during long idle periods.
2023-07-25 13:09:46 +01:00
Donal McBreen
da1c049829 Add registry container output to debug 2023-07-14 13:41:30 +01:00
Donal McBreen
a14c6141e5 Dump container logs on failure 2023-06-15 12:05:50 +01:00
Donal McBreen
95d6ee5031 Remove /root/.ssh before symlinking
Ensure the symlinks are created correctly whether or not /root/.ssh
already exists.
2023-06-15 12:02:56 +01:00
Donal McBreen
db0bf6bb16 Add a pre-deploy hook
Useful for checking the status of CI before deploying. Doing this at
this point in the deployment maximises the parallelisation of building
and running CI.
2023-05-29 16:06:41 +01:00
Donal McBreen
1e300f3798 Wait longer for app to come up 2023-05-29 08:31:19 +01:00
Donal McBreen
9037088f99 Increase nginx timeouts in load balancer 2023-05-25 17:31:20 +01:00
Donal McBreen
66f9ce0e90 Add a pre-connect hook
This can be used for hooks that should run before connecting to remote
hosts. An example use case is pre-warming DNS.
2023-05-24 14:39:30 +01:00
Donal McBreen
cc2b321d93 Combine post-deploy and post-rollback 2023-05-23 13:57:24 +01:00
Donal McBreen
9fd184dc32 Add post-deploy and post-rollback hooks
These replace the custom audit_broadcast_cmd code. An additional env
variable MRSK_RUNTIME is passed to them.

The audit broadcast after booting an accessory has been removed.
2023-05-23 13:56:16 +01:00
Donal McBreen
38023fe538 Remove post push hook 2023-05-23 13:55:05 +01:00
Donal McBreen
0bc1fbfb74 Set max-concurrent-downloads to 1 to prevent timeouts 2023-05-23 13:55:05 +01:00
Donal McBreen
f3ec9f19c8 Add debug for failed version checks 2023-05-23 13:55:04 +01:00
Donal McBreen
58c1096a90 MRSK hooks
Adds hooks to MRSK. Currently just two hooks, pre-build and post-push.

We could break the build and push into two separate commands if we
found the need for post-build and/or pre-push hooks.

Hooks are stored in `.mrsk/hooks`. Running `mrsk init` will now create
that folder and add sample hook scripts.

Hooks returning non-zero exit codes will abort the current command.

Further potential work here:
- We could replace the audit broadcast command with a
post-deploy/post-rollback hook or similar
- Maybe provide pre-command/post-command hooks that run after every
mrsk invocation
- Also look for hooks in `~/.mrsk/hooks`
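
A minimal sketch of the abort-on-failure behaviour, written in Python for
illustration. The names `run_hook` and `HookError` are hypothetical, and
running hooks through `sh` is an assumption made to keep the example
self-contained; it is not how MRSK itself invokes hooks.

```python
import subprocess
import tempfile
from pathlib import Path


class HookError(Exception):
    """Raised when a hook exits with a non-zero code."""


def run_hook(hooks_dir, name):
    """Run the named hook if present; raise HookError on a non-zero exit.

    Returns False when no hook file exists, True when the hook succeeded.
    """
    hook = hooks_dir / name
    if not hook.exists():
        return False
    result = subprocess.run(["sh", str(hook)])  # assumption: hooks are sh scripts
    if result.returncode != 0:
        raise HookError(f"{name} failed with exit code {result.returncode}")
    return True


with tempfile.TemporaryDirectory() as tmp:
    hooks = Path(tmp) / ".mrsk" / "hooks"
    hooks.mkdir(parents=True)
    (hooks / "pre-build").write_text("exit 0\n")
    (hooks / "post-push").write_text("exit 3\n")
    print(run_hook(hooks, "pre-build"))
    try:
        run_hook(hooks, "post-push")
    except HookError as error:
        print(f"aborting: {error}")
```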
2023-05-23 13:55:04 +01:00
Donal McBreen
7cd25fd163 Add more integration tests
Add tests for main, app, accessory, traefik and lock commands.
Other commands are generally covered by the main tests.

Also adds some changes to speed up the integration specs:
- Use a persistent volume for the registry so we can push images to
reuse between runs (also gets around docker hub rate limits)
- Use persistent volume for mrsk gem install, to avoid re-installing
between tests
- Shorter stop wait time
- Shorter connection timeouts on the load balancer

Takes just over 2 minutes to run all tests locally on an M1 Mac
after docker caches are primed.
2023-05-16 10:35:35 +01:00
Donal McBreen
a5ef1f254f Highlight uncommitted changes in version
If there are uncommitted changes in the app repository when building,
then append `_uncommitted_<random>` to the version to distinguish the
image from one built from a clean checkout.

Also change the version used when renaming a container on redeploy to
distinguish and explain the version suffixes.
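
A rough Python sketch of the suffix rule. It assumes the base version is
the git sha and that the random part is 16 hex characters; both are
illustrative assumptions, and `image_version` is a hypothetical name.

```python
import secrets


def image_version(git_sha, uncommitted_changes):
    """Return the version, marking builds made from a dirty checkout."""
    if not uncommitted_changes:
        return git_sha
    # The random part keeps two different dirty builds of the same commit apart.
    return f"{git_sha}_uncommitted_{secrets.token_hex(8)}"


print(image_version("a5ef1f254f", uncommitted_changes=False))
print(image_version("a5ef1f254f", uncommitted_changes=True))
```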
2023-05-12 11:08:48 +01:00
Donal McBreen
326711a3e0 Fix aggressive prune breaking rollback
In the image prune command, --all overrides --dangling=true. This removed
the git sha image tag for the latest image, which prevented
us from rolling back to it.

I've updated the integration test to now test deploy, redeploy and
rollback.
2023-05-05 12:13:14 +01:00
Donal McBreen
650f9b1fbf Include traefik access logs 2023-05-01 18:55:10 +01:00
Donal McBreen
1170e2311e Check if we are still getting a 404 2023-05-01 18:32:07 +01:00
Donal McBreen
94f87edded Also dump load balancer logs 2023-05-01 18:27:08 +01:00
Donal McBreen
548a1019c1 Dump traefik logs when app not booted 2023-05-01 18:21:22 +01:00
Donal McBreen
ca2e2bac2e Fix missing for apt-get 2023-05-01 12:50:45 +01:00
Donal McBreen
494a1ae089 Report on container health after failure 2023-05-01 12:13:12 +01:00
Donal McBreen
a77428143f Fix the integration test healthcheck
The alpine nginx container doesn't contain curl, so let's override the
healthcheck command to use wget.
2023-05-01 12:11:24 +01:00
Donal McBreen
52ca5b846a Wait for healthy containers in integration test
Rather than waiting 5 seconds and hoping for the best after we boot
docker compose, add docker healthchecks and wait for all the containers
to be healthy.
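
The idea of polling instead of sleeping for a fixed time can be sketched in
Python as below. `wait_until_healthy` and the check callables are
hypothetical; in the integration test the health status would come from
docker, not from Python functions.

```python
import time


def wait_until_healthy(checks, timeout=30.0, interval=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll every check until all return True, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        pending = [name for name, check in checks.items() if not check()]
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"still unhealthy: {', '.join(pending)}")
        sleep(interval)


# Example: a container that only reports healthy on its third poll.
polls = {"count": 0}


def vm1_healthy():
    polls["count"] += 1
    return polls["count"] >= 3


wait_until_healthy({"vm1": vm1_healthy, "registry": lambda: True},
                   timeout=5, interval=0.01)
print(f"healthy after {polls['count']} polls")
```

Compared with a fixed 5 second wait, this returns as soon as everything is
healthy and fails with a named list of containers when it is not.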
2023-04-25 15:41:25 +01:00
Donal McBreen
bcf8a927f5 Run a mrsk deploy integration test
Adds a simple integration test to ensure that `mrsk deploy` works.

Everything required is spun up with docker compose:
- shared: a container that contains an ssh key and a self signed cert to
be shared between the images
- deployer: the image we will deploy from
- registry: a docker registry
- two vm images to deploy into
- load_balancer: an nginx load balancer to use between our images

The other images are in privileged mode so that we can run
docker-in-docker. We need to run docker inside the images - mapping in
the docker socket doesn't work because both VMs would share the host
daemon.

The docker registry requires a self signed cert as you cannot use basic
auth over HTTP except on localhost. It runs on port 4443 rather than 443
because docker refuses to accept that "registry" is a docker host and
tries to push images to docker.io/registry. "registry:4443" works fine.

The shared container contains the ssh keys for the deployer and vms, and
the self signed cert for the registry. When the shared container boots,
it copies them into a shared volume.

The other deployer and vm images are built with soft links from the
shared volume to the required locations. Their boot scripts wait for the
files to be copied in before continuing.

The root mrsk folder is mapped into the deployer container. On boot it
builds the gem and installs it.

Right now there's just a single test. We confirm that the load balancer
is returning a 502, run `mrsk deploy` and then confirm it returns 200.
2023-04-14 15:49:43 +01:00