Commit Graph

5 Commits

Author SHA1 Message Date
Donal McBreen
6568cef868 Replace Traefik with kamal-proxy
[kamal-proxy](https://github.com/basecamp/kamal-proxy) is a custom
minimal proxy designed specifically for Kamal.

It has some advantages over Traefik:
1. Imperative deployments - we tell it to switch from container A to
   container B, and it waits for container B to start then switches. No
   need to poll for health checks ourselves or mess around with forcing
   health checks to fail.
2. Support for multiple apps - as much as possible, configuration is
   supplied at runtime by the deploy command, allowing us to have
   multiple apps share a proxy without conflicting config.
3. First class support for Kamal operations - rather than trying to
   work out how to make Traefik do what we want, we can build features
   directly into the proxy, making configuration simpler and avoiding
   obscure errors
2024-05-28 15:31:56 +01:00
Donal McBreen
706b82baa1 Simplify messages and remove multiple execute error 2024-05-21 10:40:01 +01:00
Donal McBreen
ee758d951a Only use barrier when needed, more descriptive info 2024-05-20 12:18:30 +01:00
Donal McBreen
773ba3a5ab Show container logs and healthcheck status on failure 2024-05-20 12:18:30 +01:00
Donal McBreen
0efb5ccfff Remove the healthcheck step
To speed up deployments, we'll remove the healthcheck step.

This adds some risk to deployments for non-web roles - if they don't
have a Docker healthcheck configured then the only check we do is if
the container is running.

If there is a bad image we might see the container running before it
exits and deploy it. Previously the healthcheck step would have avoided
this by ensuring a web container could boot and serve traffic first.

To mitigate this, we'll add a deployment barrier. Until one of the
primary role containers passes its healthcheck, we'll keep the barrier
up and avoid stopping the containers on the non-primary roles.

It the primary role container fails its healthcheck, we'll close the
barrier and shut down the new containers on the waiting roles.

We also have a new integration test to check we correctly handle a
a broken image. This highlighted that SSHKit's default runner will
stop at the first error it encounters. We'll now have a custom runner
that waits for all threads to finish allowing them to clean up.
2024-05-20 12:18:30 +01:00