1. Update to kamal-proxy 0.4.0 which creates and chowns
/home/kamal-proxy/.config/kamal-proxy to kamal-proxy
2. Use a docker volume rather than mapping in a directory, so docker
keeps it owned by the correct user
The proxy can be enabled via the config:
```
proxy:
enabled: true
hosts:
- 10.0.0.1
- 10.0.0.2
```
This will enable the proxy and cause it to be run on the hosts listed
under `hosts`, after running `kamal proxy reboot`.
Enabling the proxy disables `kamal traefik` commands and replaces them
with `kamal proxy` ones. However only the marked hosts will run the
kamal-proxy container, the rest will run Traefik as before.
Validate the Kamal configuration giving useful warning on errors.
Each section of the configuration has its own config class and a YAML
file containing documented example configuration.
You can run `kamal docs` to see the example configuration, and
`kamal docs <section>` to see the example configuration for a specific
section.
The validation matches the configuration to the example configuration
checking that there are no unknown keys and that the values are of
matching types.
Where there is more complex validation - e.g for envs and servers, we
have custom validators that implement those rules.
Additonally the configuration examples are used to generate the
configuration documentation in the kamal-site repo.
You generate them by running:
```
bundle exec bin/docs <kamal-site-checkout>
```
To speed up deployments, we'll remove the healthcheck step.
This adds some risk to deployments for non-web roles - if they don't
have a Docker healthcheck configured then the only check we do is if
the container is running.
If there is a bad image we might see the container running before it
exits and deploy it. Previously the healthcheck step would have avoided
this by ensuring a web container could boot and serve traffic first.
To mitigate this, we'll add a deployment barrier. Until one of the
primary role containers passes its healthcheck, we'll keep the barrier
up and avoid stopping the containers on the non-primary roles.
It the primary role container fails its healthcheck, we'll close the
barrier and shut down the new containers on the waiting roles.
We also have a new integration test to check we correctly handle a
a broken image. This highlighted that SSHKit's default runner will
stop at the first error it encounters. We'll now have a custom runner
that waits for all threads to finish allowing them to clean up.
If the app container is down or not responding then traefik will return
a 404 response code. This is not ideal as it suggests a client rather
than a server problem.
To fix this, we'll define a catch all route that always returns a 502.
This is not ideal as this route would take priority over a shorter route
with priorty 1.
TODO: up the priority of the app route.
These replace the custom audit_broadcast_cmd code. An additional env
variable MRSK_RUNTIME is passed to them.
The audit broadcast after booting an accessory has been removed.
Adds hooks to MRSK. Currently just two hooks, pre-build and post-push.
We could break the build and push into two separate commands if we
found the need for post-build and/or pre-push hooks.
Hooks are stored in `.mrsk/hooks`. Running `mrsk init` will now create
that folder and add sample hook scripts.
Hooks returning non-zero exit codes will abort the current command.
Further potential work here:
- We could replace the audit broadcast command with a
post-deploy/post-rollback hook or similar
- Maybe provide pre-command/post-command hooks that run after every
mrsk invocation
- Also look for hooks in `~/.mrsk/hooks`
Add tests for main, app, accessory, traefik and lock commands.
Other commands are generally covered by the main tests.
Also adds some changes to speed up the integration specs:
- Use a persistent volume for the registry so we can push images to to
reuse between runs (also gets around docker hub rate limits)
- Use persistent volume for mrsk gem install, to avoid re-installing
between tests
- Shorter stop wait time
- Shorter connection timeouts on the load balancer
Takes just over 2 minutes to run all tests locally on an M1 Mac
after docker caches are primed.