Fixes:
```
ERROR (Net::SSH::Exception): Exception while executing on host example.com: could not settle on kex algorithm
Server kex preferences: curve25519-sha256@libssh.org,ext-info-s,kex-strict-s-v00@openssh.com
Client kex preferences: ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1```
add x25519 in Gemfile.lock
- Add local logout to `kamal registry logout`
- Add `skip_local` and `skip_remote` options to `kamal registry` commands
- Skip local login in `kamal deploy` when `--skip-push` is used
Load the hosts from the contexts before trying to build.
If there is no context, we'll create one. If there is one but the hosts
don't match we'll re-create.
Where we just have a local context, there won't be any hosts but we
still inspect the builder to check that it exists.
Validate the Kamal configuration giving useful warning on errors.
Each section of the configuration has its own config class and a YAML
file containing documented example configuration.
You can run `kamal docs` to see the example configuration, and
`kamal docs <section>` to see the example configuration for a specific
section.
The validation matches the configuration to the example configuration
checking that there are no unknown keys and that the values are of
matching types.
Where there is more complex validation - e.g for envs and servers, we
have custom validators that implement those rules.
Additonally the configuration examples are used to generate the
configuration documentation in the kamal-site repo.
You generate them by running:
```
bundle exec bin/docs <kamal-site-checkout>
```
When cloning the git repo:
1. Try to clone
2. If there's already a build directory reset it
3. Check the clone is valid
If anything goes wrong during that process:
1. Delete the clone directory
2. Clone it again
3. Check the clone is valid
Raise any errors after that
Using a different SSHKit runner doesn't work well, because the group
runner uses the Parallel runner internally. So instead we'll patch its
behaviour to fail slow.
We'll also get it to return all the errors so we can report on all the
hosts that failed.
If a primary role container is unhealthy, we might take a while to
timeout the health check poller. In the meantime if we have started the
other roles, they'll be running tow containers.
This could be a problem, especially if they read run jobs as that
doubles the worker capacity which could cause exessive load.
We'll wait for the first primary role container to boot successfully
before starting the other containers from other roles.
To speed up deployments, we'll remove the healthcheck step.
This adds some risk to deployments for non-web roles - if they don't
have a Docker healthcheck configured then the only check we do is if
the container is running.
If there is a bad image we might see the container running before it
exits and deploy it. Previously the healthcheck step would have avoided
this by ensuring a web container could boot and serve traffic first.
To mitigate this, we'll add a deployment barrier. Until one of the
primary role containers passes its healthcheck, we'll keep the barrier
up and avoid stopping the containers on the non-primary roles.
It the primary role container fails its healthcheck, we'll close the
barrier and shut down the new containers on the waiting roles.
We also have a new integration test to check we correctly handle a
a broken image. This highlighted that SSHKit's default runner will
stop at the first error it encounters. We'll now have a custom runner
that waits for all threads to finish allowing them to clean up.
This includes a fix for a bug in the eviction thread that could cause
this error:
```
[ERROR (IOError): Exception while executing on host foo: closed stream]
```
See https://github.com/capistrano/sshkit/pull/534