The env check is not needded anymore as all the commands rely on the
env files having already been created remotely.
The only place the env is needed is when running `kamal env push` and
that will still raise an apropriate error.
If you always want to use a destination, and have a base deploy.yml file
that doesn't specify any hosts, then if you forget to specific the
destination you will get a cryptic error.
Add a "require_destination" setting you can use to avoid this.
During deployments both the old and new containers will be active for a
small period of time. There also may be lagging requests for older CSS
and JS after the deployment.
This can lead to 404s if a request for old assets hits a new container
or visa-versa.
This PR makes sure that both sets of assets are available throughout the
deployment from before the new version of the app is booted.
This can be configured by setting the asset path:
```yaml
asset_path: "/rails/public/assets"
```
The process is:
1. We extract the assets out of the container, with docker run, docker
cp, docker stop. Docker run sets the container command to "sleep" so
this needs to be available in the container.
2. We create an asset volume directory on the host for the new version
of the app on the host and copy the assets in there.
3. If there is a previous deployment we also copy the new assets into
its asset volume and copy the older assets into the new asset volume.
4. We start the new container mapping the asset volume over the top of
the container's asset path.
This means the both the old and new versions have replaced the asset
path with a volume containing both sets of assets and should be able
to serve any request during the deployment. The older assets will
continue to be available until the next deployment.
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container
Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.
To get round that the new boot process is:
1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container
The extra steps ensure that Traefik stops sending requests before the
old container is shutdown.
Setting env variables in the docker arguments requires having them on
the deploy host.
Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.
Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.
Running `kamal env push` will create env files for each role and
accessory, and traefik if required.
`kamal envify` has been updated to also push the env files.
By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.
One thing to note - the Docker doesn't parse the environment variables
in the env file, one result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.
We maybe need to look docker config or docker secrets longer term to get
around this.
Hattip to @kevinmcconnell - this was all his idea.
To avoid polluting the default SSH directory with lots of Kamal config,
we'll default to putting them in a `kamal` sub directory.
But also make the directory configurable with the `run_directory` key,
so for example you can set it as `/var/run/kamal/`
The directory is created during bootstrap or before any command that
will need to access a file.
Set a high idle timeout on the sshkit connection pool. This will
reduce the incidence of re-connection storms when a deployment has been
idle for a while (e.g. when waiting for a docker build).
The default timeout was 30 seconds, so we'll enable keepalives at a
30s interval to match. This is to help prevent connections from being
killed during long idle periods.
Starting many (90+) SSH connections has caused us some issues such as
failed DNS lookups and hitting process file descriptor limits.
To mitigate this, patch SSHKit::Backend::Netssh to limit concurrency of
connection starts. We'll default to 30 at a time which seems to work
without issue, but can be configured via:
```
sshkit:
max_concurrent_starts: 10
```
Rename `with_lock` to more generic `mutating` and move the env_args
check to that point. This allows read-only actions to be run without
requiring secrets.
If there are uncommitted changes in the app repository when building,
then append `_uncommitted_<random>` to it to distinguish the image
from one built from a clean checkout.
Also change the version used when renaming a container on redeploy to
distinguish and explain the version suffixes.
* `-e [REDACTED]` → `-e SOME_SECRET=[REDACTED]`
* Replaces `Utils.redact` with `Utils.sensitive` to clarify that we're
indicating redactability, not actually performing redaction.
* Redacts from YAML output, including `mrsk config` (fixes#96)
* main:
Wording
Remove accessory images using tags rather than labels
Update readme to point to ghcr.io/mrsked/mrsk
Validate that all roles have hosts
Commander needn't accumulate configuration
Pull latest image tag, so we can identity it
Default to deploying the config version
Remove unneeded Dockerfile.dind, update Readme
add D-in-D dockerfile, update Readme