Set a high idle timeout on the sshkit connection pool. This will
reduce the incidence of re-connection storms when a deployment has been
idle for a while (e.g. when waiting for a docker build).
The default timeout was 30 seconds, so we'll enable keepalives at a
30s interval to match. This is to help prevent connections from being
killed during long idle periods.
Starting many (90+) SSH connections has caused us some issues such as
failed DNS lookups and hitting process file descriptor limits.
To mitigate this, patch SSHKit::Backend::Netssh to limit concurrency of
connection starts. We'll default to 30 at a time which seems to work
without issue, but can be configured via:
```
sshkit:
max_concurrent_starts: 10
```
* main:
Removed not needed MRSK.traefik.run command in Traefil reboot
Updated README with locking directory name
Include service name to lock details
Configurable SSH log levels
Add registry container output to debug
Minor tweaks to hooks section in readme
Update README.md
Updated README.md to make setup examples consistent
Login to the registry proactively before stoping Accessory and Traefik
Rename `with_lock` to more generic `mutating` and move the env_args
check to that point. This allows read-only actions to be run without
requiring secrets.
To make it easier to identity where a docker container is running,
prefix its hostname with the underlying one from the host.
Docker chooses a 12 character random hex string by default, so we'll
keep that as the suffix.
If you different images with the same git SHA, on the second deploy the
tag is moved and the first image becomes untagged. It may however still
be attached to an existing container.
To handle this:
1. Initially prune dangling images - this will remove any untagged
images that are not attached to an existing image
2. Then filter out the untagged images when deleting tagged images - any
that remain will be attached to a container.
The second issue is that `docker container ls -a --format '{{.Image}}`
will sometimes return the image id rather than a tag. This means that
the image doesn't get filtered out when we grep to remove the active
images.
To fix that we'll grep against both the image id and repo:tag.
Useful for checking the status of CI before deploying. Doing this at
this point in the deployment maximises the parallelisation of building
and running CI.
dangling=true doesn't prune any images, as we are not creating dangling
images.
Using --all should remove unused images, but it considers the Git SHA
tag on the latest image to be unused (presumably because there are two
tags, the SHA and latest and the running container is only considered to
be using "latest"). As a result it deletes the tag, which means that we
can't rollback to that SHA later.
Its a bit more complicated to only remove images that are not referenced
by any containers.
First we find the tags we want to keep from the containers (running and
stopped).
Then we append the latest tag to that list.
Then we get a full list of image tags and remove those tags from that
list (using `grep -v -w`).
Finally we pass the tags to `docker rmi`. That either deletes the tag if
there are other references to the image or both the tag and the image if
it is the only one.
These replace the custom audit_broadcast_cmd code. An additional env
variable MRSK_RUNTIME is passed to them.
The audit broadcast after booting an accessory has been removed.