When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container
Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.
To get round that the new boot process is:
1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container
The extra steps ensure that Traefik stops sending requests before the
old container is shutdown.
Setting env variables in the docker arguments requires having them on
the deploy host.
Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.
Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.
Running `kamal env push` will create env files for each role and
accessory, and traefik if required.
`kamal envify` has been updated to also push the env files.
By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.
One thing to note - the Docker doesn't parse the environment variables
in the env file, one result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.
We maybe need to look docker config or docker secrets longer term to get
around this.
Hattip to @kevinmcconnell - this was all his idea.
To avoid polluting the default SSH directory with lots of Kamal config,
we'll default to putting them in a `kamal` sub directory.
But also make the directory configurable with the `run_directory` key,
so for example you can set it as `/var/run/kamal/`
The directory is created during bootstrap or before any command that
will need to access a file.
The version extraction assumed that the version is everything after the
last `-` in the container name. This doesn't work if you deploy a
non-MRSK generated version that contains a `-`.
To fix we'll generate the non version prefix and strip it off. In some
places for this to work we need to make sure to pass the role through.
Fixes: https://github.com/mrsked/mrsk/issues/402
Rename `with_lock` to more generic `mutating` and move the env_args
check to that point. This allows read-only actions to be run without
requiring secrets.
Useful for checking the status of CI before deploying. Doing this at
this point in the deployment maximises the parallelisation of building
and running CI.
These replace the custom audit_broadcast_cmd code. An additional env
variable MRSK_RUNTIME is passed to them.
The audit broadcast after booting an accessory has been removed.
Adds hooks to MRSK. Currently just two hooks, pre-build and post-push.
We could break the build and push into two separate commands if we
found the need for post-build and/or pre-push hooks.
Hooks are stored in `.mrsk/hooks`. Running `mrsk init` will now create
that folder and add sample hook scripts.
Hooks returning non-zero exit codes will abort the current command.
Further potential work here:
- We could replace the audit broadcast command with a
post-deploy/post-rollback hook or similar
- Maybe provide pre-command/post-command hooks that run after every
mrsk invocation
- Also look for hooks in `~/.mrsk/hooks`
The code in Mrsk::Cli::Main#rollback was very similar to
Mrsk::Cli::App#boot.
Modify Mrsk::Cli::App#boot so it can handle rollbacks by:
1. Only renaming running containers
2. Trying first to start then run the new container
Audit details
* Audit logs and broadcasts accept `details` whose values are included as log tags and MRSK_* env vars passed to the broadcast command
* Commands may return execution options to the CLI in their args list
* Introduce `mrsk broadcast` helper for sending audit broadcasts
* Report UTC time, not local time, in audit logs. Standardize on ISO 8601 format
* main:
Simplify domain language to just "boot" and unscoped config keys
Retain a fixed number of containers when pruning
Don't assume rolling back in message
Check all hosts before rolling back
Ensure Traefik service name is consistent
Extend traefik delay by 1 second
Include traefik access logs
Check if we are still getting a 404
Also dump load balancer logs
Dump traefik logs when app not booted
Fix missing for apt-get
Report on container health after failure
Fix the integration test healthcheck
Allow percentage-based rolling deployments
Move `group_limit` & `group_wait` under `boot`
Limit rolling deployment to boot operation
Allow performing boot & start operations in groups
Hosts could end up out of sync with each other if prune commands are run
manually or when new hosts are added.
Before rolling back confirm that the required container is available on
all hosts and roles.
Getting the lock status with invoke passes through any options from the
original command which will raise an exception if they are not also
valid for the lock status command.
Fixes https://github.com/mrsked/mrsk/issues/239
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.
We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
* main: (24 commits)
Bump version for 0.11.0
Labels can be added to Traefik
Make rollbacks role-aware
fix typo role to roles
Explained the latest modifications of Traefik container labels
Remove .idea folder
Updated README.md with new healthcheck.max_attempts option
Fix test case: console output message was not updated to display the current/total attempts
Require net-ssh ~> 7.0 for SHA-2 support
Improved deploy lock acquisition
Excess CR
Style
Simpler
Make it explicit, focus on Ubuntu
More explicit
Not that --bundle is a Rails 7+ option
Update README.md
Update README.md
Improved: configurable max_attempts for healthcheck
Traefik service name to be derived from role and destination
...
Rollbacks stopped working after https://github.com/mrsked/mrsk/pull/99.
We'll confirm that a container is available for the first role on the
primary host before attempting to rollback.
1. Don't raise lock error for non-lock issues during lock acquire
(see https://github.com/mrsked/mrsk/pull/181)
2. If there is an error while the lock is held, don't release the lock
and send a warning to stderr
Allow the hosts for accessories to be specified by host or role, or on
all app hosts by setting `daemon: true`.
```
# Single host
mysql:
host: 1.1.1.1
# Multiple hosts
redis:
hosts:
- 1.1.1.1
- 1.1.1.2
# By role
monitoring:
roles:
- web
- jobs
```
When deploying check if there is already a container with the existing
name. If there is rename it to "<version>_<random_hex_string>" to remove
the name clash with the new container we want to boot.
We can then do the normal zero downtime run/wait/stop.
While implementing this I discovered the --filter name=foo does a
substring match for foo, so I've updated those filters to do an exact
match instead.
* main: (32 commits)
Inline default as with other options
Symbols!
Fix tests
test stop with custom stop wait time
No need to replicate Docker default
Describe purpose rather than elements
Style and ordering
Customizable stop wait time
Fix tests
Ensure it also works when configuring just log options without setting a driver
Add accessory test
Undo change
Improve test
Update README
Ensure default log option `max-size=10m`
#142 Allow to customize container options in accessories
Fix flaky test
Fix tests
More resilient tests
Fix other tests
...
* main:
Wording
Remove accessory images using tags rather than labels
Update readme to point to ghcr.io/mrsked/mrsk
Validate that all roles have hosts
Commander needn't accumulate configuration
Pull latest image tag, so we can identity it
Default to deploying the config version
Remove unneeded Dockerfile.dind, update Readme
add D-in-D dockerfile, update Readme
Add a deploy lock for commands that are unsafe to run concurrently.
The lock is taken by creating a `mrsk_lock` directory on the primary
host. Details of who took the lock are added to a details file in that
directory.
Additional CLI commands have been added to manual release and acquire
the lock and to check its status.
```
Commands:
mrsk lock acquire -m, --message=MESSAGE # Acquire the deploy lock
mrsk lock help [COMMAND] # Describe subcommands or one specific subcommand
mrsk lock release # Release the deploy lock
mrsk lock status # Report lock status
Options:
-v, [--verbose], [--no-verbose] # Detailed logging
-q, [--quiet], [--no-quiet] # Minimal logging
[--version=VERSION] # Run commands against a specific app version
-p, [--primary], [--no-primary] # Run commands only on primary host instead of all
-h, [--hosts=HOSTS] # Run commands on these hosts instead of all (separate by comma)
-r, [--roles=ROLES] # Run commands on these roles instead of all (separate by comma)
-c, [--config-file=CONFIG_FILE] # Path to config file
# Default: config/deploy.yml
-d, [--destination=DESTINATION] # Specify destination to be used for config file (staging -> deploy.staging.yml)
-B, [--skip-broadcast], [--no-skip-broadcast] # Skip audit broadcasts
```
If we add support for running multiple deployments on a single server
we'll need to extend the locking to lock per deployment.
Commander had version/destination solely to incrementally accumulate CLI
options. Simpler to configure in one shot.
Clarifies responsibility and lets us introduce things like
`abbreviated_version` in one spot - Configuration.
* main:
Ask for access token
Style
Style
config.traefik is already nil safe
Update README.md
Bump dev deps and consolidate platform matches
Deploys mention the released service@version
Accessories aren't required to publish a port
Accessories may be pulled from authenticated registries
Polish destination config loading
Allow arbitrary docker options for traefik
Fixed typos
Fixed readme
Rebased on main
Added volume configuration in response to issue coments
Modified in response to PR comments
Added the additional_ports configuration
Less work for broadcast commands to take on.
Also fixes a bug where rollback on hosts without a running container
would stop the container they had just started.
If we don't supply a version when deploying we'll use the result of
docker image ls to decide which image to boot. But that doesn't
necessarily correspond to the one we have just built.
E.g. if you do something like:
```
mrsk deploy # deploys git sha AAAAAAAAAAAAAA
git commit --amend # update the commit message
mrsk deploy # deploys git sha BBBBBBBBBBBBBB
```
In this case running `docker image ls` will give you the same image
twice (because the contents are identical) with tags for both SHAs but
the image we have just built will not be returned first. Maybe the order
is random, but it always seems to come second as far as I have seen.
i.e you'll get something like:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
foo/bar AAAAAAAAAAAAAA 6272349a9619 31 minutes ago 791MB
foo/bar BBBBBBBBBBBBBB 6272349a9619 31 minutes ago 791MB
```
Since we already know what version we want to deploy from the config,
let's just pass that through.