Otherwise we can't connect to the proxy as the local user and we can't
use ~/.ssh/config to set User directives.
Defaulting to root@ is hard to deprecate without introducing new config.
A clean break is probably clearest.
Dotenv's variable substitution doesn't work the same way as commands run
in the shell. It needs values to be escaped.
```sh
$ cat /tmp/env
SECRETS=$(cat /tmp/json)
SECRETS2=$(echo $SECRETS | jq)
$ cat /tmp/json
\{\ \"foo\"\ :\ \"bar\" \}
$ SECRETS=$(cat /tmp/json)
$ SECRETS2=$(echo $SECRETS | jq)
jq: parse error: Invalid numeric literal at line 1, column 2
$ ruby -e 'require "dotenv"; puts Dotenv.parse("/tmp/env")["SECRETS2"]'
{
"foo": "bar"
}
```
Since you then can't use the shell to debug, `kamal secrets print` will
allow you to see what the secrets will be set to.
Upgrading kamal from `v1.8.3` to `v1.9.0` broke my [kamal playground](https://labs.iximiuz.com/playgrounds/kamal):
```
laborant@dev-machine:~/svc-a$ kamal setup
INFO [34d0def6] Running /usr/bin/env mkdir -p .kamal on 172.16.0.3
INFO [c34cf833] Running /usr/bin/env mkdir -p .kamal on 172.16.0.4
INFO [34d0def6] Finished in 0.147 seconds with exit status 0 (successful).
INFO [c34cf833] Finished in 0.204 seconds with exit status 0 (successful).
Acquiring the deploy lock...
Ensure Docker is installed...
INFO [413ee426] Running docker -v on 172.16.0.4
INFO [f1acacba] Running docker -v on 172.16.0.3
INFO [413ee426] Finished in 0.036 seconds with exit status 0 (successful).
INFO [f1acacba] Finished in 0.076 seconds with exit status 0 (successful).
Log into image registry...
INFO [94cff492] Running docker login registry.iximiuz.com -u [REDACTED] -p [REDACTED] on localhost
INFO [94cff492] Finished in 0.077 seconds with exit status 0 (successful).
INFO [605c535f] Running docker login registry.iximiuz.com -u [REDACTED] -p [REDACTED] on 172.16.0.4
INFO [6002b598] Running docker login registry.iximiuz.com -u [REDACTED] -p [REDACTED] on 172.16.0.3
INFO [605c535f] Finished in 0.083 seconds with exit status 0 (successful).
INFO [6002b598] Finished in 0.083 seconds with exit status 0 (successful).
Build and push app image...
INFO [9d172b1e] Running docker --version && docker buildx version on localhost
INFO [9d172b1e] Finished in 0.059 seconds with exit status 0 (successful).
INFO Cloning repo into build directory `/tmp/kamal-clones/svc-a-2f65914456263/workdir/`...
INFO [26fb1bd3] Running /usr/bin/env git -C /tmp/kamal-clones/svc-a-2f65914456263 clone /workdir --recurse-submodules on localhost
ERROR Error preparing clone: Failed to clone repo: git exit status: 32768
git stdout: Nothing written
git stderr: Cloning into 'workdir'...
fatal: detected dubious ownership in repository at '/workdir/.git'
To add an exception for this directory, call:
git config --global --add safe.directory /workdir/.git
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
, deleting and retrying...
INFO Cloning repo into build directory `/tmp/kamal-clones/svc-a-2f65914456263/workdir/`...
INFO [fd4aac0c] Running /usr/bin/env git -C /tmp/kamal-clones/svc-a-2f65914456263 clone /workdir --recurse-submodules on localhost
Finished all in 0.3 seconds
Releasing the deploy lock...
Finished all in 0.6 seconds
ERROR (SSHKit::Command::Failed): git exit status: 32768
git stdout: Nothing written
git stderr: Cloning into 'workdir'...
fatal: detected dubious ownership in repository at '/workdir/.git'
To add an exception for this directory, call:
git config --global --add safe.directory /workdir/.git
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
laborant@dev-machine:~/svc-a$ kamal version
2.0.0
```
I checked the [v1.8.3...v1.9.0](https://github.com/basecamp/kamal/compare/v1.8.3...v1.9.0) diff, and couldn't find anything even remotely related to the above error.
Then I checked the `git` versions in kamal `v1.8.3` and `v1.9.0` images:
```
docker run -it --rm --entrypoint sh ghcr.io/basecamp/kamal:v1.8.3
/workdir # git --version
git version 2.38.5
```
vs.
```
docker run -it --rm --entrypoint sh ghcr.io/basecamp/kamal:v2.0.0
/workdir # git --version
git version 2.39.5
```
Apparently, something changed between git releases `2.38.5` and `2.39.5` (likely yet another CVE fix), and `git config --global --add safe.directory /workdir` stopped working.
Here is the mitigation I currently use, but it's a bit awkward:
```
docker build -t ghcr.io/basecamp/kamal:v2.0.0 - <<EOF
FROM ghcr.io/basecamp/kamal:v2.0.0
RUN git config --global --add safe.directory /workdir/.git
EOF
```
Hence, this PR.
To repro, you can start a [kamal playground](https://labs.iximiuz.com/playgrounds/kamal), then `docker pull ghcr.io/basecamp/kamal:v2.0.0` to override my patched image, and `cd svc-a && kamal setup`.
When the fetched item is not a login, the Bitwarden adapter raises a NoMethodError
because the returned JSON does not have the login.password value.
Add a nicer error message for that case.
Add commands for managing proxy boot config. Since the proxy can be
shared by multiple applications, the configuration doesn't belong in
`config/deploy.yml`.
Instead you can set the config with:
```
Usage:
kamal proxy boot_config <set|get|clear>
Options:
[--publish], [--no-publish], [--skip-publish] # Publish the proxy ports on the host
# Default: true
[--http-port=N] # HTTP port to publish on the host
# Default: 80
[--https-port=N] # HTTPS port to publish on the host
# Default: 443
[--docker-options=option=value option2=value2] # Docker options to pass to the proxy container
```
By default we boot the proxy with `--publish 80:80 --publish 443:443`.
You can stop it from publishing ports, specify different ports and pass
other docker options.
The config is stored in `.kamal/proxy/options` as arguments to be passed
verbatim to docker run.
If someone wants to set the options from their application, they can do
that by calling `kamal proxy boot_config set` in a pre-deploy hook.
There's an example in the integration tests showing how to use this to
front kamal-proxy with Traefik, using an accessory.
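For example, a pre-deploy hook might set the boot config before the proxy first boots (a sketch; the hook contents are purely illustrative):
```sh
#!/bin/sh
# Hypothetical pre-deploy hook: configure the proxy before it is booted.
# Skip publishing ports so another proxy (e.g. Traefik) can sit in front of kamal-proxy.
kamal proxy boot_config set --no-publish
```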
Use localhost for app_with_roles and 127.0.0.1 for app. Confirm we can
deploy both and that they respond to requests. Ensure the proxy is removed
once both have been removed.
1. Update to kamal-proxy 0.4.0 which creates and chowns
/home/kamal-proxy/.config/kamal-proxy to kamal-proxy
2. Use a docker volume rather than mapping in a directory, so docker
keeps it owned by the correct user
By default only the primary role runs the proxy. To disable the proxy
for that role, you can set `proxy: false` under it.
Other roles default to not running the proxy, but you can
enable it by setting `proxy: true` for the role, or alternatively
setting a proxy configuration.
The proxy configuration will be merged into the root proxy configuration.
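As a sketch (host and role names are illustrative), the per-role settings might look like:
```yaml
servers:
  web:
    hosts:
      - 10.0.0.1
    proxy: false   # the primary role, opting out of running the proxy
  jobs:
    hosts:
      - 10.0.0.2
    proxy: true    # a non-primary role, opting in
```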
Remove `stop_wait_time` and `readiness_timeout` from the root config
and remove `deploy_timeout` and `drain_timeout` from the proxy config.
Instead we'll just have `deploy_timeout` and `drain_timeout` in the
root config.
For roles that run the proxy, the timeouts are passed to the kamal-proxy deploy
command. Once that returns we can assume the container is ready to
shut down.
For other roles, we'll use the `deploy_timeout` when polling the
container to see if it is ready and the `drain_timeout` when stopping
the container.
Adds:
- `kamal upgrade` to upgrade all app hosts and accessory hosts
- `kamal proxy upgrade` to upgrade the proxy on all hosts
- `kamal accessory upgrade [name]` to upgrade accessories on all hosts
Upgrade takes rolling and confirmed options and calls `proxy upgrade`
and `accessory upgrade` in turn.
To upgrade just a single host, add `-h [host]` to the command. But the
upgrade should run on all hosts, not just those running the proxy.
Calling upgrade on a host that has already been upgraded should work ok.
Upgrading hosts causes downtime, but you can avoid it if you run multiple
hosts by:
1. Implementing the pre-proxy-reboot and post-proxy-reboot hooks to
remove the host from external load balancers
2. Running the upgrade with the --rolling option
**kamal proxy upgrade**
1. Creates a `kamal` network if required
2. Stops and removes the old proxy (whether Traefik or kamal-proxy)
3. Starts a kamal-proxy container in the `kamal` network
4. Reboots the app containers in the `kamal` network
**kamal accessory upgrade [name]**
1. Creates a `kamal` network if required
2. Reboots the accessory containers in the `kamal` network
A matching `downgrade` command will be added to Kamal 1.9.
The proxy can be enabled via the config:
```
proxy:
enabled: true
hosts:
- 10.0.0.1
- 10.0.0.2
```
This will enable the proxy and cause it to be run on the hosts listed
under `hosts`, after running `kamal proxy reboot`.
Enabling the proxy disables `kamal traefik` commands and replaces them
with `kamal proxy` ones. However, only the marked hosts will run the
kamal-proxy container; the rest will run Traefik as before.
When overriding the command, docker will still run the entrypoint. We
want to avoid that here - we just want to get the assets out as quickly
as possible. Otherwise something important might be going on in the
entrypoint when we stop the container.
Add a shared secrets file used across all destinations. Useful for
things like GitHub tokens or registry passwords.
The secrets are added to a new file called `secrets-common` to highlight
they are shared, and to avoid accidentally inheriting a secret from the
`secrets` file to `secrets.destination`.
Secrets should be interpolated at runtime so we do want the file in git.
But add a warning at the top to avoid adding secrets or git ignore the
file if you do.
Also provide examples of the three options for interpolating secrets.
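For example (the variable names are illustrative, not the generated template), entries might reference a local environment variable or shell out to a command:
```sh
# Read from the local environment at deploy time
KAMAL_REGISTRY_PASSWORD=$KAMAL_REGISTRY_PASSWORD

# Run a command to fetch the value
GITHUB_TOKEN=$(gh config get -h github.com oauth_token)
```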
Avoid outputting this login error message; it wasn't an error and you
don't need to follow those instructions.
```
[ERROR] 2024/09/11 11:57:08 You are not currently signed in. Please run `op signin --help` for instructions
```
Rather than redirecting the global $stdout, which is never a good idea in
a threaded program, we'll make the secrets commands aware they are
being inlined, so they return the value instead of printing it.
Additionally we no longer need to interrupt the parent process on error
as we've inlined the command - exit 1 is enough.
Add env files back in for secrets - hides them from process lists and
allows you to pick up the latest env file when running
`kamal app exec` without reusing.
By default look for the env file in .kamal/env to avoid clashes with
other tools using .env.
For now we'll still load .env and issue a deprecation warning, but in
future we'll stop reading those.
1. Add driver as an option, defaulting to `docker-container`. For a
"native" build you can set it to `docker`
2. Set arch as an array of architectures to build for, defaulting to
`[ "amd64", "arm64" ]` unless you are using the docker driver in
which case we default to not setting a platform
3. Remote is now just a connection string for the remote builder
4. If remote is set, we only use it for non-local arches; if we are
only building for the local arch, we'll ignore it.
Examples:
On arm64, build for arm64 locally, amd64 remotely or
On amd64, build for amd64 locally, arm64 remotely:
```yaml
builder:
remote: ssh://docker@docker-builder
```
On arm64, build amd64 on remote,
On amd64 build locally:
```yaml
builder:
arch:
- amd64
remote:
host: ssh://docker@docker-builder
```
Build amd64 on local:
```yaml
builder:
arch:
- amd64
```
Use docker driver, building for local arch:
```yaml
builder:
driver: docker
```
Include the host name in the builder name, so we can have one builder
per host/arch across all kamal projects.
Inherit from the remote builder. The difference in the hybrid builder
is that we create a local buildx instance and append the remote context
to it.
It's just a remote builder that will build whichever platform is asked
for, so let's remove the "native" part.
We'll also remove the service name from the builder name, so multiple
services can share the same builder.
Combine the two builders, as they are almost identical. The only
difference was whether the platforms were set.
The native cached builder wasn't using the context it created, so now
we do.
We'll set the driver to `docker-container` - it seems to be the default
but the Docker docs claim it is `docker`.
If you have an alias like:
```
aliases:
rails: app exec -p rails
```
Then `kamal rails db:migrate:status` will execute
`kamal app exec -p rails db:migrate:status`.
To make this work, we'll allow `app exec` and
`server exec` to accept multiple arguments.
The arguments are combined by simply joining them with a space. This
means that these are equivalent:
```
kamal app exec -p rails db:migrate:status
kamal app exec -p "rails db:migrate:status"
```
If you want to pass an argument with spaces, you'll need to quote it:
```
kamal app exec -p "git commit -am \"My comment\""
kamal app exec -p git commit -am "\"My comment\""
```
Aliases are defined in the configuration file under the `aliases` key.
The configuration is a map of alias name to command. When we run the
command, we just do a literal replacement of the alias with the
string.
So if we have:
```yaml
aliases:
console: app exec -r console -i --reuse "rails console"
```
Then running `kamal console -r workers` will run the command
```sh
$ kamal app exec -r console -i --reuse "rails console" -r workers
```
Because of the order Thor parses the arguments, this allows us to
override the role from the alias command.
There might be cases where we need to munge the command a bit more but
that would involve getting into Thor command parsing internals,
which are complicated and possibly subject to change.
There's a chance that your aliases could conflict with future built-in
commands, but there's not likely to be many of those and if it happens
you'll get a validation error when you upgrade.
Thanks to @dhnaranjo for the idea!
The integration tests use their own registry to avoid hitting Docker Hub
rate limits.
This was using a self-signed certificate, but now we instead use
`--insecure-registry` to let the docker daemon use HTTP.
Allow blocks prefixed with `x-` in the configuration as a place to
declare reusable blocks with YAML anchors and aliases.
Borrowed from the Docker Compose configuration file format -
https://github.com/compose-spec/compose-spec/blob/main/spec.md#extension
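For instance, a reusable block can be declared once under an `x-` key and aliased wherever it is needed (a sketch; the key names are illustrative):
```yaml
# Declare a reusable env block under an ignored "x-" key...
x-default-env: &default-env
  LOG_LEVEL: info
  TZ: UTC

# ...and merge it in with a YAML alias
env:
  clear:
    <<: *default-env
    RAILS_ENV: production
```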
Thanks to @ruyrocha for the suggestion.
Find the first registry mirror on each host. If we find any, pull the
images on one host per mirror, then do the remainder concurrently.
The initial pulls will seed the mirrors, ensuring that we only pull the image
from Docker Hub once per mirror.
This works best if there is only one mirror on each host.
Setting `GITHUB_TOKEN` as in the docs results in reusing the existing
`GITHUB_TOKEN` since `gh` returns that env var if it's set:
```bash
GITHUB_TOKEN=junk gh config get -h github.com oauth_token
junk
```
Using the original env ensures that the templates will be evaluated the
same way regardless of whether envify had been previously invoked.
We didn't log boot errors if there was one role because there was no
barrier and the logging is done by the first host to close the barrier.
Let's always create the barrier to fix this.
Fixes:
```
ERROR (Net::SSH::Exception): Exception while executing on host example.com: could not settle on kex algorithm
Server kex preferences: curve25519-sha256@libssh.org,ext-info-s,kex-strict-s-v00@openssh.com
Client kex preferences: ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
```
Add x25519 to the Gemfile.lock.
- Add local logout to `kamal registry logout`
- Add `skip_local` and `skip_remote` options to `kamal registry` commands
- Skip local login in `kamal deploy` when `--skip-push` is used
Load the hosts from the contexts before trying to build.
If there is no context, we'll create one. If there is one but the hosts
don't match we'll re-create.
Where we just have a local context, there won't be any hosts but we
still inspect the builder to check that it exists.
Validate the Kamal configuration, giving useful warnings on errors.
Each section of the configuration has its own config class and a YAML
file containing documented example configuration.
You can run `kamal docs` to see the example configuration, and
`kamal docs <section>` to see the example configuration for a specific
section.
The validation matches the configuration to the example configuration
checking that there are no unknown keys and that the values are of
matching types.
Where there is more complex validation - e.g. for envs and servers - we
have custom validators that implement those rules.
Additionally, the configuration examples are used to generate the
configuration documentation in the kamal-site repo.
You generate them by running:
```
bundle exec bin/docs <kamal-site-checkout>
```
When cloning the git repo:
1. Try to clone
2. If there's already a build directory reset it
3. Check the clone is valid
If anything goes wrong during that process:
1. Delete the clone directory
2. Clone it again
3. Check the clone is valid
Raise any errors after that
Using a different SSHKit runner doesn't work well, because the group
runner uses the Parallel runner internally. So instead we'll patch its
behaviour to fail slow.
We'll also get it to return all the errors so we can report on all the
hosts that failed.
If a primary role container is unhealthy, we might take a while to
time out the health check poller. In the meantime, if we have started the
other roles, they'll be running two containers.
This could be a problem, especially if they run jobs, as that
doubles the worker capacity, which could cause excessive load.
We'll wait for the first primary role container to boot successfully
before starting the other containers from other roles.
To speed up deployments, we'll remove the healthcheck step.
This adds some risk to deployments for non-web roles - if they don't
have a Docker healthcheck configured then the only check we do is if
the container is running.
If there is a bad image we might see the container running before it
exits and deploy it. Previously the healthcheck step would have avoided
this by ensuring a web container could boot and serve traffic first.
To mitigate this, we'll add a deployment barrier. Until one of the
primary role containers passes its healthcheck, we'll keep the barrier
up and avoid stopping the containers on the non-primary roles.
If the primary role container fails its healthcheck, we'll close the
barrier and shut down the new containers on the waiting roles.
We also have a new integration test to check we correctly handle a
broken image. This highlighted that SSHKit's default runner will
stop at the first error it encounters. We'll now have a custom runner
that waits for all threads to finish allowing them to clean up.
This includes a fix for a bug in the eviction thread that could cause
this error:
```
[ERROR (IOError): Exception while executing on host foo: closed stream]
```
See https://github.com/capistrano/sshkit/pull/534
Docker does not respect the .dockerignore file when building from a tar.
Instead by default we'll make a local clone into a tmp directory and
build from there. Subsequent builds will reset the clone to match the
checkout.
Compared to building directly in the repo, we'll have reproducible
builds.
Compared to using a git archive:
1. .dockerignore is respected
2. We'll have faster builds - docker can be smarter about caching the
build context on subsequent builds from a directory
To build from the repo directly, set the build context to "." in the
config.
If there are uncommitted changes, we'll warn about them either being
included or ignored depending on whether we build from the clone.
Allow hosts to be tagged so we can have host specific env variables.
We might want host specific env variables for things like datacenter
specific tags or testing GC settings on a specific host.
Right now you either need to set up a separate role, or have the app
be host aware.
Now you can define tag env variables and assign those to hosts.
For example:
```
servers:
- 1.1.1.1
- 1.1.1.2: tag1
- 1.1.1.2: tag2
- 1.1.1.3: [ tag1, tag2 ]
env_tags:
tag1:
ENV1: value1
tag2:
ENV2: value2
```
The tag env supports the full env format, allowing you to set secret and
clear values.
When ssh options are set, they overwrite the username and password passed in the ssh builder URI. Passing part of the URI to SSHKit is fine, as it then properly extracts the username and password and forwards them as host.ssh_options (in which case it's no longer empty).
Occasionally in CI things run slowly and it takes more than 1 second
for a cli test to run, so let's allow any value for the runtime in the
hook checks.
Extract Kamal::Commander::Specifics to deal with host and role setup and
ensure that primary hosts and roles always come first. This means that
in a rolling deploy we deploy to the primary ones first.
This will be important when we remove the healthcheck step as we want
to confirm the primary host can be deployed to before completing a
deployment for other roles.
By setting the hosts and roles all together in one place we can sort
the primary ones to the front without creating infinite loops.
Currently the latest container is the one that was created last. But if
we have had a failed deployment that left two containers running that
would not be the one we want. The second container could be in a
restart loop for example.
Instead we want the container that is running the image tagged as
latest. As we now tag as latest after a successful deployment we can
trust that that is a healthy container.
In the case that there is no container running the latest image tag,
we'll fall back to the latest container.
This could happen if the deploy was halted in between the old container
being stopped and the image being tagged as latest.
If you are deploying more than one destination to a host, the latest
tags will conflict, so we'll append the destination to the tag.
The latest tag is used when booting the app or exec-ing a new container.
If a deploy doesn't complete on a host for all roles then we should
probably not be using it, so move the tagging to the end of the boot
process.
This will allow us to filter for containers that have no destination in
cases where we deploy an empty + a non empty destination to the same
host.
To note:
```
# Containers with a destination label
$ docker ps --filter label=destination
# Containers with an empty destination label
$ docker ps --filter label=destination=
```
If no context is specified and we are in a git repo, then we'll build
from a git archive by default. This means we don't need a separate
setting and gives us a safer default build.
Building directly from a checkout will pull in uncommitted files or,
more sneakily, files that are git ignored but not docker ignored.
To avoid this, we'll add an option to build from a git archive of HEAD
instead. Docker doesn't provide a way to build directly from a git
repo, so instead we create a tarball of the current HEAD with git
archive and pipe it into the build command.
When building from a git archive, we'll still display the warning about
uncommitted changes, but we won't add the `_uncommitted_...` suffix to
the container name as they won't be included in the build.
Perhaps this should be the default, but we'll leave that decision for
now.
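Roughly, the archive build boils down to something like this (the image name and flags are illustrative, not the exact command Kamal runs):
```sh
# Tar up the committed files at HEAD and pipe them to docker build as the context
git archive --format=tar HEAD | docker build -t my-app:latest -
```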
Secret and clear env variables have different lifecycles. The clear ones
are part of the repo, so it makes sense to always deploy them with the
rest of the repo.
The secret ones are external so we can't be sure that they are up to
date, therefore they require an explicit push via `envify` or `env push`.
We'll keep the env file, but now it just contains secrets. The clear
values are passed directly to `docker run`.
Add an app with roles to the integration tests. We'll deploy two web
containers and one worker. The worker just sleeps, so we are testing
that the container has booted.
If curl is not available to download the docker install script, try
with wget instead.
If neither is available or both fail, return a simple failing script
so that we don't carry on regardless.
Fixes: https://github.com/basecamp/kamal/issues/395
This allows the user to make any necessary configuration changes to
Docker before setting up any containers, allowing those configuration
changes to take effect from the outset.
By default we keep 5 containers around for rollback. The containers
don't take much space, but the images for them can.
Make the number of containers to retain configurable, either in the
config with the `retain_containers` setting or on the command line
with the `--retain` option.
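For example, to keep only the three newest containers you might set (a sketch):
```yaml
# config/deploy.yml - keep 3 stopped containers instead of the default 5
retain_containers: 3
```
or pass `--retain 3` on the command line for a one-off override.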
1. Add Ruby 2.7 specific Gemfile that uses an older version of nokogiri
2. Rails edge doesn't support Ruby 2.7.0, so exclude it.
3. Add Ruby 3.3
4. Update Gemfile.lock to test against Rails 7.1.2 as it's the latest
version.
5. Remove continue-on-error from the matrix and always set to true
This is better than adding them together, which confusingly results in
both ENV vars in the same file, though based on the load order, they
worked anyway.
This allows for more meaningful naming in roles.
The only caution here is that we don't support the renaming of roles, so
any migration is left to the user.
Provide pre and post reboot hooks for Traefik, that can be used to
remove/add to an external load balancer to prevent requests from being
sent during the reboot.
Works best with the --rolling setting, where each hook is called once
per host.
If the app container is down or not responding then traefik will return
a 404 response code. This is not ideal as it suggests a client rather
than a server problem.
To fix this, we'll define a catch all route that always returns a 502.
This is not ideal either, as this route would take priority over a shorter
route with priority 1.
TODO: up the priority of the app route.
Calling `load_envs` again does not load updated env variables, because
Dotenv does not overwrite existing values.
To fix this we'll store the original ENV and reset to it before
reloading.
https://github.com/basecamp/kamal/issues/512
The "No git remote set" error message was appropriate for the previous
block (where it was presumably copy-pasted from), but in this line we
have failed the check that determines if we have a git branch checked
out, so we should output a corresponding error.
The env check is not needed anymore as all the commands rely on the
env files having already been created remotely.
The only place the env is needed is when running `kamal env push` and
that will still raise an appropriate error.
When calling `kamal app exec` for new non interactive containers, run
the command per role on each server and include the role config
including the environment.
Fixes: https://github.com/basecamp/kamal/issues/492
In lieu of a general purpose mechanism to pass dynamically-evaluated env-vars at container execution time, we can pass the `config.version` as KAMAL_VERSION to avoid having to take apart the container name just to determine the SHA of the deployed version in the entrypoint.
Adding -T to the copy command ensures that the files are copied at the
same level into the target directory whether it exists or not.
That allows us to drop the `/*` which was not picking up hidden files.
Fixes: https://github.com/basecamp/kamal/issues/465
Kamal needs images to have the service label so it can track them for
pruning. Images built by Kamal will have the label, but externally built
ones may not.
Without it images will build up over time. The worst case is an outage
if all the hosts' disks fill up at the same time.
We'll add a check for the label and halt if it is not there.
If you always want to use a destination, and have a base deploy.yml file
that doesn't specify any hosts, then if you forget to specify the
destination you will get a cryptic error.
Add a "require_destination" setting you can use to avoid this.
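A sketch of the base config using this setting:
```yaml
# Refuse to run unless a destination is given with -d
require_destination: true
```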
An interrupted deployment can leave older containers lying around. To
ensure they are cleaned up subsequently, stop stale containers during
deployments instead of just reporting them.
During deployments both the old and new containers will be active for a
small period of time. There also may be lagging requests for older CSS
and JS after the deployment.
This can lead to 404s if a request for old assets hits a new container
or vice versa.
This PR makes sure that both sets of assets are available throughout the
deployment from before the new version of the app is booted.
This can be configured by setting the asset path:
```yaml
asset_path: "/rails/public/assets"
```
The process is:
1. We extract the assets out of the container, with docker run, docker
cp, docker stop. Docker run sets the container command to "sleep" so
this needs to be available in the container.
2. We create an asset volume directory on the host for the new version
of the app and copy the assets in there.
3. If there is a previous deployment we also copy the new assets into
its asset volume and copy the older assets into the new asset volume.
4. We start the new container mapping the asset volume over the top of
the container's asset path.
This means that both the old and new versions have replaced the asset
path with a volume containing both sets of assets and should be able
to serve any request during the deployment. The older assets will
continue to be available until the next deployment.
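A rough sketch of the extraction step described above (container, path and version names are illustrative):
```sh
# Run a throwaway copy of the new image so we can reach its baked-in assets
docker run --detach --name my-app-assets my-app:latest sleep 1000

# Copy the assets into the per-version asset directory on the host
mkdir -p /var/lib/my-app/assets/VERSION
docker cp my-app-assets:/rails/public/assets/. /var/lib/my-app/assets/VERSION/

# The sleeping container has served its purpose
docker stop -t 1 my-app-assets && docker rm my-app-assets
```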
The go template was concatenating all the mounts into one line. It
happened to work because the mount we are interested in was always first.
Fix it to output one mount per line instead.
When replacing a container currently we:
1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container
Traefik will send requests to the old container until it notices that it
is unhealthy. But it may have stopped serving requests before that point
which can result in errors.
To get round that the new boot process is:
1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and
including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds
to notice
6. Stop the old container
The extra steps ensure that Traefik stops sending requests before the
old container is shut down.
Setting env variables in the docker arguments requires having them on
the deploy host.
Instead we'll add two new commands `kamal env push` and
`kamal env delete` which will manage copying the environment as .env
files to the remote host.
Docker will pick up the file with `--env-file <path-to-file>`. Env files
will be stored under `<kamal run directory>/env`.
Running `kamal env push` will create env files for each role and
accessory, and traefik if required.
`kamal envify` has been updated to also push the env files.
By avoiding using `kamal envify` and creating the local and remote
secrets manually, you can now avoid accessing secrets needed
for the docker runtime environment locally. You will still need build
secrets.
One thing to note - Docker doesn't parse the environment variables
in the env file, one result of this is that you can't specify multi-line
values - see https://github.com/moby/moby/issues/12997.
We may need to look at docker config or docker secrets longer term to get
around this.
Hattip to @kevinmcconnell - this was all his idea.
To avoid polluting the default SSH directory with lots of Kamal config,
we'll default to putting them in a `kamal` sub directory.
But also make the directory configurable with the `run_directory` key,
so for example you can set it as `/var/run/kamal/`
The directory is created during bootstrap or before any command that
will need to access a file.
Adds the `publish` option which, if set to false, does not pass `--publish` to
`docker run` when starting Traefik. This is useful when running Traefik
behind a reverse proxy, for example.
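As a sketch, that might look like:
```yaml
# Don't pass --publish to docker run when starting Traefik
traefik:
  publish: false
```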
When stopping or starting Traefik, don't hide important errors.
Docker doesn't return an error when starting a started container or
stopping a stopped container.
When rebooting we want to know about errors during run as we've just
stopped and removed the previous container.
When booting, we want to leave the running container if it exists,
restart a stopped container and run a new one if none exists.
We can implement this with `docker start ... || docker run ...`:
- if the container is started, `docker start` will exit with 0
- if the container is stopped, `docker start` will start it and exit with 0
- if the container doesn't exist, `docker start` will return a non zero
exit code and `docker run` will create a new container. Any errors in
`docker run` will be returned.
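In shell terms the boot roughly becomes (container and image names are illustrative, flags elided):
```sh
# Reuse the existing container if there is one, otherwise create a fresh one
docker start my-app-web-123 || docker run --detach --name my-app-web-123 my-app:123
```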
The version extraction assumed that the version is everything after the
last `-` in the container name. This doesn't work if you deploy a
non-MRSK generated version that contains a `-`.
To fix we'll generate the non version prefix and strip it off. In some
places for this to work we need to make sure to pass the role through.
Fixes: https://github.com/mrsked/mrsk/issues/402
Set a high idle timeout on the sshkit connection pool. This will
reduce the incidence of re-connection storms when a deployment has been
idle for a while (e.g. when waiting for a docker build).
The default timeout was 30 seconds, so we'll enable keepalives at a
30s interval to match. This is to help prevent connections from being
killed during long idle periods.
Starting many (90+) SSH connections has caused us some issues such as
failed DNS lookups and hitting process file descriptor limits.
To mitigate this, patch SSHKit::Backend::Netssh to limit concurrency of
connection starts. We'll default to 30 at a time which seems to work
without issue, but can be configured via:
```
sshkit:
max_concurrent_starts: 10
```
* main:
Removed not needed MRSK.traefik.run command in Traefik reboot
Updated README with locking directory name
Include service name to lock details
Configurable SSH log levels
Add registry container output to debug
Minor tweaks to hooks section in readme
Update README.md
Updated README.md to make setup examples consistent
Login to the registry proactively before stopping Accessory and Traefik
Rename `with_lock` to more generic `mutating` and move the env_args
check to that point. This allows read-only actions to be run without
requiring secrets.
To make it easier to identify where a docker container is running,
prefix its hostname with the underlying one from the host.
Docker chooses a 12 character random hex string by default, so we'll
keep that as the suffix.
If you build different images with the same git SHA, on the second deploy the
tag is moved and the first image becomes untagged. It may however still
be attached to an existing container.
To handle this:
1. Initially prune dangling images - this will remove any untagged
images that are not attached to an existing container
2. Then filter out the untagged images when deleting tagged images - any
that remain will be attached to a container.
The second issue is that `docker container ls -a --format '{{.Image}}'`
will sometimes return the image id rather than a tag. This means that
the image doesn't get filtered out when we grep to remove the active
images.
To fix that we'll grep against both the image id and repo:tag.
Useful for checking the status of CI before deploying. Doing this at
this point in the deployment maximises the parallelisation of building
and running CI.
dangling=true doesn't prune any images, as we are not creating dangling
images.
Using --all should remove unused images, but it considers the Git SHA
tag on the latest image to be unused (presumably because there are two
tags, the SHA and latest and the running container is only considered to
be using "latest"). As a result it deletes the tag, which means that we
can't rollback to that SHA later.
It's a bit more complicated to only remove images that are not referenced
by any containers.
First we find the tags we want to keep from the containers (running and
stopped).
Then we append the latest tag to that list.
Then we get a full list of image tags and remove those tags from that
list (using `grep -v -w`).
Finally we pass the tags to `docker rmi`. That either deletes the tag if
there are other references to the image or both the tag and the image if
it is the only one.
These replace the custom audit_broadcast_cmd code. An additional env
variable MRSK_RUNTIME is passed to them.
The audit broadcast after booting an accessory has been removed.
Adds hooks to MRSK. Currently just two hooks, pre-build and post-push.
We could break the build and push into two separate commands if we
found the need for post-build and/or pre-push hooks.
Hooks are stored in `.mrsk/hooks`. Running `mrsk init` will now create
that folder and add sample hook scripts.
Hooks returning non-zero exit codes will abort the current command.
Further potential work here:
- We could replace the audit broadcast command with a
post-deploy/post-rollback hook or similar
- Maybe provide pre-command/post-command hooks that run after every
mrsk invocation
- Also look for hooks in `~/.mrsk/hooks`
The cause message doesn't include the host the error occurred on.
Before:
```
$ mrsk deploy
Acquiring the deploy lock...
Finished all in 0.1 seconds
ERROR (SocketError): getaddrinfo: nodename nor servname provided, or not known
```
After:
```
$ mrsk deploy -d staging
Acquiring the deploy lock...
Finished all in 0.1 seconds
ERROR (SocketError): Exception while executing on host server-123: getaddrinfo: nodename nor servname provided, or not known
```
Add tests for main, app, accessory, traefik and lock commands.
Other commands are generally covered by the main tests.
Also adds some changes to speed up the integration specs:
- Use a persistent volume for the registry so we can push images to
reuse between runs (also gets around docker hub rate limits)
- Use persistent volume for mrsk gem install, to avoid re-installing
between tests
- Shorter stop wait time
- Shorter connection timeouts on the load balancer
Takes just over 2 minutes to run all tests locally on an M1 Mac
after docker caches are primed.
The code in Mrsk::Cli::Main#rollback was very similar to
Mrsk::Cli::App#boot.
Modify Mrsk::Cli::App#boot so it can handle rollbacks by:
1. Only renaming running containers
2. Trying first to start then run the new container
If there are uncommitted changes in the app repository when building,
then append `_uncommitted_<random>` to it to distinguish the image
from one built from a clean checkout.
Also change the version used when renaming a container on redeploy to
distinguish and explain the version suffixes.
In the image prune command --all overrides --dangling=true. This removes
the git SHA tag from the latest image, which prevented
us from rolling back to it.
I've updated the integration test to now test deploy, redeploy and
rollback.
Audit details
* Audit logs and broadcasts accept `details` whose values are included as log tags and MRSK_* env vars passed to the broadcast command
* Commands may return execution options to the CLI in their args list
* Introduce `mrsk broadcast` helper for sending audit broadcasts
* Report UTC time, not local time, in audit logs. Standardize on ISO 8601 format
* main:
Don't run actions twice on PRs
Further distinguish dependency verification
Naming
Reveal configured dockerfile path
Style
Distinguish from server dependencies
Distinguish from local dependency verification
Improve clarity and intent
Style
Style
Style
Add local dependencies check
Bootstrap: use multi-platform installer
* main:
Simplify domain language to just "boot" and unscoped config keys
Retain a fixed number of containers when pruning
Don't assume rolling back in message
Check all hosts before rolling back
Ensure Traefik service name is consistent
Extend traefik delay by 1 second
Include traefik access logs
Check if we are still getting a 404
Also dump load balancer logs
Dump traefik logs when app not booted
Fix missing for apt-get
Report on container health after failure
Fix the integration test healthcheck
Allow percentage-based rolling deployments
Move `group_limit` & `group_wait` under `boot`
Limit rolling deployment to boot operation
Allow performing boot & start operations in groups
* main:
Simplify domain language to just "boot" and unscoped config keys
Retain a fixed number of containers when pruning
Don't assume rolling back in message
Check all hosts before rolling back
Ensure Traefik service name is consistent
Extend traefik delay by 1 second
Include traefik access logs
Check if we are still getting a 404
Also dump load balancer logs
Dump traefik logs when app not booted
Fix missing for apt-get
Report on container health after failure
Fix the integration test healthcheck
Allow percentage-based rolling deployments
Move `group_limit` & `group_wait` under `boot`
Limit rolling deployment to boot operation
Allow performing boot & start operations in groups
Time based container and image retention can have variable space
requirements depending on how often we deploy.
- Only prune stopped containers, retaining the 5 newest
- Then prune dangling images so we only keep images for the retained
containers.
Hosts could end up out of sync with each other if prune commands are run
manually or when new hosts are added.
Before rolling back confirm that the required container is available on
all hosts and roles.
If we don't specify any service properties when labelling containers,
the generated service will be named according to the container. However,
we change the container name on every deployment (as it is versioned),
which means that the auto-generated service name will be different in
each container.
That is a problem for two reasons:
- Multiple containers share a common router while a deployment is
happening. At this point, the router configuration will be different
between the containers; Traefik flags this as an error, and stops
routing to the containers until it's resolved.
- We allow custom labels to be set in an app's config. In order to
define custom configuration on the service, we'll need to know what
it will be called.
Changed to force the service name by setting one of its properties.
Add checks for:
* Docker installed locally
* Docker buildx plugin installed locally
* Dockerfile exists
If checks fail, it will halt deployment and provide more specific error messages.
Also adds a cli subcommand:
`mrsk build dependencies`
Fixes: #109 and #237
Getting the lock status with invoke passes through any options from the
original command which will raise an exception if they are not also
valid for the lock status command.
Fixes https://github.com/mrsked/mrsk/issues/239
Rather than waiting 5 seconds and hoping for the best after we boot
docker compose, add docker healthchecks and wait for all the containers
to be healthy.
Adds a simple integration test to ensure that `mrsk deploy` works.
Everything required is spun up with docker compose:
- shared: a container that contains an ssh key and a self signed cert to
be shared between the images
- deployer: the image we will deploy from
- registry: a docker registry
- two vm images to deploy into
- load_balancer: an nginx load balancer to use between our images
The other images are in privileged mode so that we can run
docker-in-docker. We need to run docker inside the images - mapping in
the docker socket doesn't work because both VMs would share the host
daemon.
The docker registry requires a self signed cert as you cannot use basic
auth over HTTP except on localhost. It runs on port 4443 rather than 443
because docker refused to accept that "registry" is a docker host and
tries to push images to docker.io/registry. "registry:4443" works fine.
The shared container contains the ssh keys for the deployer and vms, and
the self signed cert for the registry. When the shared container boots,
it copies them into a shared volume.
The other deployer and vm images are built with soft links from the
shared volume to the required locations. Their boot scripts wait for the
files to be copied in before continuing.
The root mrsk folder is mapped into the deployer container. On boot it
builds the gem and installs it.
Right now there's just a single test. We confirm that the load balancer
is returning a 502, run `mrsk deploy` and then confirm it returns 200.
When invoking the audit broadcast command, provide a few environment
variables so that people can customize the format of the message if they
want.
We currently provide `MRSK_PERFORMER`, `MRSK_ROLE`, `MRSK_DESTINATION` and
`MRSK_EVENT`.
Also adds the destination to the default message, which we continue to
send as the first argument as before.
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.
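For instance, a container that ships its own check script could be configured with something like (a sketch; the path is illustrative):
```yaml
healthcheck:
  cmd: bin/health_check   # run inside the container instead of the default curl check
```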
When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.
There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:
- Supports non-HTTP checks, and app-specific check scripts provided by a
container.
- When booting a container, allows MRSK to wait for a container to be
healthy before shutting down the old container it replaces. This
should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
reach a healthy state, which prevents any traffic from being routed to
them before they are ready.
The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
Adds top-level configuration options for `group_limit` and `group_wait`.
When a `group_limit` is present, we'll perform app boot & start
operations on no more than `group_limit` hosts at a time, optionally
sleeping for `group_wait` seconds after each batch.
We currently only do this batching on boot & start operations (including
when they are part of a deployment). Other commands, like `app stop` or
`app details` still work on all hosts in parallel.
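A sketch of the options as introduced here (they were later moved under `boot`, per the merge notes above):
```yaml
# Boot/start at most two hosts at a time, sleeping 10 seconds between batches
group_limit: 2
group_wait: 10
```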
If we get an error we'll only hold the deploy lock if it occurs while
trying to switch the running containers.
We'll also move tagging the latest image from when the image is pulled
to just before the container switch. This ensures that earlier errors
don't leave the hosts with an updated latest tag while still running the
older version.
* main: (24 commits)
Bump version for 0.11.0
Labels can be added to Traefik
Make rollbacks role-aware
fix typo role to roles
Explained the latest modifications of Traefik container labels
Remove .idea folder
Updated README.md with new healthcheck.max_attempts option
Fix test case: console output message was not updated to display the current/total attempts
Require net-ssh ~> 7.0 for SHA-2 support
Improved deploy lock acquisition
Excess CR
Style
Simpler
Make it explicit, focus on Ubuntu
More explicit
Not that --bundle is a Rails 7+ option
Update README.md
Update README.md
Improved: configurable max_attempts for healthcheck
Traefik service name to be derived from role and destination
...
Rollbacks stopped working after https://github.com/mrsked/mrsk/pull/99.
We'll confirm that a container is available for the first role on the
primary host before attempting to rollback.
Versions of net-ssh before 7.0 do not support the SHA-2 algorithm and result in mrsk not being able to connect to hosts using keys generated with it. net-ssh is also a dependency of sshkit; however, sshkit's version requirement for net-ssh is >= 2.8.0, so it is not effective at ensuring mrsk has the version it needs to be the most compatible.
1. Don't raise lock error for non-lock issues during lock acquire
(see https://github.com/mrsked/mrsk/pull/181)
2. If there is an error while the lock is held, don't release the lock
and send a warning to stderr
* main:
Simpler
Make it explicit, focus on Ubuntu
More explicit
Not that --bundle is a Rails 7+ option
Update README.md
Update README.md
Add github discussions link to readme
Bump debug to fix missing deps in CI
Only redact the non-sensitive bits of build args and env vars.
improve code sample (traefik configuration)
I realize that there's a discussions link on github but I didn't realize mrsk actually utilized it until I saw it mentioned on Discord. I was thinking adding it to the readme would help push people there.
Accounts for the 2.9.10 security release and allows testing Traefik 3 betas.
* Use `image` to configure a specific Traefik Docker image.
* Default to `traefik:v2.9` to track future 2.9.x minor releases rather
than tightly pinning to `v2.9.9`.
* Support images from the configured registry.
References #165
* `-e [REDACTED]` → `-e SOME_SECRET=[REDACTED]`
* Replaces `Utils.redact` with `Utils.sensitive` to clarify that we're
indicating redactability, not actually performing redaction.
* Redacts from YAML output, including `mrsk config` (fixes #96)
Allow the hosts for accessories to be specified by host or role, or on
all app hosts by setting `daemon: true`.
```
# Single host
mysql:
host: 1.1.1.1
# Multiple hosts
redis:
hosts:
- 1.1.1.1
- 1.1.1.2
# By role
monitoring:
roles:
- web
- jobs
```
When deploying check if there is already a container with the existing
name. If there is, rename it to "<version>_<random_hex_string>" to remove
the name clash with the new container we want to boot.
We can then do the normal zero downtime run/wait/stop.
While implementing this I discovered that `--filter name=foo` does a
substring match for foo, so I've updated those filters to do an exact
match instead.
* main: (32 commits)
Inline default as with other options
Symbols!
Fix tests
test stop with custom stop wait time
No need to replicate Docker default
Describe purpose rather than elements
Style and ordering
Customizable stop wait time
Fix tests
Ensure it also works when configuring just log options without setting a driver
Add accessory test
Undo change
Improve test
Update README
Ensure default log option `max-size=10m`
#142 Allow to customize container options in accessories
Fix flaky test
Fix tests
More resilient tests
Fix other tests
...
* main:
Wording
Remove accessory images using tags rather than labels
Update readme to point to ghcr.io/mrsked/mrsk
Validate that all roles have hosts
Commander needn't accumulate configuration
Pull latest image tag, so we can identify it
Default to deploying the config version
Remove unneeded Dockerfile.dind, update Readme
add D-in-D dockerfile, update Readme
Add a deploy lock for commands that are unsafe to run concurrently.
The lock is taken by creating a `mrsk_lock` directory on the primary
host. Details of who took the lock are added to a details file in that
directory.
Additional CLI commands have been added to manually release and acquire
the lock and to check its status.
```
Commands:
mrsk lock acquire -m, --message=MESSAGE # Acquire the deploy lock
mrsk lock help [COMMAND] # Describe subcommands or one specific subcommand
mrsk lock release # Release the deploy lock
mrsk lock status # Report lock status
Options:
-v, [--verbose], [--no-verbose] # Detailed logging
-q, [--quiet], [--no-quiet] # Minimal logging
[--version=VERSION] # Run commands against a specific app version
-p, [--primary], [--no-primary] # Run commands only on primary host instead of all
-h, [--hosts=HOSTS] # Run commands on these hosts instead of all (separate by comma)
-r, [--roles=ROLES] # Run commands on these roles instead of all (separate by comma)
-c, [--config-file=CONFIG_FILE] # Path to config file
# Default: config/deploy.yml
-d, [--destination=DESTINATION] # Specify destination to be used for config file (staging -> deploy.staging.yml)
-B, [--skip-broadcast], [--no-skip-broadcast] # Skip audit broadcasts
```
If we add support for running multiple deployments on a single server
we'll need to extend the locking to lock per deployment.
Commander had version/destination solely to incrementally accumulate CLI
options. Simpler to configure in one shot.
Clarifies responsibility and lets us introduce things like
`abbreviated_version` in one spot - Configuration.
`docker image ls` doesn't tell us what the latest deployed image is (e.g
if we've rolled back). Pull the latest image tag through to the server
so we can use it instead.
* main:
Ask for access token
Style
Style
config.traefik is already nil safe
Update README.md
Bump dev deps and consolidate platform matches
Deploys mention the released service@version
Accessories aren't required to publish a port
Accessories may be pulled from authenticated registries
Polish destination config loading
Allow arbitrary docker options for traefik
Fixed typos
Fixed readme
Rebased on main
Added volume configuration in response to issue comments
Modified in response to PR comments
Added the additional_ports configuration
Less work for broadcast commands to take on.
Also fixes a bug where rollback on hosts without a running container
would stop the container they had just started.
If we don't supply a version when deploying we'll use the result of
docker image ls to decide which image to boot. But that doesn't
necessarily correspond to the one we have just built.
E.g. if you do something like:
```
mrsk deploy # deploys git sha AAAAAAAAAAAAAA
git commit --amend # update the commit message
mrsk deploy # deploys git sha BBBBBBBBBBBBBB
```
In this case running `docker image ls` will give you the same image
twice (because the contents are identical) with tags for both SHAs but
the image we have just built will not be returned first. Maybe the order
is random, but it always seems to come second as far as I have seen.
i.e you'll get something like:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
foo/bar AAAAAAAAAAAAAA 6272349a9619 31 minutes ago 791MB
foo/bar BBBBBBBBBBBBBB 6272349a9619 31 minutes ago 791MB
```
Since we already know what version we want to deploy from the config,
let's just pass that through.
* main:
Add another assertion for `escape_shell_value`
Add tests for `Mrsk::Utils`
Fix indentation
Don't report exception here too
Don't report exception
Add CLI tests for remaining commands that are not tested yet
Minor: Properly require active_support
Because the container name is generated it isn't possible to
determine this inside the container.
This adds the MRSK_CONTAINER_NAME env var when running the
container so it can be read by the service running inside the
container.
The implementation has been updated upstream[^1] to expect symbolized
keys. MRSK relies heavily on the fact that nested keys are strings, so
we're removing existing uses of `#dig`.
[^1]: 5c15b586aa
The workflow was failing with:
```
The workflow is not valid. .github/workflows/docker-publish.yml (Line: 22, Col: 14): Unexpected symbol: '|'. Located at position 12 within expression: github.ref | replace('refs/tags/', '')
```
The `set-output` command is deprecated, so the issue has been fixed by utilising the `github.ref_name` context to retrieve the version tag that triggered the workflow.
> `github.ref_name`: The short ref name of the branch or tag that triggered the workflow run. This value matches the branch or tag name shown on GitHub. For example, `feature-branch-1`.
https://docs.github.com/en/actions/learn-github-actions/contexts
The healthcheck container is shut down right after performing the check, which
makes it harder to troubleshoot configuration issues in the healthcheck
endpoint, e.g. a DNS rebinding error. Printing the container logs helps with the troubleshooting.
Match word
Language
Suggest what accessories are
There are also accessories
Default already shown
Better example
Warn about secrets being shown
Now also accessories
Wording
Clarifications
Clarify how to see options
General option for all
Options important here too
Hide subcommands
Implied
Simpler as just version
Be concise
Missing word
Wordsmith
Simpler and uniform words are better
Clarify what exactly we're manipulating
Wordsmithing
Implicit
Simpler language
Hide subcommands
Clarify its container management
Just one per server
Simpler
As contributors and maintainers of the Kamal project, we pledge to create a welcoming and inclusive environment for everyone. We value the participation of each member of our community and want all contributors to feel respected and valued.
We are committed to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, or religion (or lack thereof). We do not tolerate harassment of participants in any form.
This code of conduct applies to all Kamal project spaces, including but not limited to project code, issue trackers, chat rooms, and mailing lists. Violations of this code of conduct may result in removal from the project community.
## Our standards
Examples of behavior that contributes to creating a positive environment include:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery and unwelcome sexual attention or advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting
## Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned with this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
## Reporting
If you are subject to or witness unacceptable behavior, or have any other concerns, please notify a project maintainer. All reports will be kept confidential and will be reviewed and investigated promptly.
We will investigate every complaint and take appropriate action. We reserve the right to remove any content that violates this Code of Conduct, or to temporarily or permanently ban any contributor for other behaviors that we deem inappropriate, threatening, offensive, or harmful.
## Attribution
This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at <https://www.contributor-covenant.org/version/1/4/code-of-conduct.html>.
Thank you for considering contributing to Kamal! This document outlines some guidelines for contributing to this open source project.
Please make sure to review our [Code of Conduct](CODE_OF_CONDUCT.md) before contributing to Kamal.
There are several ways you can contribute to the betterment of the project:
- **Report an issue?** - If the issue isn’t reported, we can’t fix it. Please report any bugs, feature requests, and/or improvement requests on the [Kamal GitHub Issues tracker](https://github.com/basecamp/kamal/issues).
- **Submit patches** - Do you have a new feature or a fix you'd like to share? [Submit a pull request](https://github.com/basecamp/kamal/pulls)!
- **Write blog articles** - Are you using Kamal? We'd love to hear how you're using it with your projects. Write a tutorial and post it on your blog!
## Issues
If you encounter any issues with the project, please check the [existing issues](https://github.com/basecamp/kamal/issues) first to see if the issue has already been reported. If the issue hasn't been reported, please open a new issue with a clear description of the problem and steps to reproduce it.
## Pull Requests
Please keep the following guidelines in mind when opening a pull request:
- Ensure that your code passes the project's minitests by running `./bin/test`.
- Provide a clear and detailed description of your changes.
- Keep your changes focused on a single concern.
- Write clean and readable code that follows the project's code style.
- Use descriptive variable and function names.
- Write clear and concise commit messages.
- Add tests for your changes, if possible.
- Ensure that your changes don't break existing functionality.
#### Commit message guidelines
A good commit message should describe what changed and why.
## Development
The `main` branch is regularly built and tested, but it is not guaranteed to be completely stable. Tags are created regularly from release branches to indicate new official, stable release versions of Kamal.
Kamal is written in Ruby. You should have Ruby 3.2+ installed on your machine in order to work on Kamal. If that's already set up, run `bundle` in the root directory to install all dependencies. Then you can run `bin/test` to run all tests.
1. Fork the project repository.
2. Create a new branch for your contribution.
3. Write your code or make the desired changes.
4. **Ensure that your code passes the project's minitests by running `./bin/test`.**
5. Commit your changes and push them to your forked repository.
6. [Open a pull request](https://github.com/basecamp/kamal/pulls) to the main project repository with a detailed description of your changes.
## License
Kamal is released under the MIT License. By contributing to this project, you agree to license your contributions under the same license.
MRSK deploys Rails apps in containers to servers running Docker with zero downtime. It uses the dynamic reverse-proxy Traefik to hold requests while the new application container is started and the old one is stopped. It works seamlessly across multiple hosts, using SSHKit to execute commands.
From bare metal to cloud VMs, deploy web apps anywhere with zero downtime. Kamal uses [kamal-proxy](https://github.com/basecamp/kamal-proxy) to seamlessly switch requests between containers. Works seamlessly across multiple servers, using SSHKit to execute commands. Originally built for Rails apps, Kamal will work with any type of web app that can be containerized with Docker.
## Installation
➡️ See [kamal-deploy.org](https://kamal-deploy.org) for documentation on [installation](https://kamal-deploy.org/docs/installation), [configuration](https://kamal-deploy.org/docs/configuration), and [commands](https://kamal-deploy.org/docs/commands).
Install MRSK globally with `gem install mrsk`. Then, inside your app directory, run `mrsk install`. Now edit the new file `config/deploy.yml`. It could look as simple as this:
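For illustration, a minimal `config/deploy.yml` might look something like this (the service name, image, server IPs, and registry username are placeholders; the password references the `MRSK_REGISTRY_PASSWORD` environment variable rather than containing the secret itself):
```yaml
service: hey
image: user/hey
servers:
  - 192.168.0.1
  - 192.168.0.2
registry:
  username: registry-user-name
  password:
    - MRSK_REGISTRY_PASSWORD
```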
Now you're ready to deploy a multi-arch image to the servers:
```
MRSK_REGISTRY_PASSWORD=pw mrsk deploy
```
This will:
1. Connect to the servers over SSH (using root by default, authenticated by your loaded ssh key).
2. Install Docker on any server that might be missing it (using apt-get).
3. Log into the registry both locally and remotely.
4. Build the image using the standard Dockerfile in the root of the application.
5. Push the image to the registry.
6. Pull the image from the registry on the servers.
7. Ensure Traefik is running and accepting traffic on port 80.
8. Stop any containers running a previous version of the app.
9. Start a new container with the version of the app that matches the current git version hash.
10. Prune unused images and stopped containers to ensure servers don't fill up.
Voila! All the servers are now serving the app on port 80. If you're just running a single server, you're ready to go. If you're running multiple servers, you need to put a load balancer in front of them.
## Why not just run Capistrano or Kubernetes?
MRSK is basically Capistrano for Containers, which allows us to use vanilla servers as the hosts. No need to ensure that the servers have just the right version of Ruby or other dependencies you need. That all lives in the Docker image now. You can boot a brand new Ubuntu (or whatever) server, add it to the list of deploy servers in MRSK, and it'll be auto-provisioned with Docker and run right away. Docker's layer caching also allows for quicker deployments with less mucking about on the server. And the images built for MRSK can be used for CI or later introspection.
Kubernetes is a beast. Running it yourself on your own hardware is not for the faint of heart. It's a fine option if you want to run on someone else's platform, like Render or Fly, but if you'd like the freedom to move between cloud and your own hardware, or even mix the two, MRSK is much simpler. You can see everything that's going on, it's just basic Docker commands being called.
## Configuration
### Using .env file to load required environment variables
MRSK uses [dotenv](https://github.com/bkeepers/dotenv) to automatically load environment variables set in the `.env` file present in the application root. This file can be used to set variables like `MRSK_REGISTRY_PASSWORD` or database passwords. But for this reason you must ensure that .env files are not checked into Git or included in your Dockerfile! The format is just key-value like:
```bash
MRSK_REGISTRY_PASSWORD=pw
DB_PASSWORD=secret123
```
### Using another registry than Docker Hub
The default registry is Docker Hub, but you can change it using `registry/server`:
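For example (placeholder values; the password again references an environment variable rather than a literal secret):
```yaml
registry:
  server: registry.digitalocean.com
  username: registry-user-name
  password:
    - MRSK_REGISTRY_PASSWORD
```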
### Using a different SSH user than root
The default SSH user is root, but you can change it using `ssh_user`:
```yaml
ssh_user: app
```
### Using env variables
You can inject env variables into the app containers using `env`:
```yaml
env:
  DATABASE_URL: mysql2://db1/hey_production/
  REDIS_URL: redis://redis1:6379/1
```
### Using secret env variables
If you have env variables that are secret, you can divide the `env` block into `clear` and `secret`:
```yaml
env:
  clear:
    DATABASE_URL: mysql2://db1/hey_production/
    REDIS_URL: redis://redis1:6379/1
  secret:
    - DATABASE_PASSWORD
    - REDIS_PASSWORD
```
The list of secret env variables will be expanded at run time from your local machine. So a reference to a secret `DATABASE_PASSWORD` will look for `ENV["DATABASE_PASSWORD"]` on the machine running MRSK. Just like with build secrets.
If the referenced secret ENVs are missing, the configuration will be halted with a `KeyError` exception.
Note: Marking an ENV as secret currently only redacts its value in the output for MRSK. The ENV is still injected in the clear into the container at runtime.
### Using volumes
You can add custom volumes into the app containers using `volumes`:
```yaml
volumes:
- "/local/path:/container/path"
```
### Using different roles for servers
If your application uses separate hosts for running jobs or other roles beyond the default web role, you can specify these hosts in a dedicated role with a new entrypoint command like so:
```yaml
servers:
  web:
    - 192.168.0.1
    - 192.168.0.2
  job:
    hosts:
      - 192.168.0.3
      - 192.168.0.4
    cmd: bin/jobs
```
Note: By default, Traefik will only be installed and run on the servers in the `web` role (and on all servers if no roles are defined). If you need Traefik on hosts in roles other than `web`, add `traefik: true`:
```yaml
servers:
  web:
    - 192.168.0.1
    - 192.168.0.2
  web2:
    traefik: true
    hosts:
      - 192.168.0.3
      - 192.168.0.4
```
### Using container labels
You can specialize the default Traefik rules by setting labels on the containers that are being started:
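For example, to route a specific hostname to this app (the router name and hostname below are placeholders):
```yaml
labels:
  traefik.http.routers.hey.rule: "Host(`app.hey.com`)"
```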
Note: The extra quotes are needed to ensure the rule is passed in correctly!
This allows you to run multiple applications on the same server sharing the same Traefik instance and port.
See https://doc.traefik.io/traefik/routing/routers/#rule for a full list of available routing rules.
The labels can also be applied on a per-role basis:
```yaml
servers:
  web:
    - 192.168.0.1
    - 192.168.0.2
  job:
    hosts:
      - 192.168.0.3
      - 192.168.0.4
    cmd: bin/jobs
    labels:
      my-label: "50"
```
### Using remote builder for native multi-arch
If you're developing on ARM64 (like Apple Silicon), but you want to deploy on AMD64 (x86 64-bit), you can use multi-architecture images. By default, MRSK will set up a local buildx configuration that does this through QEMU emulation. But this can be quite slow, especially on the first build.
If you want to speed up this process by using a remote AMD64 host to natively build the AMD64 part of the image, while natively building the ARM64 part locally, you can do so using builder options:
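A sketch of such a configuration (the architectures, socket path, and remote host are illustrative; adjust them to your machines):
```yaml
builder:
  local:
    arch: arm64
    host: unix:///var/run/docker.sock
  remote:
    arch: amd64
    host: ssh://root@192.168.0.1
```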
Note: You must have Docker running on the remote host being used as a builder. This instance should only be shared for builds using the same registry and credentials.
### Using remote builder for single-arch
If you're developing on ARM64 (like Apple Silicon), want to deploy on AMD64 (x86 64-bit), but don't need to run the image locally (or on other ARM64 hosts), you can configure a remote builder that just targets AMD64. This is a bit faster than building with multi-arch, as there's nothing to build locally.
```yaml
builder:
  remote:
    arch: amd64
    host: ssh://root@192.168.0.1
```
### Using native builder when multi-arch isn't needed
If you're developing on the same architecture as the one you're deploying on, you can speed up the build by forgoing both multi-arch and remote building:
```yaml
builder:
  multiarch: false
```
This is also a good option if you're running MRSK from a CI server that shares architecture with the deployment servers.
### Using build secrets for new images
Some images need a secret passed in during build time, like a GITHUB_TOKEN to give access to private gem repositories. This can be done by having the secret in ENV, then referencing it in the builder configuration:
```yaml
builder:
  secrets:
    - GITHUB_TOKEN
```
This build secret can then be referenced in the Dockerfile:
```
# Copy Gemfiles
COPY Gemfile Gemfile.lock ./
# Install dependencies, including private repositories via access token
RUN --mount=type=secret,id=GITHUB_TOKEN \
    BUNDLE_GITHUB__COM=x-access-token:$(cat /run/secrets/GITHUB_TOKEN) bundle install
```
Build arguments that aren't secret can also be configured:
```yaml
builder:
  args:
    RUBY_VERSION: 3.2.0
```
This build argument can then be used in the Dockerfile:
```
# Use the Ruby version passed in as a build argument to select the base image
ARG RUBY_VERSION
FROM ruby:$RUBY_VERSION-slim as base
```
### Using without RAILS_MASTER_KEY
If you're using MRSK with older Rails apps that predate RAILS_MASTER_KEY, or with a non-Rails app, you can skip the default reference to it:
```yaml
skip_master_key: true
```
### Using accessories for database, cache, search services
You can manage your accessory services via MRSK as well. The services will build off public images, and will not be automatically updated when you deploy:
```yaml
accessories:
  mysql:
    image: mysql:5.7
    host: 1.1.1.3
    port: 3306
    env:
      clear:
        MYSQL_ROOT_HOST: '%'
      secret:
        - MYSQL_ROOT_PASSWORD
    volumes:
      - /var/lib/mysql:/var/lib/mysql
  redis:
    image: redis:latest
    host: 1.1.1.4
    port: "36379:6379"
    volumes:
      - /var/lib/redis:/data
```
Now run `mrsk accessory start mysql` to start the MySQL server on 1.1.1.3. See `mrsk accessory` for all the commands possible.
### Running commands on servers
You can run interactive commands, like a Rails console or a bash session, on a server (default is primary, use `--hosts` to connect to another):
```bash
# Starts a bash session in a new container made from the most recent app image
mrsk app exec -i bash
# Starts a bash session in the currently running container for the app
mrsk app exec -i --reuse bash
# Starts a Rails console in a new container made from the most recent app image
mrsk app exec -i 'bin/rails console'
```
### Running details to see state of containers
You can see the state of your servers by running `mrsk details`:
```
Traefik Host: 192.168.0.1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6195b2a28c81 traefik "/entrypoint.sh --pr…" 30 minutes ago Up 19 minutes 0.0.0.0:80->80/tcp, :::80->80/tcp traefik
Traefik Host: 192.168.0.2
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
de14a335d152 traefik "/entrypoint.sh --pr…" 30 minutes ago Up 19 minutes 0.0.0.0:80->80/tcp, :::80->80/tcp traefik
App Host: 192.168.0.1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
badb1aa51db3 registry.digitalocean.com/user/app:6ef8a6a84c525b123c5245345a8483f86d05a123 "/rails/bin/docker-e…" 13 minutes ago Up 13 minutes 3000/tcp chat-6ef8a6a84c525b123c5245345a8483f86d05a123
App Host: 192.168.0.2
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1d3c91ed1f55 registry.digitalocean.com/user/app:6ef8a6a84c525b123c5245345a8483f86d05a123 "/rails/bin/docker-e…" 13 minutes ago Up 13 minutes 3000/tcp chat-6ef8a6a84c525b123c5245345a8483f86d05a123
```
You can also see just info for app containers with `mrsk app details` or just for Traefik with `mrsk traefik details`.
### Running rollback to fix a bad deploy
If you've discovered a bad deploy, you can quickly roll back by reactivating the old, paused container image. You can see which old containers are available for rollback by running `mrsk app containers`. It'll give you a presentation similar to `mrsk app details`, but it includes all the old containers as well, showing something like this:
```
App Host: 192.168.0.1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1d3c91ed1f51 registry.digitalocean.com/user/app:6ef8a6a84c525b123c5245345a8483f86d05a123 "/rails/bin/docker-e…" 19 minutes ago Up 19 minutes 3000/tcp chat-6ef8a6a84c525b123c5245345a8483f86d05a123
539f26b28369 registry.digitalocean.com/user/app:e5d9d7c2b898289dfbc5f7f1334140d984eedae4 "/rails/bin/docker-e…" 31 minutes ago Exited (1) 27 minutes ago chat-e5d9d7c2b898289dfbc5f7f1334140d984eedae4
App Host: 192.168.0.2
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
badb1aa51db4 registry.digitalocean.com/user/app:6ef8a6a84c525b123c5245345a8483f86d05a123 "/rails/bin/docker-e…" 19 minutes ago Up 19 minutes 3000/tcp chat-6ef8a6a84c525b123c5245345a8483f86d05a123
6f170d1172ae registry.digitalocean.com/user/app:e5d9d7c2b898289dfbc5f7f1334140d984eedae4 "/rails/bin/docker-e…" 31 minutes ago Exited (1) 27 minutes ago chat-e5d9d7c2b898289dfbc5f7f1334140d984eedae4
```
From the example above, we can see that `e5d9d7c2b898289dfbc5f7f1334140d984eedae4` was the last version, so it's available as a rollback target. We can perform this rollback by running `mrsk rollback e5d9d7c2b898289dfbc5f7f1334140d984eedae4`. That'll stop `6ef8a6a84c525b123c5245345a8483f86d05a123` and then start `e5d9d7c2b898289dfbc5f7f1334140d984eedae4`. Because the old container is still available, this is very quick. Nothing to download from the registry.
Note that by default old containers are pruned after 3 days when you run `mrsk deploy`.
### Running removal to clean up servers
If you wish to remove the entire application, including Traefik, containers, images, and registry session, you can run `mrsk remove`. This will leave the servers clean.
## Stage of development
This is alpha software. Lots of stuff is missing. Lots of stuff will keep moving around for a while.
Please help us improve Kamal's documentation on the [basecamp/kamal-site repository](https://github.com/basecamp/kamal-site).
## License
MRSK is released under the [MIT License](https://opensource.org/licenses/MIT).
Kamal is released under the [MIT License](https://opensource.org/licenses/MIT).
raise Kamal::Cli::Build::BuildError, "Clone in #{KAMAL.config.builder.build_directory} is not on the correct revision, expected `#{Kamal::Git.revision}` but got `#{revision}`"
end
rescue SSHKit::Command::Failed => e
raise Kamal::Cli::Build::BuildError, "Failed to validate clone: #{e.message}"
raise "Docker is not installed on #{missing.join(", ")} and can't be automatically installed without having root access and either `wget` or `curl`. Install Docker manually: https://docs.docker.com/engine/install/"
raise Kamal::ConfigurationError, "No servers specified for the #{primary_role.name} primary_role"
end
unless allow_empty_roles?
  roles.each do |role|
    if role.hosts.empty?
      raise Kamal::ConfigurationError, "No servers specified for the #{role.name} role. You can ignore this with allow_empty_roles: true"
    end
  end
end
true
end
def ensure_valid_service_name
  raise Kamal::ConfigurationError, "Service name can only include alphanumeric characters, hyphens, and underscores" unless raw_config[:service] =~ /^[a-z0-9_-]+$/i
# The default registry is Docker Hub, but you can change it using `registry/server`.
#
# A reference to a secret (in this case, `DOCKER_REGISTRY_TOKEN`) will look up the secret
# in the local environment:
registry:
  server: registry.digitalocean.com
  username:
    - DOCKER_REGISTRY_TOKEN
  password:
    - DOCKER_REGISTRY_TOKEN
# Using AWS ECR as the container registry
#
# You will need to have the AWS CLI installed locally for this to work.
# AWS ECR’s access token is only valid for 12 hours. In order to avoid having to manually regenerate the token every time, you can use ERB in the `deploy.yml` file to shell out to the AWS CLI command and obtain the token:
registry:
  server: <your aws account id>.dkr.ecr.<your aws region id>.amazonaws.com
  username: AWS
  password: <%= %x(aws ecr get-login-password) %>
# Using GCP Artifact Registry as the container registry
#
# To sign into Artifact Registry, you need to
# [create a service account](https://cloud.google.com/iam/docs/service-accounts-create#creating)
# and [set up roles and permissions](https://cloud.google.com/artifact-registry/docs/access-control#permissions).
# Normally, assigning the `roles/artifactregistry.writer` role should be sufficient.
#
# Once the service account is ready, you need to generate and download a JSON key and base64 encode it:
#
# ```shell
# base64 -i /path/to/key.json | tr -d "\\n"
# ```
#
# You'll then need to set the `KAMAL_REGISTRY_PASSWORD` secret to that value.
#
# Use the environment variable as the password along with `_json_key_base64` as the username.
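#
# A sketch of the resulting registry block (the server value is a placeholder and depends on
# your Artifact Registry region and repository):
registry:
  server: <your region>-docker.pkg.dev
  username: _json_key_base64
  password:
    - KAMAL_REGISTRY_PASSWORD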