Zero downtime deployment with cord file

When replacing a container, we currently:

1. Boot the new container
2. Wait for it to become healthy
3. Stop the old container

Traefik will send requests to the old container until it notices that the container is unhealthy. But the old container may have stopped serving requests before that point, which can result in errors.

To get round that, the new boot process is:

1. Create a directory with a single file on the host
2. Boot the new container, mounting the cord file into /tmp and including a check for the file in the docker healthcheck
3. Wait for it to become healthy
4. Delete the healthcheck file ("cut the cord") for the old container
5. Wait for it to become unhealthy and give Traefik a couple of seconds to notice
6. Stop the old container

The extra steps ensure that Traefik stops sending requests before the old container is shut down.
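The ordering of the new boot process can be illustrated with a small in-process Ruby simulation. This is a sketch under stated assumptions, not Kamal's implementation: `FakeContainer`, `wait_until` and `replace_container` are hypothetical names, the Docker healthcheck is modelled as "running and the cord file exists", and no real containers, SSH commands or Traefik are involved. The point it shows is that the old container is only stopped after its cord has been cut and it has been observed as unhealthy.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for a container (not Kamal's API). Its health
# depends on being running AND on its cord file still existing, mimicking
# a docker healthcheck that also tests for the mounted cord file.
class FakeContainer
  attr_reader :name, :cord_path

  def initialize(name, cord_dir)
    @name = name
    @cord_path = File.join(cord_dir, "cord")
    FileUtils.mkdir_p(cord_dir)   # step 1: a directory...
    FileUtils.touch(@cord_path)   # ...holding a single file
    @running = false
  end

  def boot
    @running = true
  end

  def stop
    @running = false
  end

  def running?
    @running
  end

  # Healthy only while running and the cord file is present.
  def healthy?
    running? && File.exist?(cord_path)
  end

  # "Cut the cord": deleting the file makes the healthcheck fail.
  def cut_cord
    FileUtils.rm_f(cord_path)
  end
end

# Poll the block until it returns truthy, or raise after `timeout` seconds.
def wait_until(timeout: 5, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until yield
    raise "timed out" if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    sleep interval
  end
end

# Replace old_container with new_container using the cord-file sequence.
# traefik_grace approximates "give Traefik a couple of seconds to notice".
def replace_container(old_container, new_container, traefik_grace: 2)
  log = []
  new_container.boot
  log << "boot #{new_container.name}"
  wait_until { new_container.healthy? }
  log << "#{new_container.name} healthy"
  old_container.cut_cord
  log << "cut cord #{old_container.name}"
  wait_until { !old_container.healthy? }
  log << "#{old_container.name} unhealthy"
  sleep traefik_grace
  old_container.stop
  log << "stop #{old_container.name}"
  log
end

if __FILE__ == $PROGRAM_NAME
  Dir.mktmpdir do |root|
    old_container = FakeContainer.new("app-old", File.join(root, "old"))
    old_container.boot
    new_container = FakeContainer.new("app-new", File.join(root, "new"))
    puts replace_container(old_container, new_container, traefik_grace: 0)
  end
end
```

In a real deployment the "healthy" state would come from Docker's healthcheck status and the grace period from Traefik's polling interval. The simulation only fixes the order of the steps.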
2023-08-31 10:21:57 +01:00
parent 94bf090657
commit 8a41d15b69
17 changed files with 234 additions and 65 deletions
--- a/test/cli/healthcheck_test.rb
+++ b/test/cli/healthcheck_test.rb
@@ -6,6 +6,7 @@ class CliHealthcheckTest < CliTestCase
    Thread.report_on_exception = false

    Kamal::Utils::HealthcheckPoller.stubs(:sleep) # No sleeping when retrying
+    Kamal::Configuration.any_instance.stubs(:run_id).returns("12345678901234567890123456789012")

    SSHKit::Backend::Abstract.any_instance.stubs(:execute)
      .with(:docker, :container, :ls, "--all", "--filter", "name=^healthcheck-app-999$", "--quiet", "|", :xargs, :docker, :stop, raise_on_non_zero_exit: false)