Remove the healthcheck step

To speed up deployments, we'll remove the healthcheck step.

This adds some risk to deployments for non-web roles - if they don't
have a Docker healthcheck configured then the only check we do is if
the container is running.

If there is a bad image we might see the container running before it
exits and deploy it. Previously the healthcheck step would have avoided
this by ensuring a web container could boot and serve traffic first.

To mitigate this, we'll add a deployment barrier. Until one of the
primary role containers passes its healthcheck, we'll keep the barrier
up and avoid stopping the containers on the non-primary roles.

If the primary role container fails its healthcheck, we'll close the
barrier and shut down the new containers on the waiting roles.
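
The barrier behaviour described above can be sketched in plain Ruby. This
is only an illustration, not the code in this commit: DeployBarrier and
simulate_deploy are hypothetical names, and the role names and action
strings are made up for the example.

```ruby
# Illustrative only: DeployBarrier and simulate_deploy are hypothetical names,
# not taken from the commit.
class DeployBarrier
  def initialize
    @mutex = Mutex.new
    @resolved = ConditionVariable.new
    @state = :waiting
  end

  # Called once a primary role container passes its healthcheck.
  def open
    resolve(:opened)
  end

  # Called if the primary role container fails its healthcheck.
  def close
    resolve(:closed)
  end

  # Blocks until the barrier is resolved. Returns true if it was opened.
  def wait
    @mutex.synchronize do
      @resolved.wait(@mutex) while @state == :waiting
      @state == :opened
    end
  end

  private

  # The first resolution wins; later calls do not change the state.
  def resolve(state)
    @mutex.synchronize do
      @state = state if @state == :waiting
      @resolved.broadcast
    end
  end
end

# Simulates one deploy: each non-primary role waits on the barrier while the
# primary role reports its healthcheck result.
def simulate_deploy(primary_healthy:, waiting_roles: %w[workers cron])
  barrier = DeployBarrier.new
  actions = Queue.new

  threads = waiting_roles.map do |role|
    Thread.new do
      if barrier.wait
        actions << "#{role}: stop old container"
      else
        actions << "#{role}: stop new container"
      end
    end
  end

  primary_healthy ? barrier.open : barrier.close
  threads.each(&:join)
  Array.new(actions.size) { actions.pop }.sort
end
```

In this sketch the waiting roles only replace their old containers once
the barrier opens. If it closes instead, they discard the new containers
and leave the old ones running.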

We also have a new integration test to check that we correctly handle
a broken image. This highlighted that SSHKit's default runner will
stop at the first error it encounters. We'll now have a custom runner
that waits for all threads to finish, allowing them to clean up.
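
The idea of waiting for every thread before reporting a failure can be
sketched as below. This is a plain Ruby illustration under stated
assumptions, not SSHKit's runner API and not the runner added in this
commit: run_on_all is a hypothetical helper name.

```ruby
# Illustrative only: run_on_all is a hypothetical helper, not an SSHKit API.
# Runs the given block once per host, each in its own thread, and waits for
# every thread to finish before raising the first error that occurred.
def run_on_all(hosts)
  threads = hosts.map do |host|
    Thread.new do
      yield host
      nil
    rescue StandardError => e
      # Return the error as the thread's value instead of raising it, so the
      # other threads are left to finish their own work and clean up.
      e
    end
  end

  # Thread#value joins each thread, so all of them complete before we raise.
  errors = threads.map(&:value).compact
  raise errors.first unless errors.empty?
end
```

With this shape, a failure on one host no longer cuts short the work on
the others. The error still surfaces, but only after every thread has
returned.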
Donal McBreen
2024-03-21 11:36:21 +00:00
parent 990f1b4413
commit 0efb5ccfff
24 changed files with 269 additions and 327 deletions


@@ -78,6 +78,11 @@ class IntegrationTest < ActiveSupport::TestCase
    latest_app_version
  end

  def break_app
    deployer_exec "./break_app.sh #{@app}", workdir: "/"
    latest_app_version
  end

  def latest_app_version
    deployer_exec("git rev-parse HEAD", capture: true)
  end
@@ -131,4 +136,16 @@ class IntegrationTest < ActiveSupport::TestCase
      puts "Tried to get the response code again and got #{app_response.code}"
    end
  end

  def assert_container_running(host:, name:)
    assert container_running?(host: host, name: name)
  end

  def assert_container_not_running(host:, name:)
    assert_not container_running?(host: host, name: name)
  end

  def container_running?(host:, name:)
    docker_compose("exec #{host} docker ps --filter=name=#{name} | tail -n+2", capture: true).tap { |x| p [ x, x.strip, x.strip.present? ] }.strip.present?
  end
end