upstream: handle health check fail after removal (#6765)

When using active health checking, hosts are not removed from dynamic clusters if they are still passing health checks. If the sequence is reversed (removal followed by health check failure), hosts might not be removed for a very long time, because nothing re-evaluates the removal until the next service discovery update. This change handles that reversed case, so that any time a host is both removed AND failing active health checks, in any order, it will be removed. This has been an issue "forever" but is more obvious when using streaming EDS or DNS with a very long polling interval.

Fixes https://github.com/envoyproxy/envoy/issues/6625

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 41eefffcd728d071037a57a1accd402ec188bcd5
6 years ago · 429644f1b4
parent 1e6a3ddd6e
commit 429644f1b4
1 changed files with 4 additions and 0 deletions
--- a/envoy/admin/v2alpha/clusters.proto
+++ b/envoy/admin/v2alpha/clusters.proto
@@ -78,6 +78,10 @@ message HostHealthStatus {
  // The host is currently being marked as degraded through active health checking.
  bool failed_active_degraded_check = 4;

+  // The host has been removed from service discovery, but is being stabilized due to active
+  // health checking.
+  bool pending_dynamic_removal = 5;
+
  // Health status as reported by EDS. Note: only HEALTHY and UNHEALTHY are currently supported
  // here.
  // TODO(mrice32): pipe through remaining EDS health status possibilities.
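The removal rule from the commit message, together with the new `pending_dynamic_removal` flag, can be modelled roughly as follows. This is a minimal, simplified sketch, not Envoy's C++ implementation: the `Host` and `Cluster` classes and the `update_from_discovery` / `on_health_check_result` methods are hypothetical. It also assumes that hosts start out healthy and that each discovery update carries the full host set.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical per-host state, loosely mirroring HostHealthStatus flags."""
    address: str
    failed_active_hc: bool = False
    pending_dynamic_removal: bool = False


class Cluster:
    """Hypothetical dynamic cluster; NOT Envoy's real API or data model."""

    def __init__(self):
        self.hosts = {}  # address -> Host

    def update_from_discovery(self, addresses):
        """Apply a discovery update (assumed to list the complete host set)."""
        wanted = set(addresses)
        for addr in wanted:
            host = self.hosts.get(addr)
            if host is None:
                self.hosts[addr] = Host(addr)
            else:
                # The host is back in discovery, so it is no longer pending removal.
                host.pending_dynamic_removal = False
        for addr in list(self.hosts):
            if addr in wanted:
                continue
            host = self.hosts[addr]
            if host.failed_active_hc:
                # Removed AND failing active health checks: drop it now.
                del self.hosts[addr]
            else:
                # Removed but still passing: keep it, flagged as pending removal.
                host.pending_dynamic_removal = True

    def on_health_check_result(self, addr, healthy):
        """Record an active health check result for a host."""
        host = self.hosts.get(addr)
        if host is None:
            return
        host.failed_active_hc = not healthy
        if not healthy and host.pending_dynamic_removal:
            # The reversed order this commit addresses: removal came first,
            # failure second, so the host is dropped without waiting for
            # another discovery update.
            del self.hosts[addr]
```

In this model a host leaves the cluster once it is both absent from discovery and failing active health checks, whichever happens first. Keeping the pending state as an explicit flag also makes the interim "removed but still passing" condition observable, which is what the new admin field exposes.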