retry: handle all unhealthy upstreams in other_priority plugin (#4838)

This fixes a bug in the other priority plugin that caused a crash when a retry was attempted against an upstream with no healthy hosts. The existing check for no healthy hosts was ineffective because of the "everything is terrible" fallback in LoadBalancerBase, which sets the P0 load to 100 when all priorities are unhealthy.

The fix is to check the healthy percentage based on the loads computed in the plugin, not the ones returned by LoadBalancerBase. When all hosts are unhealthy, we return the original priority load. This ensures that we maintain whatever fallback the default LB uses when there are no healthy hosts.

Risk Level: Medium
Testing: Added regression test for no healthy hosts
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Snow Pettersen <snowp@squareup.com>

Mirrored from https://github.com/envoyproxy/envoy @ 59816a486c64cd05e9e0c0f08194b121690d6632
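The fallback described in the commit message can be illustrated with a minimal sketch. This is not the plugin's actual C++ code: the function name `adjust_priority_load`, its parameters, and the simplified load distribution are all assumptions made for illustration. The point it shows is that the "all unhealthy" check runs on the health values computed inside the plugin, and that the original load is returned untouched when nothing is healthy.

```python
def adjust_priority_load(original_load, health, attempted):
    """Hypothetical sketch of the retry priority adjustment.

    original_load: per-priority load (percent) from the default load balancer.
    health:        per-priority health (percent, 0..100) as seen by the plugin.
    attempted:     set of priority indices already tried for this request.
    """
    # Exclude priorities that were already attempted.
    adjusted = [0 if p in attempted else h for p, h in enumerate(health)]
    total = sum(adjusted)

    # No healthy priority left to try: reset and consider all priorities again.
    if total == 0:
        adjusted = list(health)
        total = sum(adjusted)

    # Still zero: every host is unhealthy. Keep whatever fallback the default
    # load balancer chose instead of computing a new load.
    if total == 0:
        return list(original_load)

    # Simplified distribution (an assumption, not Envoy's exact arithmetic):
    # hand out load in priority order, capped at 100 percent overall.
    scale = min(total, 100)
    remaining = 100
    load = []
    for h in adjusted:
        share = min(remaining, h * 100 // scale)
        load.append(share)
        remaining -= share

    # Give any rounding leftover to the first healthy priority.
    if remaining > 0:
        first = next(i for i, h in enumerate(adjusted) if h > 0)
        load[first] += remaining
    return load
```

With every priority at zero health, the call returns the original load unchanged, which is the case the regression test in this commit covers.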
6 years ago · 981878b844
parent 57012d4c64
commit 981878b844
1 changed file with 3 additions and 0 deletions
--- a/envoy/config/retry/other_priority/other_priority_config.proto
+++ b/envoy/config/retry/other_priority/other_priority_config.proto
@@ -21,6 +21,9 @@ package envoy.config.retry.other_priority;
 // Attempt 3: P0 (no healthy priorities, reset)
 // Attempt 4: P2
 //
+// In the case of all upstream hosts being unhealthy, no adjustments will be made to the original
+// priority load, so behavior should be identical to not using this plugin.
+//
 // Using this PriorityFilter requires rebuilding the priority load, which runs in O(# of
 // priorities), which might incur significant overhead for clusters with many priorities.
 message OtherPriorityConfig {