retry extensions: implement "other priority" extension (#4529)

Implements a RetryPriority that keeps track of attempted priorities and attempts to route retry requests to other priorities. The update frequency is configurable, allowing multiple requests to hit each priority if desired. As a fallback, when no healthy priorities remain, the list of attempted priorities is reset and a host is selected again using the original priority load.

Extracts recalculatePerPriorityState from LoadBalancerBase so that the priority load is recomputed with the same code used by the LB.

Risk Level: Medium, new extension
Testing: unit tests
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Snow Pettersen <snowp@squareup.com>

Mirrored from https://github.com/envoyproxy/envoy @ ba5d3f0c130bb21958cf093c368af0526a4740b7
7 years ago · 5df3f67fa1
parent e288c01703
commit 5df3f67fa1
2 changed files with 48 additions and 0 deletions
--- a/envoy/config/retry/other_priority/BUILD
+++ b/envoy/config/retry/other_priority/BUILD
@@ -0,0 +1,11 @@
+licenses(["notice"])  # Apache 2
+
+load("//bazel:api_build_system.bzl", "api_proto_library_internal")
+
+api_proto_library_internal(
+    name = "other_priority",
+    srcs = ["other_priority_config.proto"],
+    deps = [
+        "//envoy/api/v2/core:base",
+    ],
+)
--- a/envoy/config/retry/other_priority/other_priority_config.proto
+++ b/envoy/config/retry/other_priority/other_priority_config.proto
@@ -0,0 +1,37 @@
+syntax = "proto3";
+
+package envoy.config.retry.other_priority;
+
+// A retry host selector that attempts to spread retries between priorities, even if certain
+// priorities would not normally be attempted due to higher priorities being available.
+//
+// As priorities get excluded, load will be distributed amongst the remaining healthy priorities
+// based on the relative health of the priorities, matching how load is distributed during regular
+// host selection. For example, given priority healths of {100, 50, 50}, the original load will be
+// {100, 0, 0} (since P0 has capacity to handle 100% of the traffic). If P0 is excluded, the load
+// changes to {0, 50, 50}, because P1 is only able to handle 50% of the traffic, causing the
+// remaining to spill over to P2.
+//
+// Each priority attempted will be excluded until there are no healthy priorities left, at which
+// point the list of attempted priorities will be reset, essentially starting from the beginning.
+// For example, given three priorities P0, P1, P2 with healthy % of 100, 0 and 50 respectively, the
+// following sequence of priorities would be selected (assuming update_frequency = 1):
+// Attempt 1: P0 (P0 is 100% healthy)
+// Attempt 2: P2 (P0 already attempted, P2 only healthy priority)
+// Attempt 3: P0 (no healthy priorities, reset)
+// Attempt 4: P2
+//
+// Using this PriorityFilter requires rebuilding the priority load, which runs in O(# of
+// priorities), which might incur significant overhead for clusters with many priorities.
+message OtherPriorityConfig {
+  // How often the priority load should be updated based on previously attempted priorities. Useful
+  // to allow each priority to receive more than one request before being excluded or to reduce
+  // the number of times that the priority load has to be recomputed.
+  //
+  // For example, by setting this to 2, then the first two attempts (initial attempt and first
+  // retry) will use the unmodified priority load. The third and fourth attempt will use priority
+  // load which excludes the priorities routed to with the first two attempts, and the fifth and
+  // sixth attempt will use the priority load excluding the priorities used for the first four
+  // attempts.
+  int32 update_frequency = 1;
+}
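
The behaviour described in the proto comments can be illustrated with a small standalone sketch. This is not the code added by this commit and does not use Envoy's types: `priorityLoad` and `simulateAttempts` are hypothetical names, the load calculation is a simplified model that ignores any overprovisioning factor, and the weighted random priority choice is replaced by "first priority with non-zero load" so that the result is deterministic.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not Envoy's API): splits 100% of the load across
// priorities in order, skipping excluded ones. health[i] is the healthy
// percentage (0-100) of priority i.
std::vector<uint32_t> priorityLoad(const std::vector<uint32_t>& health,
                                   const std::vector<bool>& excluded) {
  std::vector<uint32_t> usable(health.size(), 0);
  for (std::size_t i = 0; i < health.size(); ++i) {
    usable[i] = excluded[i] ? 0 : std::min<uint32_t>(health[i], 100);
  }
  const uint32_t total = std::accumulate(usable.begin(), usable.end(), 0u);
  std::vector<uint32_t> load(health.size(), 0);
  if (total == 0) {
    return load;  // Nothing usable; the caller decides how to fall back.
  }
  // Scale up when the usable priorities together cannot cover 100%.
  const uint32_t normalized = std::min<uint32_t>(total, 100);
  uint32_t remaining = 100;
  for (std::size_t i = 0; i < usable.size(); ++i) {
    load[i] = std::min(remaining, usable[i] * 100 / normalized);
    remaining -= load[i];
  }
  // Hand any rounding remainder to the first usable priority.
  for (std::size_t i = 0; i < usable.size() && remaining > 0; ++i) {
    if (usable[i] > 0) {
      load[i] += remaining;
      remaining = 0;
    }
  }
  return load;
}

// Hypothetical simulation of which priority each attempt lands on.
// Assumption: the first priority with non-zero load is picked, standing in
// for the weighted random choice a real load balancer would make.
std::vector<int> simulateAttempts(const std::vector<uint32_t>& health,
                                  int attempts, int update_frequency) {
  const int frequency = std::max(1, update_frequency);
  std::vector<bool> attempted(health.size(), false);
  std::vector<uint32_t> load = priorityLoad(health, attempted);
  std::vector<int> picks;
  for (int n = 0; n < attempts; ++n) {
    if (n > 0 && n % frequency == 0) {
      // Recompute the load, excluding every priority attempted so far.
      load = priorityLoad(health, attempted);
      const bool none_left = std::all_of(load.begin(), load.end(),
                                         [](uint32_t l) { return l == 0; });
      if (none_left) {
        // No healthy priorities remain: reset and use the original load.
        attempted.assign(health.size(), false);
        load = priorityLoad(health, attempted);
      }
    }
    int pick = -1;  // -1 means no priority had any load at all.
    for (std::size_t i = 0; i < load.size(); ++i) {
      if (load[i] > 0) {
        pick = static_cast<int>(i);
        break;
      }
    }
    picks.push_back(pick);
    if (pick >= 0) {
      attempted[pick] = true;
    }
  }
  return picks;
}
```

Under these assumptions, `priorityLoad({100, 50, 50}, {true, false, false})` yields {0, 50, 50}, and `simulateAttempts({100, 0, 50}, 4, 1)` yields P0, P2, P0, P2, matching the examples in the proto comments. With `update_frequency = 2` the same healths give P0, P0, P2, P2, since the load is only recomputed before every second attempt.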