Depending on the execution environment in which envoy is being run, it
is possible that some of the assumption on the clock are maybe not
holding as previously commented. With some sandboxing technologies the
clock does not reference the machine boot time but the sandbox boot
time. This invalidates the assumtpion that the first update in the
cluster_manager will most likely fall out of the windows and ends up
showing a non intuitive behavior difficult to catch.
This PR simply adds a comment that will allow the reader to consider
this option while reading to the code.
Signed-off-by: Flavio Crisciani <f.crisciani@gmail.com>
Mirrored from https://github.com/envoyproxy/envoy @ 640b5a436d2ce8e637d28225d5b4f0aae307dede
Due to a seg fault issue with the gogo protobuf library
[https://github.com/gogo/protobuf/issues/568], non nullable repeated
fields in a proto will cause proto.Merge(dst, src) to panic.
The nullable field setting was first added by @kyessenov when he was
re-organizing the protos. Unfortunately, people have been copy pasting it
across several areas in the Envoy proto. To keep the impact radius to a minimum,
I have updated only the fields that are currently causing the segfault
(in go-control-plane) for us.
Its also partly against proto principles. You should be able to determine if
a field is set or not. This non-nullable setting in gogo will insist on initializing
the field to default values.
Risk Level: to go control plane users
Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
Mirrored from https://github.com/envoyproxy/envoy @ b22d2b5cf09f779962cfedaaab24969f384cbc48
Currently, in load_balancer_impl.cc: recalculatePerPriorityPanic(), even if common_lb_config.healthy_panic_threshold is 0%, a load balancer enters panic mode whenever normalized_total_availability is 0%.
I guess, a user who intentionally set to healthy_panic_threshold = 0 expects to immediately return error responses if there is no available host checked by a load balancer.
(In fact, current load_balancer_impl.cc: isGlobalPanic() decide not to enter panic mode whenever healthy_panic_threshold is 0%.)
So I suggest that panic mode is disabled only when healthy_panic_threshold is 0%.
I want this change for automatic degenerating lower priority or optional back-end services.
Risk Level: Low
(It seems that setting healthy_panic_threshold == 0 is a special case originally. It won't happen unless a user intend to disable panic mode, because default value of healthy_panic_threshold is 50%.)
Testing: unit and Integration tests with ./ci/run_envoy_docker.sh './ci/do_ci.sh bazel.release'
Docs Changes: inline
Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
Mirrored from https://github.com/envoyproxy/envoy @ ad9926fb29dc047d7583ab5b0217a3c379f14200
This makes it possible to configure the subset LB to match metadata
match criterias with any of the values specified in a list value on an
endpoint. This allows endpoints to have multiple values for a given
metadata key.
To accomplish this the invariants of the subset trie construction
changed: a host can now be associated with multiple subsets for a set of
subset selectors. To support this the trie construction had to change to
traverse all possible paths for each host.
Fixes#6921
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Mirrored from https://github.com/envoyproxy/envoy @ 41ecbb3e4bd48b425483e7c3aae17509f2ef3a80
This code change allows to redefine fallback policy per specific subset selector. Because of how existing LbSubsetMap trie data structure is organised (mapping subset key to values), is not possible to do lookups for fallback policy only based on subset keys (had to introduce additional trie that maps subset keys to keys and has fallback policy on leaf level). Additional LbSubsetSelectorFallbackPolicy enum required to correctly identify the case when fallback policy is not set for given selector (otherwise it would always default to NO_FALLBACK, breaking backwards compatibility, if field is not set we should use top level fallback policy instead).
Risk Level: Medium
Testing: Done
Docs Changes: Updated related docs
Release Notes: added
Fixes#5130
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Mirrored from https://github.com/envoyproxy/envoy @ 1a60b343665cf2ffb966f37bbe48fed21805df57
Certain clusters have cluster specific load balancers. This change
allows a cluster to explicitly provide one, both allowing extension
clusters to easily provide a dedicated load balancer, as well as
allowing for future cleanup of the original DST LB configuration.
This change is needed for Redis Cluster as well as
#1606.
Risk Level: Low
Testing: New UTs and integration tests.
Signed-off-by: Matt Klein <mklein@lyft.com>
Mirrored from https://github.com/envoyproxy/envoy @ fcf8a5918dfd20ed4ff52652f9ffd7d1d9d34b28
This adds an option to allow hosts to be excluded in lb calculations until they have been health checked
for the first time. This will make it possible to scale up the number of hosts quickly (ie large increase
relative to current host set size) without triggering panic mode/spillover (as long as the initial health check
is succeeds).
While these hosts are excluded from the lb calculations, they are still eligible for routing when panic
mode is triggered.
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Mirrored from https://github.com/envoyproxy/envoy @ 4c80194bf82193261aa52a4ca64c4e6a461881c0
Moved DEPRECATED.md to sphinx docs.
Risk Level: Low - only documentation
Testing: Compiles with sphinx docs without warnings or errors
Docs Changes: deprecated.rst created in intro section of sphinx docs and added to toctree
Release Notes: N/A
Fixes: #6386
Signed-off-by: HashedDan <georgedanielmangum@gmail.com>
Mirrored from https://github.com/envoyproxy/envoy @ fd7c172af181275693297efbe148fd8bb414ef48
Using proto.MarshalAny results in unstable output due to non-deterministic map ordering. This in turn causes Envoy's diff to reload a config since the hash of the structure changes.
Enable stable marshaler for gogoproto to avoid this problem. See #6252
Risk Level: low
Testing: n/a
Signed-off-by: Kuat Yessenov <kuat@google.com>
Mirrored from https://github.com/envoyproxy/envoy @ 15a19b9cb1cc8bd5a5ec71d125177b3f6c9a3cf5
Description: Code for Envoy to speak delta CDS with a management server. DELTA_GRPC added to config_source.proto's ApiTypes, to allow bootstrap configs to ask for incremental xDS.
Part of #4991. Was #5466; giving up on broken DCO craziness.
Risk Level: medium
Testing: new integration test
Signed-off-by: Fred Douglas <fredlas@google.com>
Mirrored from https://github.com/envoyproxy/envoy @ 8116b6ddb53409c3cea6f55fb00367aa43d7e845
We use the default subset as our fallback policy. However, if there's
a problem with our metadata it's possible for the default subset
to become empty. In this case, we'd like any host to be selected.
This change adds a panic_mode_any boolean option to LbSubsetConfig
to enable this behavior.
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Mirrored from https://github.com/envoyproxy/envoy @ 718682f22bdb548624df076813208660fb928188
Fixes oss-fuzz issue https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13327
For ring hash lb, move all configuration parsing from Ring's ctor to the LB's ctor, where it's safe to throw exceptions from. Also, re-add proto field constraints to guard against extreme inputs from clusterfuzz and other actors of questionable intent.
Risk Level: Low
Testing: Added clusterfuzz testcase; existing tests still pass.
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Mirrored from https://github.com/envoyproxy/envoy @ 99696cda3336af26fe0b048d91e0d6eb279bb81c
Add support of Any as opaque config for extensions. Deprecates Struct configs. Fixes#4475.
Risk Level: Low
Testing: CI
Docs Changes: Added.
Release Notes: Added.
Signed-off-by: Lizan Zhou <lizan@tetrate.io>
Mirrored from https://github.com/envoyproxy/envoy @ 851f591f4ed84594e5e5041e7ada4167a4f3a273
* api: add proto options for java
* add ci for checking proto options
Signed-off-by: Penn (Dapeng) Zhang <zdapeng@google.com>
Mirrored from https://github.com/envoyproxy/envoy @ 02659d411332e9f20d229f482931c15304ea17fd
Signed-off-by: Daniel Hochman <danielhochman@users.noreply.github.com>
Mirrored from https://github.com/envoyproxy/envoy @ 9388c05db87ddaf1c2afe45d9b72ec4d8601f380
Adds an option that allows smoothing out the locality weights when used
with subset lb: weights are scaled by the number of hosts that were
removed due to the subset filter. This allows for less of a step
function when a small number of hosts in a locality are updated to be
included in the subset.
Fixes#4837
Signed-off-by: Snow Pettersen snowp@squareup.com
Risk Level: Low, new optional feature
Testing: UTs
Docs Changes: Inline in protos
Release Notes: added release note
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Mirrored from https://github.com/envoyproxy/envoy @ b094017026ce57d52762513bd5c0f774da0fc39e
We've been using this in production for over 3 months now and it's
been very useful to prevent CPU spikes when we get a stream of
updates.
This enables update merging every 1s.
Fixes#4018.
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Mirrored from https://github.com/envoyproxy/envoy @ fad993e5aed40fba95897e9017afd19bdf170ed0
API for #4475.
Risk Level: Low (not implemented)
Testing: CI
Docs Changes: Added but hided
Release Notes: N/A, will add when adding impl.
Signed-off-by: Lizan Zhou <lizan@tetrate.io>
Mirrored from https://github.com/envoyproxy/envoy @ 45a460fabf34698a875060482de96f7f618bdc9f
In preparation for removing std::hash for LB (a deprecated v1 option)
Risk Level: Medium (changing existing code where default config was used)
Testing: Tweaked existing unit tests
Docs Changes: updated API docs
Release Notes: Noted in release notes
Deprecated*: std::hash in LB (already deprecated, but might as well get the bugs auto-filed for 1.9.0)
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Mirrored from https://github.com/envoyproxy/envoy @ db793ca15cfa9a500e76172c1011fa7baa4327ef
This change introduces a new configuration parameter for clusters,
`time_between_updates`.
If this is set, all cluster updates — membership changes, metadata
changes on endpoints, or healtcheck state changes — that happen
within a `time_between_updates` duration will be merged and delivered
at once when the duration ends.
This is useful for big clusters (> 1k endpoints) using the subset LB.
Partially addresses https://github.com/envoyproxy/envoy/issues/3929.
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Mirrored from https://github.com/envoyproxy/envoy @ 62441f9fe2933d4846cfc3d166e52a9178be755f
This patch implements load_assigment field in CDS' Cluster.
This change specifically adds the implementation of the new load_assigment field
for clusters with discovery-type: STATIC, STRICT_DNS and LOGICAL_DNS.
While adding this load_assigment field implementation to Cluster,
this patch also allows specifying optional (active) health check config per specified upstream host.
Risk Level: medium
Testing: unit tests
Docs Changes:
This unhides docs for endpoint health check config
Release Notes: N/A
Fixes#439
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Mirrored from https://github.com/envoyproxy/envoy @ b32eabfc141760ec14622a4a2a2f0ab0a741cd6c
Adds a flag to the subset lb config which will cause the subset
lb to respect the locality weights of the original host set. This allows
subset matching on clusters that are configured with locality weights.
Risk Level: Low, new optional feature
Testing: Unit tests
Docs Changes: Added documentation to proto definition
Release Notes: Added note about new configuration option to release notes
Fixes#3123
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Mirrored from https://github.com/envoyproxy/envoy @ 0a0914e9a1c18c94dc7a02faac1e8b1903d4e2e5
Add load_assignment field in Cluster
This patch introduces load_assigment field in CDS' Cluster. This is an API change only.
This is part of effort on breaking #3261 into multiple PRs.
Risk Level:
Low, since it is hidden.
Testing:
Build api and envoy-static without error
Docs Changes:
Add load_assignment in Cluster of cds.proto.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Mirrored from https://github.com/envoyproxy/envoy @ 79bce5fe1cd8d1ab03dc6085497fcda653320a67
This PR adds a configuration flag that allows disabling the "eventually consistent" aspect of endpoint updates: instead of waiting for the endpoints to go unhealthy before removing them from the cluster, do so immediately. This gives greater control to the control plane in cases where one might want to divert traffic from an endpoint
without having it go unhealthy. The flag goes on the cluster and so applies to all endpoints within that cluster.
Risk Level:
Low: Small configurable feature which reuses existing behavior (identical to the behavior when no health checker is configured). Defaults to disabled, so should have no impact on existing configurations.
Testing:
Added unit test for the case when endpoints are healthy then removed from the ClusterLoadAssignment in a subsequent config update.
Docs Changes:
Docs were added to the proto field.
Release Notes:
Added cluster: Add :ref:`option <envoy_api_field_Clister.drain_connections_on_eds_removal>` to drain endpoints after they are removed through EDS, despite health status. to the release notes.
[Optional Fixes #Issue]
#440 and #3276 (note that this last issue also asks for more fine grained control over endpoint removal. The solution this PR provides was brought up as a partial solution to #3276).
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Mirrored from https://github.com/envoyproxy/envoy @ 08712e93b07695f53d192a2601cfa2ccc7a20f33
* cluster: Add option to close tcp_proxy connections when health checks fail.
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Mirrored from https://github.com/envoyproxy/envoy @ 908231ed28d4f619e24c8c46a837cc3f914d173d
Adds TCP Keepalive support for upstream connections. This can be configured on the cluster manager level, and overridden on the cluster level.
Risk Level: Medium
Testing:
Unit tests have been added. It appears to run and work.
Docs Changes:
envoyproxy/data-plane-api#614Fixesenvoyproxy/envoy#3028
API Changes:
envoyproxy/data-plane-api#614
Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Mirrored from https://github.com/envoyproxy/envoy @ dd953f99945bb7c6b3251f71bffe252a5f6e9e62
Found via proto fuzzing of the server config, unbounded ring sizes can lead to resource exhaustion.
Also bumped PGV version, since even with the bound added to cds.proto, the constraint validation was
skipped due to the bug fixed in https://github.com/lyft/protoc-gen-validate/pull/73.
Risk Level: Medium (PGV bump might result in some configs that passed before failing).
Testing: server_fuzz_test regression.
Signed-off-by: Harvey Tuch <htuch@google.com>
Mirrored from https://github.com/envoyproxy/envoy @ 176e565eaec82d79ebf28d3f2bd0493f68a95180
Introduce the concept of locality weighted LB (as distinct from zone
aware LB) in the docs and a new field in Cluster, locality_weighted_lb,
for configuring this behavior.
Signed-off-by: Harvey Tuch <htuch@google.com>