The pick_first policy creates a list of subchannels for each resolver update and then iterates over the list, attempting to connect to each subchannel in turn, until one of them succeeds. However, once a subchannel does succeed, the policy unrefs the other subchannels but still retains a bunch of now-unnecessary state in the subchannel list itself. This wastes a bunch of memory, especially now that petiole policies are delegating to pick_first. This PR contains a new pick_first implementation that stops retaining that state, which significantly reduces per-channel memory.
There is one behavior change here, which is that if we have a connected subchannel and we get a resolver update that no longer includes that address, we now go IDLE instead of proactively trying to connect to the new addresses.
Closes#34766
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/34766 from markdroth:pick_first_free_memory_after_connecting 7236b4321f
PiperOrigin-RevId: 623887639
[grpc][Gpr_To_Absl_Logging] Migrating from gpr to absl logging GPR_ASSERT
Replacing GPR_ASSERT with absl CHECK
Will not be replacing CHECK with CHECK_EQ , CHECK_NE etc because there are too many callsites. Only a few - which fit into single - line regex will be changed. This would be small in number just to reduce the load later.
Replacing CHECK with CHECK_EQ , CHECK_NE etc could be done using Cider-V once these changes are submitted if we want to clean up later. Given that we have 5000+ instances of GPR_ASSERT to edit, Doing it manually is too much work for both the author and reviewer.
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#36224
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/36224 from tanvi-jagtap:assert_end2end_general 0b0e940f5d
PiperOrigin-RevId: 623410297
See #36176. The only difference is a temporary shim for Secure credentials types, which was already discussed and approved separately.
Closes#36242
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/36242 from drfloob:reland/36176 f07bebe289
PiperOrigin-RevId: 621879911
Forked from #35957
This PR refactors the credentials types to remove Secure and Insecure Channel and Call credentials types. We standardize on a `c_creds()` accessor method for all credentials types, which can now be treated uniformly. This notably removes special-case handling of insecure credentials.
The special code-paths for insecure creds are no longer necessary in the wake of #25586.
Closes#36176
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/36176 from drfloob:fork/35957/creds-API fd64d59c23
PiperOrigin-RevId: 621008166
This new directory combines code from the following locations:
- src/core/ext/filters/client_channel/resolver
- src/core/lib/resolver
Closes#35804
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35804 from markdroth:client_channel_resolver_reorg2 30660e6b00
PiperOrigin-RevId: 604665835
Changes to fake resolver:
- Add `WaitForReresolutionRequest()` method to fake resolver response
generator to allow tests to tell when re-resolution has been requested.
- Change fake resolver response generator API to have only one mechanism
for injecting results, regardless of whether the result is an error or
whether it's triggered by a re-resolution.
Changes to grpclb_end2end_test:
- Change balancer interface such that instead of setting a list of
responses with fixed delays, the test can control exactly when each
response is set.
- Change balancer impl to always send the initial LB response, as
expected by the grpclb protocol.
- Change balancer impl to always read load reports, even if load
reporting is not expected to be enabled. (The latter case will still
cause the test to fail.) Reads are done in a different thread than
writes.
- Allow each test to directly control how many backends and balancers
are started and the client load reporting interval, so that (a) we don't
waste resources starting servers we don't need and (b) there is no need
to arbitrarily split tests across different test classes.
- Add timeouts to `WaitForAllBackends()` functionality, so that tests
will fail with a useful error rather than timing out.
- Improved ergonomics of various helper functions in the test framework.
In the process of making these changes, I found a couple of bugs:
- A bug in pick_first, which I fixed in #34885.
- A bug in grpclb, in which we were using the wrong condition to decide
whether to propagate a re-resolution request from the child policy,
which I've fixed in this PR. (This bug probably originated way back in
#18344.)
This should address a lot of the flakes seen in grpclb_e2e_test
recently.
More changes as part of the dualstack design:
- Change resolver and LB policy APIs to support multiple addresses per
endpoint. Specifically, replace `ServerAddress` with
`EndpointAddresses`, which encodes more than one address. Per-address
channel args are retained at the same level, so they are now
per-endpoint. For now, `EndpointAddress` provides a single-address ctor
and a single-address accessor for backward compatibility, so
`ServerAdress` is an alias for `EndpointAddresses`; eventually, this
alias and the single-address methods will be removed.
- Add an `EndpointAddressSet` class, which represents an unordered set
of addresses to be used as a map key. This will be used in a number of
LB policies that need to store per-endpoint state.
- Change the LB policy API's `ChannelControlHelper::CreateSubchannel()`
method to take the address and per-endpoint channel args as separate
parameters, so that we don't need to construct a legacy `ServerAddress`
object as we create a new subchannel for each address in the endpoint.
- Change pick_first to flatten the address list.
- Change ring_hash to use `EndpointAddressSet` as the key for its
endpoint map, and to use the first address of the endpoint as the hash
key.
- Change WRR to use `EndpointAddressSet` as the key for its endpoint
weight map.
Note that support for multiple addresses per endpoint is guarded in RR
by the existing `round_robin_delegate_to_pick_fist` experiment and in
WRR by the existing `wrr_delegate_to_pick_first` experiment.
This PR does *not* include support for multiple addresses per endpoint
for the outlier_detection or xds_override_host LB policies; those will
come in subsequent PRs.
This test assumed synchronous work serializer execution (or at least
faster async than we always get)... make a trivial change to keep the
test semantics but allow for the implementation to be more async.
Most recent attempt was #34320, reverted in #34335.
The first commit here is a pure revert. The second commit fixes the
outlier_detection unit test to pass both with and without the
experiment.
Rolls forward part of the dualstack changes, mostly from #33427 and a
little bit from #32692, both of which were reverted in #33718.
Specifically:
- For petiole policies, unconditionally start health watch on
subchannels, even if client side health checking is not enabled; in this
case, the health watch will report the subchannel's raw connectivity
state.
- Fix edge cases in health check reporting that occur when a watcher is
started before the initial state is reported.
- When client-side health checking fails, add the subchannel's address
to the RPC failure status message.
- Outlier detection now works only via the health checking watch, not
via the raw connectivity state watch.
- Remove now-unnecessary hack to ensure that outlier detection does not
work for pick_first.
CNR the flake, but I've changed the test (which is very old) to use some
of our more modern helper functions that have saner timeouts.
Also re-add a `return` statement that was accidentally removed in
#33753, which I noticed while working on this. Its absence doesn't cause
a real problem, but it does cause us to needlessly trigger a duplicate
connection attempt or report a duplicate CONNECTING update in some
cases.
Fix sticky-TF behavior such that once we enter TRANSIENT_FAILURE, we do
not leave that state if we get a new address list.
Also, fix handling of subchannels in state TRANSIENT_FAILURE.
Previously, if a subchannel was already in state TRANSIENT_FAILURE when
we wanted to start a connection attempt on it (e.g., because the
subchannel already existed from a different channel, or because it
already existed in the previous subchannel list), we would wait for it
to report IDLE before attempting to connect. This PR changes pick_first
to instead immediately skip the subchannel and move on to the next one.
Now, the only time we wait for a subchannel in TRANSIENT_FAILURE is when
we wrap back around to the first subchannel in the list.
I'm fairly certain that this path should be non-blocking (and making it
so makes the promise based code far more tractable).
This moves the blocking behavior into the blocking server_cc.cc function
that calls `grpc_server_shutdown_and_notify` instead of in that
non-blocking function.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This reverts the following PRs: #32692#33087#33093#33427#33568
These changes seem to have introduced some flaky crashes. Reverting
while I investigate.
As per gRFC A58, when WRR sees a subchannel report READY, it reset the
non_empty_since value, thus restarting the blackout period. However,
there were two cases where we were incorrectly triggering this code:
1. When WRR got an updated address list that contained addresses that
were already present on the old list and whose subchannels were already
in READY state, the initial notification for those subchannels on the
new list was READY, which incorrectly triggered resetting the
non_empty_since value.
2. Due to a bug in the outlier_detection policy, whenever an update was
propagated down through the OD policy without actually enabling OD, it
would incorrectly send a duplicate connectivity state notification for
the subchannels. This meant that a subchannel that was already in state
READY would report READY again, which would also incorrectly trigger
resetting the non_empty_since value.
This PR makes two changes:
1. Fix the bug in outlier_detection that caused it to generate the
spurious duplicate READY updates.
2. Fix WRR to reset the non_empty_since value when a subchannel goes
READY only if the subchannel has seen a previous state update and only
if that previous state was not READY. (The duplicate READY notifications
should not actually happen anymore now that the OD policy has been
fixed, but better to be defensive.)
Fixes b/290983884.
More work on the dualstack backend design:
- Now that all petiole policies have been changed to delegate to
pick_first, outlier detection no longer needs to eject via the
subchannel's raw connectivity state; it can now eject only via the
health state. See #33340.
- This also removes the now-unnecessary hack to explicitly disable
outlier detection in pick_first. See #33336.
More work on the dualstack backend design:
- Change round_robin to delegate to pick_first instead of creating
subchannels directly.
- Change pick_first such that when it is the child of a petiole policy,
it will unconditionally start a health watch.
- Change the client-side health checking code such that if client-side
health checking is not enabled, it will return the subchannel's raw
connectivity state.
- As part of this, we introduce a new endpoint_list library to be used
by petiole policies, which is intended to replace the existing
subchannel_list library. The only policy that will still directly
interact with subchannels is pick_first, so the relevant parts of the
subchannel_list functionality have been copied directly into that
policy. The subchannel_list library will be removed after all petiole
policies are updated to delegate to pick_first.
The address attribute interface was intended to provide a mechanism to
pass attributes separately from channel args, for values that do not
affect subchannel behavior and therefore do not need to be present in
the subchannel key, which does include channel args. However, the
mechanism as currently designed is fairly clunky and is probably not the
direction we will want to go in the long term.
Eventually, we will want some mechanism for registering channel args,
which would provide a cleaner way to indicate that a given channel arg
should not be used in the subchannel key, so that we don't need a
completely different mechanism. For now, this PR is just doing an
interim step, which is to establish a special channel arg key prefix to
indicate that an arg is not needed in the subchannel key.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Reverts grpc/grpc#33002. Breaks internal builds:
`.../privacy_context:filters does not depend on a module exporting
'.../src/core/lib/channel/context.h'`
Change call attributes to be stored in a `ChunkedVector` instead of
`std::map<>`, so that the storage can be allocated on the arena. This
means that we're now doing a linear search instead of a map lookup, but
the total number of attributes is expected to be low enough that that
should be okay.
Also, we now hide the actual data structure inside of the
`ServiceConfigCallData` object, which required some changes to the
`ConfigSelector` API. Previously, the `ConfigSelector` would return a
`CallConfig` struct, and the client channel would then use the data in
that struct to populate the `ServiceConfigCallData`. This PR changes
that such that the client channel creates the `ServiceConfigCallData`
before invoking the `ConfigSelector`, and it passes the
`ServiceConfigCallData` into the `ConfigSelector` so that the
`ConfigSelector` can populate it directly.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This reverts commit 7bd9267f32.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->