Isolate ping callback tracking to its own file.
Also takes the opportunity to simplify keepalive code by applying the
ping timeout to all pings.
Adds an experiment to allow multiple pings outstanding too (this was
originally an accidental behavior change of the work, but one that I
think may be useful going forward).
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
More changes as part of the dualstack design:
- Change resolver and LB policy APIs to support multiple addresses per
endpoint. Specifically, replace `ServerAddress` with
`EndpointAddresses`, which encodes more than one address. Per-address
channel args are retained at the same level, so they are now
per-endpoint. For now, `EndpointAddress` provides a single-address ctor
and a single-address accessor for backward compatibility, so
`ServerAdress` is an alias for `EndpointAddresses`; eventually, this
alias and the single-address methods will be removed.
- Add an `EndpointAddressSet` class, which represents an unordered set
of addresses to be used as a map key. This will be used in a number of
LB policies that need to store per-endpoint state.
- Change the LB policy API's `ChannelControlHelper::CreateSubchannel()`
method to take the address and per-endpoint channel args as separate
parameters, so that we don't need to construct a legacy `ServerAddress`
object as we create a new subchannel for each address in the endpoint.
- Change pick_first to flatten the address list.
- Change ring_hash to use `EndpointAddressSet` as the key for its
endpoint map, and to use the first address of the endpoint as the hash
key.
- Change WRR to use `EndpointAddressSet` as the key for its endpoint
weight map.
Note that support for multiple addresses per endpoint is guarded in RR
by the existing `round_robin_delegate_to_pick_fist` experiment and in
WRR by the existing `wrr_delegate_to_pick_first` experiment.
This PR does *not* include support for multiple addresses per endpoint
for the outlier_detection or xds_override_host LB policies; those will
come in subsequent PRs.
Waiting for an active channel takes less time in non-gamma test suites
because they only start waiting after already waited for the TD backends
to be created and report healthy.
In GAMMA, these resources are created asynchronously by Kubernetes. To
compensate for this, we double the timeout for GAMMA tests.
Really minimal change to make the output buffer for chttp2 be a
`grpc_core::SliceBuffer` so that we can start mixing in the new framer
code.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Fix: https://github.com/grpc/grpc/issues/33890
In case of abort, currently we don't log anything, this change added an
`AbortError` as the default error for abort, if any other error
happened, we'll log the error so user will be aware of the other error.
Related gRFC: https://github.com/grpc/proposal/pull/388
### Testing
* Tested locally, after this change, any non-AbortError will be logged,
sample log:
```
ERROR:grpc._server:Exception happened while abort: Other exception happened!
Traceback (most recent call last):
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/9/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests/unit/_abort_test.native.runfiles/com_github_grpc_grpc/src/python/grpcio/grpc/_server.py", line 553, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/9/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests/unit/_abort_test.native.runfiles/com_github_grpc_grpc/src/python/grpcio_tests/tests/unit/_abort_test.py", line 95, in abort_with_status_unary_unary_raise_additional_exception
raise Exception("Other exception happened!")
Exception: Other exception happened!
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This change is to update the TD bootstrap generator for prod tests. This
is part of the TD release process. The new image has already been merged
to staging and tested locally in google3.
cc: @sergiitk PTAL.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Fix a crash on older iOS versions due to problematic thread-local
variable initialization.
See https://github.com/firebase/firebase-ios-sdk/issues/11509
Basically, there appears to be a bug in Xcode where it generates
assembly for thread-local variable initialization that is susceptible to
a crash. For example, on arm64 the generated assembly relies on
registers like x8 and x10 being preserved by the thread-local variable
initialization routine; however, in some cases this thread-local
variable initialization calls functions like
`ImageLoaderMachOCompressed::doBindFastLazySymbol` which clobber these
registers, leaving their values indeterminate when the caller resumes.
When those indeterminate values are later used as memory addresses they
are invalid and result in a crash.
This PR works around this bug by removing the `ScopedTimeCache` member
variable from the `ExecCtx` class on iOS. This is a reasonable
workaround because `ScopedTimeCache` is only a slight optimization for
data centers that entirely doesn't matter for mobile.
See https://github.com/dconeybe/TlsCrashIos12 for a demo of this crash.
Googlers see b/300501963 for full details.
We disabled this a little while ago for lack of CI bandwidth, but #34404
ought to have freed up enough capacity that we can keep running this.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Expand our fuzzing capabilities by allowing fuzzers to choose the bits
that go into random number distribution generators.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Passed manual run:
https://fusion2.corp.google.com/invocations/0e337438-8573-4b54-ba1c-242b1ad58586
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
There's an ongoing discussion on whether we should have API to disable
default metrics. Removing this API till we have a decision on that.
I'm keeping the internal API for enabling/disabling metrics on the OTel
plugin for now, just not exposing it publicly.
Reverts grpc/grpc#34361
Not sure why CI didn't capture it and why my previous test works, but
looks like it broke at least src/objective-c/examples/SwiftSample build.
Will see how CI goes in this PR.
Summary -
On the server-side, we are changing the point at which we decide whether
a method is registered or not from the surface to the transport at the
point where we are done receiving initial metadata and before we invoke
the recv_initial_metadata_ready closures from the filters. The main
motivation for this is to allow filters to check whether the incoming
method is a registered or not. The exact use-case is for observability
where we only want to record the method if it is registered. We store
the information about the registered method in the initial metadata.
On the client-side, we also set information about whether the method is
registered or not in the outgoing initial metadata.
Since we are effectively changing the lookup point of the registered
method, there are slight concerns of this being a potentially breaking
change, so we are guarding this with an experiment to be safe.
Changes -
* Transport API changes -
* Along with `accept_stream_fn`, a new callback
`registered_method_matcher_cb` will be sent down as a transport op on
the server side. When initial metadata is received on the server side,
this callback is invoked. This happens before invoking the
`recv_initial_metadata_ready` closure.
* Metadata changes -
* We add a new non-serializable metadata trait `GrpcRegisteredMethod()`.
On the client-side, the value is a uintptr_t with a value of 1 if the
call has a registered/known method, or 0, if it's not known. On the
server side, the value is a (ChannelRegisteredMethod*). This metadata
information can be used throughout the stack to check whether a call is
registered or not.
* Server Changes -
* When a new transport connection is accepted, the server sets
`registered_method_matcher_cb` along with `accept_stream_fn`. This
function checks whether the method is registered or not and sets the
RegisteredMethod matcher in the metadata for use later.
* Client Changes -
* Set the metadata on call creation on whether the method is registered
or not.