@sampajano
The very non-trivial upgrade of third_party/protobuf to 22.x
This PR strives to be as small as possible; many changes that were
compatible with protobuf 21.x and did not have to be merged atomically
with the upgrade have already been merged separately.
Due to the complexity of the upgrade, this PR wasn't created
automatically by a tool, but manually. Subsequent upgrades of
third_party/protobuf with our OSS release script should work again once
this change is merged.
This is best reviewed commit by commit; I tried to group changes into
logical areas.
Notable changes:
- the upgrade of the third_party/protobuf submodule and of the bazel
protobuf dependency itself
- the upgrade of the upb dependency to 22.x (in the past, we always
upgraded upb to "main", but upb now has a release branch as well). upb
needs to be upgraded atomically with protobuf since there is a de-facto
circular dependency (the new protobuf depends on the new upb, which
depends on the new protobuf for codegen).
- some protobuf and upb bazel rules are now aliases, so
`extract_metadata_from_bazel_xml.py` and `gen_upb_api_from_bazel_xml.py`
had to be modified to follow aliases and reach the actual aliased
targets.
- some protobuf public headers were renamed, so `src/compiler` in
particular needed to be updated to use the new headers.
- protobuf and upb now both depend on the utf8_range project; since we
bundle upb with grpc in some languages, we now have to bundle utf8_range
as well (hence the build changes for Python, PHP, ObjC, CMake, etc.).
- protoc now depends on absl and utf8_range (previously protobuf had an
absl dependency, but not for the codegen part), so Python's
make_grpcio_tools.py required a partial rewrite to handle those
dependencies in the grpcio_tools build.
- many updates and fixes were required for the C++ distribtests
(currently they all pass, but we'll probably need to follow up, make
protobuf's and grpc's handling of dependencies more aligned, and revisit
the distribtests)
- a bunch of other changes, mostly due to the overhaul of protobuf's and
upb's internal build layout.
TODOs:
- [DONE] make sure IWYU and clang_tidy_code pass
- create a list of followups (e.g. work to reenable the few tests I had
to disable and to remove the workarounds I had to use)
- [DONE in cl/523706129] figure out problem(s) with internal import
---------
Co-authored-by: Craig Tiller <ctiller@google.com>
I added the `CallDispatchController` interface back in #26200 in
preparation for supporting the retry circuit breaker in xDS. However, we
subsequently decided not to support that feature, so we no longer need
this interface. And switching back to a simple on-commit callback should
make it easier to implement stateful session affinity for
WeightedClusters.
When using this internally, we noticed that it's impossible to use
custom_metadata.h without creating a dependency cycle between
:custom_metadata and :grpc_base.
A full build refactoring is too large right now, so merge that header
into :grpc_base for the time being.
Also, separate `SimpleSliceBasedMetadata` into its own file, so that it
can be reused in custom_metadata.h.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This PR also centralizes the client channel resolver selection. Resolver
selection is still done using the plugin system, but when the Ares and
native client channel resolvers go away, we can consider bootstrapping
this differently.
This commit adds support for using CIDR blocks defined in the `no_proxy`
environment variable. For example:
```
http_proxy=http://localhost:8080 no_proxy=10.10.0.0/24
```
The example above would bypass the proxy if the server IP fell within
the range 10.10.0.0 - 10.10.0.255.
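For illustration, a minimal self-contained sketch of the kind of check involved (IPv4 only; `Ipv4InCidr` is a hypothetical helper written for this description, not the actual gRPC implementation):
```
#include <arpa/inet.h>

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Returns true if `ip` (e.g. "10.10.0.57") lies inside `cidr`
// (e.g. "10.10.0.0/24").
bool Ipv4InCidr(const std::string& ip, const std::string& cidr) {
  size_t slash = cidr.find('/');
  if (slash == std::string::npos) return false;
  int prefix_len = std::stoi(cidr.substr(slash + 1));
  if (prefix_len < 0 || prefix_len > 32) return false;
  in_addr ip_addr, net_addr;
  if (inet_pton(AF_INET, ip.c_str(), &ip_addr) != 1) return false;
  if (inet_pton(AF_INET, cidr.substr(0, slash).c_str(), &net_addr) != 1) {
    return false;
  }
  // Build the network-order mask for the prefix length.
  uint32_t mask =
      prefix_len == 0 ? 0 : htonl(~uint32_t{0} << (32 - prefix_len));
  return (ip_addr.s_addr & mask) == (net_addr.s_addr & mask);
}

int main() {
  // 10.10.0.57 is inside 10.10.0.0/24, so the proxy would be bypassed.
  std::printf("%d\n", Ipv4InCidr("10.10.0.57", "10.10.0.0/24"));  // 1
  std::printf("%d\n", Ipv4InCidr("10.10.1.57", "10.10.0.0/24"));  // 0
}
```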
Closes #22681
---------
Co-authored-by: Yash Tibrewal <yashkt@google.com>
The logic is straightforward: attempt to read the
`GRPC_ALTS_MAX_CONCURRENT_HANDSHAKES` environment variable and, if it is
set to an integer, instantiate the handshake queues based on this
integer.
Based on go/grpc-alts-concurrent-handshake-cap.
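A minimal sketch of that logic (the helper name and the default value below are illustrative assumptions, not the actual ALTS internals):
```
#include <cstddef>
#include <cstdlib>

constexpr size_t kDefaultMaxConcurrentHandshakes = 100;  // assumed default

size_t MaxConcurrentHandshakes() {
  const char* value = std::getenv("GRPC_ALTS_MAX_CONCURRENT_HANDSHAKES");
  if (value == nullptr) return kDefaultMaxConcurrentHandshakes;
  char* end = nullptr;
  long parsed = std::strtol(value, &end, 10);
  // Only accept a fully-parsed, positive integer; otherwise keep the default.
  if (end == value || *end != '\0' || parsed <= 0) {
    return kDefaultMaxConcurrentHandshakes;
  }
  return static_cast<size_t>(parsed);
}
```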
The aim here is to allow adding custom metadata types to the internal build.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
(hopefully last try)
Add a new channel arg GRPC_ARG_ABSOLUTE_MAX_METADATA_SIZE as a hard
limit for metadata, and change GRPC_ARG_MAX_METADATA_SIZE to be a soft
limit. Behavior is as follows:
Hard limit
(1) If the hard limit is explicitly set, it will be used.
(2) If the hard limit is not explicitly set, the maximum of the default
and soft limit * 1.25 (if the soft limit is set) will be used.
Soft limit
(1) If the soft limit is explicitly set, it will be used.
(2) If the soft limit is not explicitly set, the maximum of the default
and hard limit * 0.8 (if the hard limit is set) will be used.
Requests with metadata between the soft and hard limits will be rejected
randomly; requests above the hard limit will be rejected.
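A minimal sketch of the derivation above (the default values and names are illustrative assumptions, not gRPC's actual defaults):
```
#include <algorithm>
#include <cstddef>
#include <optional>

constexpr size_t kAssumedDefaultSoftLimit = 8 * 1024;   // illustrative only
constexpr size_t kAssumedDefaultHardLimit = 16 * 1024;  // illustrative only

struct MetadataLimits {
  size_t soft;
  size_t hard;
};

MetadataLimits DeriveLimits(std::optional<size_t> soft_arg,
                            std::optional<size_t> hard_arg) {
  MetadataLimits limits;
  // Hard limit: an explicit value wins; otherwise max(default, soft * 1.25).
  if (hard_arg.has_value()) {
    limits.hard = *hard_arg;
  } else if (soft_arg.has_value()) {
    limits.hard = std::max(kAssumedDefaultHardLimit,
                           static_cast<size_t>(*soft_arg * 1.25));
  } else {
    limits.hard = kAssumedDefaultHardLimit;
  }
  // Soft limit: an explicit value wins; otherwise max(default, hard * 0.8).
  if (soft_arg.has_value()) {
    limits.soft = *soft_arg;
  } else if (hard_arg.has_value()) {
    limits.soft = std::max(kAssumedDefaultSoftLimit,
                           static_cast<size_t>(*hard_arg * 0.8));
  } else {
    limits.soft = kAssumedDefaultSoftLimit;
  }
  return limits;
}
```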
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This PR adds the view `grpc.io/client/api_latency` for GCP Observability,
which aims to collect the end-to-end time taken by a call.
Changes made to support this -
1) A global interceptor factory registration is created for stats
plugins.
2) The OpenCensus plugin now provides a new interceptor that's
responsible for collecting the new latency (see the sketch after this
description).
3) GCP Observability registers this plugin.
4) A new OpenCensus measurement and view are created for API latency.
Note that this is internal as of now, since it's not clear whether it
should be exposed as a public experimental API. We're leaving that
decision for the future.
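As a rough illustration of the interceptor idea (hypothetical names; the real plugin records into an OpenCensus measure backing the `grpc.io/client/api_latency` view rather than a plain callback):
```
#include <chrono>
#include <functional>
#include <utility>

// Times a call from creation to completion and reports the result; in the
// real plugin the report goes to an OpenCensus measurement for api_latency.
class ApiLatencyRecorder {
 public:
  explicit ApiLatencyRecorder(std::function<void(double)> record_seconds)
      : record_seconds_(std::move(record_seconds)),
        start_(std::chrono::steady_clock::now()) {}

  // Invoked once the call is finished, so the recorded value covers the
  // full end-to-end time taken by the call.
  void OnCallEnd() {
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start_;
    record_seconds_(elapsed.count());
  }

 private:
  std::function<void(double)> record_seconds_;
  std::chrono::steady_clock::time_point start_;
};
```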
This PR adds annotations to client attempt spans and server spans on
messages of the form -
* `Send message: 1026 bytes`
* `Send compressed message: 31 bytes` (if the message was compressed)
* `Received message: 31 bytes`
* `Received decompressed message: 1026 bytes` (if the message needed to
be decompressed)
Note that the compressed and decompressed annotations are not present if
compression/decompression was not performed.
This is a big rewrite of global config.
It does a few things, all somewhat intertwined:
1. centralize the list of configuration options we have into a yaml file
that can be parsed and from which code can be generated
2. add an initialization and a reset stage so that config vars can be
centrally accessed very quickly without the need for caching them
3. make the syntax more C++-like (fewer macros!)
4. (optionally) add absl flags to the OSS build
This first round of changes is intended to keep the system where it is
without major changes. We pick up absl flags to match internal code and
remove one point of deviation - but importantly we continue to read from
the environment variables. In doing so we don't force absl flags on our
customers - it's possible to configure grpc without the flags - but
instead allow users who do use absl flags to configure grpc using that
mechanism. Importantly, this lets internal customers configure grpc the
same way everywhere.
Future changes along this path will be two-fold:
1. Move documentation generation into the code generation step, so that
within the source of truth yaml file we can find all documentation and
data about a configuration knob - eliminating the chance of forgetting
to document something in all the right places.
2. Provide fuzzing over configurations. Currently most config variables
get stashed in static constants across the codebase. To fuzz over these
we'd need a way to reset those cached values between fuzzing rounds,
something that is terrifically difficult right now, but with these
changes should simply be a reset on `ConfigVars`.
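A rough sketch of the access-and-reset pattern described above (illustrative only, not the actual `ConfigVars` API; the two variables shown are hand-rolled stand-ins for the code-generated set):
```
#include <cstdlib>
#include <string>

class ConfigVars {
 public:
  // Cheap accessor: values are loaded once and then read without further
  // environment lookups or per-call-site caching. (Not thread-safe as
  // written; illustration only.)
  static const ConfigVars& Get() {
    if (instance_ == nullptr) instance_ = Load();
    return *instance_;
  }
  // Reset stage, e.g. between fuzzing rounds, so new values are picked up.
  static void Reset() {
    delete instance_;
    instance_ = nullptr;
  }

  bool trace_enabled() const { return trace_enabled_; }
  const std::string& dns_resolver() const { return dns_resolver_; }

 private:
  static ConfigVars* Load() {
    auto* vars = new ConfigVars;
    // In the real system the variable list is code-generated from a yaml
    // description and can also be populated from absl flags; here we just
    // read two environment variables by hand.
    const char* trace = std::getenv("GRPC_TRACE");
    vars->trace_enabled_ = trace != nullptr && *trace != '\0';
    const char* resolver = std::getenv("GRPC_DNS_RESOLVER");
    vars->dns_resolver_ = resolver != nullptr ? resolver : "";
    return vars;
  }
  static ConfigVars* instance_;
  bool trace_enabled_ = false;
  std::string dns_resolver_;
};

ConfigVars* ConfigVars::instance_ = nullptr;
```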
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This change mostly aims to get OpenCensus to use the new
ServerCallTracer interface. Note that neither the interfaces nor the
code are in their final state. There are a bunch of moving pieces, but I
thought this might be a nice mid-step to check in and make sure that our
internal traces can also work with these changes.
Overall changes -
1) call_tracer.h shows what the hierarchy of new call tracer interfaces
looks like (a rough sketch follows after this list). Open to renaming
suggestions.
2) Moved most of the common interface between `CallAttemptTracer` and
`ServerCallTracer` into a common `CallTracerInterface`. We should be
able to eventually move `RecordReceivedTrailingMetadata` and `RecordEnd`
as well to these common interfaces, but it requires some additional
work.
3) The compression filter is now responsible for recording the recv and
send messages for both the subchannel call and the server, and adds the
ability to record compressed and decompressed messages as well.
4) The OpenCensus server filter now uses the new `ServerCallTracer`
interface, and so doesn't need to be a filter anymore.
5) A new ServerCallTracerFilter was added. Ideally, we should be able to
move it to the current connected filter, but it is in a bit of an
interesting state right now, so I would prefer making those changes in a
separate PR with Craig's eyes on it.
6) A new context element `GRPC_CONTEXT_CALL_TRACER_ANNOTATION_INTERFACE`
was created that replaces the old `GRPC_CONTEXT_CALL_TRACER`, and the
new `GRPC_CONTEXT_CALL_TRACER` is mainly to pass the `CallAttemptTracer`
down the stack. This should go away in the new promise-based world.
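A rough sketch of the hierarchy described in point 1 (method names and signatures are illustrative; the actual interfaces live in call_tracer.h):
```
#include <cstddef>

// Common recording hooks shared by the client-attempt and server tracers.
class CallTracerInterface {
 public:
  virtual ~CallTracerInterface() = default;
  virtual void RecordSendInitialMetadata() = 0;
  virtual void RecordSendMessage(size_t length) = 0;
  virtual void RecordReceivedMessage(size_t length) = 0;
};

// Per-attempt tracer on the client side (one per call attempt).
class CallAttemptTracer : public CallTracerInterface {
 public:
  virtual void RecordReceivedTrailingMetadata() = 0;
  virtual void RecordEnd() = 0;
};

// Server-side tracer; the OpenCensus server integration now implements
// this instead of being a filter.
class ServerCallTracer : public CallTracerInterface {
 public:
  virtual void RecordReceivedTrailingMetadata() = 0;
  virtual void RecordEnd() = 0;
};
```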
This prevents deadlocks involving the wire writer.
Currently, some `transport_stream_receiver_` callbacks triggered by the
NDK binder may acquire `WireReaderImpl::mu_` first and then
`WireWriterImpl::write_mu_`. We don't want to see this.
We have this problem because some clients and servers are in the same
process; the behavior of the NDK binder seems more aggressive when the
Tx and Rx sides are in the same process.
Avoids some compilation problems on older MSVC versions and opens the
door for some future optimizations.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Internal Windows builds will catch issues that we cannot yet catch in
OSS. This will be mostly remedied once we have clang-cl in our CI (See a
rough roadmap in https://github.com/grpc/grpc/pull/32448). For now, this
PR identifies folders where most Windows-specific code is developed, and
requires cherrypicks for PRs that touch anything inside those folders.
This PR also refactors gpr and gprpp source files to better isolate all
platform-specific code ~the Windows-only code~. ~I will reorganize the
other platform-specific files using this structure if there are no
objections.~
This subset of folders covers about half of the `#ifdef GPR_WINDOWS`
usages in gRPC, but nearly all of the actively-developed Windows code
locations.
This reverts commit 0fc0384b5a.
Major change: this code calls `GetDefaultEventEngine` once on Alarm
init instead of 7 times throughout.
I will run benchmarks to ensure b/237283941 is not reproduced.
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
This is a prerequisite for converting the client_channel filter to
promises. This refactors two objects:
- `ClientChannel::CallData`, which is primarily responsible for applying
the service config to the call
- `ClientChannel::LoadBalancedCall`, which is responsible for doing the
LB pick for the call attempt
Each of those classes has been split into two pieces:
- a base class with the functionality to be shared between the legacy
filter stack implementation and the new promise-based implementation
- a subclass providing the legacy filter stack implementation
A subsequent PR will add another subclass that provides the
promise-based implementation.
Original attempt was #31973, reverted in #32324 due to test flakiness.
There were two problems causing test flakiness here.
The first problem was that, upon resolver error, we were dispatching an
async callback to re-process each of the queued picks *before* we
updated the channel's connectivity state, which meant that the queued
picks might be re-processed in another thread before the new
connectivity state was set, so tests that expected the state to be
TRANSIENT_FAILURE once RPCs failed might not see the expected state.
The second problem affected the xDS ring hash tests, and it's a bit more
involved to explain.
We have an e2e test that simulates an aggregate cluster failover from a
primary cluster using ring_hash at startup. The primary cluster has two
addresses, both of which are unreachable when the client starts up, so
the client should immediately fail over to the secondary cluster, which
does have reachable endpoints. The test requires that no RPCs are failed
while this failover occurs. The original PR made this test flaky.
The problem here was caused by a combination of two factors:
1. Prior to the original PR, when the picker was updated (which happens
inside the WorkSerializer), we re-processed previously queued picks
synchronously, so it was not possible for another subchannel
connectivity state update (which also happens in the WorkSerializer) to
be processed between the time that we updated the picker and the time
that we re-processed the previously queued picks. The original PR
changed this such that the queued picks are re-processed asynchronously
(outside of the WorkSerializer), so it is now possible for a subchannel
connectivity state update to be processed between when the picker is
updated and when we re-process the previously queued picks.
2. Unlike most LB policies, where the picker does not see updated
subchannel connectivity states until a new picker is created, the
ring_hash picker gets the subchannel connectivity states from the LB
policy via a lock, so it can wind up seeing the new states before it
gets updated. This means that when a subchannel connectivity state
update is processed by the ring_hash policy in the WorkSerializer, it
will immediately be seen by the existing picker, even without a picker
update.
With those two points in mind, the sequence of events in the failing
test was as follows:
1. The pick is attempted in the ring_hash picker for the primary
cluster. This causes the first subchannel to attempt to connect.
2. The subchannel transitions from IDLE to CONNECTING. A new picker is
returned due to the subchannel connectivity state change, and the
channel retries the queued pick. The retried pick is done
asynchronously, but in this case it does not matter: the call will be
re-queued.
3. The connection attempt fails, and the subchannel reports
TRANSIENT_FAILURE. A new picker is again returned, and the channel
retries the queued pick. The retried pick is done asynchronously, but in
this case it does not matter: this causes the picker to trigger a
connection attempt for the second subchannel.
4. The second subchannel transitions from IDLE to CONNECTING. A new
picker is again returned, and the channel retries the queued pick. The
retried pick is done asynchronously, and in this case it *does* matter.
5. The second subchannel now transitions to TRANSIENT_FAILURE. The
ring_hash policy will now report TRANSIENT_FAILURE, but before it can
finish that...
6. ...In another thread, the channel now tries to re-process the queued
pick using the CONNECTING picker from step 4. However, because the
ring_hash policy has already seen the TRANSIENT_FAILURE report from the
second subchannel, that picker will now fail the pick instead of queuing
it.
After discussion with @ejona86 and @dfawley (since this bug actually
exists in Java and Go as well), we agreed that the right solution is to
change the ring_hash picker to contain its own copy of the subchannel
connectivity state information, rather than sharing that information
with the LB policy using synchronization.
Currently, the peer name is returned with the completion of the
send_initial_metadata op, which does not make sense, because with
retries, we don't actually know the peer name until we complete the
recv_initial_metadata op. This PR changes our code to return the peer
string as an attribute of the recv_initial_metadata op, so that it is
not available to the application until that point. This change may be
user-visible, but since our API docs don't seem to guarantee exactly
when this data will be available, it's not technically a breaking
change.
Note that in the promise-based stack, we were already assuming that the
peer string would be returned as part of the recv_initial_metadata
batch, so this PR helps reduce risk for the promise conversion by making
this semantic change now, thus decoupling it from the promise
conversion.
I have also changed the representation of the string in the metadata
batch to be a `grpc_core::Slice` instead of a `std::string`, so that we
can just take a ref to the string held in the transport instead of
having to copy the whole string for every call.