This version broke backward compatibility in `plugin_pb2.py`, which is
presumably a relatively minor regression, since we have not yet heard
any complaints about it. This PR:
- Excludes `4.22.0` from installation
- _Includes_ protobuf pre-releases into testing so this can be caught
more quickly in the future.
When bad prereleases are caught, we can exclude them from testing in a
similar manner to this PR. We may eventually want to invest into a
system where we can define these bad versions centrally.
Relands #32385 (reverted in #32419) with fixes.
The Windows build is clean on a test cherrypick: cl/511291828
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
This is a step towards enabling `--define=use_strict_warning=true` for
Windows clang-cl.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Return `Timeout(kMaxHours, Unit::kHours)` if the value is about to
overflow in `DivideRoundingUp`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This code is not plumbed through yet, but it provides the core
infrastructure needed to detect the proper GCP environment resources
needed to set up the labels/attributes/resources for stats, tracing and
logging.
Details on how the various environment resources are setup has been
derived by looking at java's cloud logging library and OpenTelemetry's
future plans. (Could be better explained in an offline review since some
links are internal).
Requesting @veblush for a full review and @markdroth for a structural
review.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This is a prerequisite for converting the client_channel filter to
promises. This refactors two objects:
- `ClientChannel::CallData`, which is primarily responsible for applying
the service config to the call
- `ClientChannel::LoadBalancedCall`, which is responsible for doing the
LB pick for the call attempt
Each of those classes has been split into two pieces:
- a base class with the functionality to be shared between the legacy
filter stack implementation and the new promise-based implementation
- a subclass providing the legacy filter stack implementation
A subsequent PR will add another subclass that provides the
promise-based implementation.
The upb team wants to remove this particular bit of syntactic sugar from
the generated code. So instead of calling has_foo() when foo is a map
field, we call foo_size() and test the result against zero.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Original attempt was #31973, reverted in #32324 due to test flakiness.
There were two problems causing test flakiness here.
The first problem was that, upon resolver error, we were dispatching an
async callback to re-process each of the queued picks *before* we
updated the channel's connectivity state, which meant that the queued
picks might be re-processed in another thread before the new
connectivity state was set, so tests that expected the state to be
TRANSIENT_FAILURE once RPCs failed might not see the expected state.
The second problem affected the xDS ring hash tests, and it's a bit more
involved to explain.
We have an e2e test that simulates an aggregate cluster failover from a
primary cluster using ring_hash at startup. The primary cluster has two
addresses, both of which are unreachable when the client starts up, so
the client should immediately fail over to the secondary cluster, which
does have reachable endpoints. The test requires that no RPCs are failed
while this failover occurs. The original PR made this test flaky.
The problem here was caused by a combination of two factors:
1. Prior to the original PR, when the picker was updated (which happens
inside the WorkSerializer), we re-processed previously queued picks
synchronously, so it was not possible for another subchannel
connectivity state update (which also happens in the WorkSerializer) to
be processed between the time that we updated the picker and the time
that we re-processed the previously queued picks. The original PR
changed this such that the queued picks are re-processed asynchronously
(outside of the WorkSerializer), so it is now possible for a subchannel
connectivity state update to be processed between when the picker is
updated and when we re-process the previously queued picks.
2. Unlike most LB policies, where the picker does not see updated
subchannel connectivity states until a new picker is created, the
ring_hash picker gets the subchannel connectivity states from the LB
policy via a lock, so it can wind up seeing the new states before it
gets updated. This means that when a subchannel connectivity state
update is processed by the ring_hash policy in the WorkSerializer, it
will immediately be seen by the existing picker, even without a picker
update.
With those two points in mind, the sequence of events in the failing
test were as follows:
1. The pick is attempted in the ring_hash picker for the primary
cluster. This causes the first subchannel to attempt to connect.
2. The subchannel transitions from IDLE to CONNECTING. A new picker is
returned due to the subchannel connectivity state change, and the
channel retries the queued pick. The retried pick is done
asynchronously, but in this case it does not matter: the call will be
re-queued.
3. The connection attempt fails, and the subchannel reports
TRANSIENT_FAILURE. A new picker is again returned, and the channel
retries the queued pick. The retried pick is done asynchronously, but in
this case it does not matter: this causes the picker to trigger a
connection attempt for the second subchannel.
4. The second subchannel transitions from IDLE to CONNECTING. A new
picker is again returned, and the channel retries the queued pick. The
retried pick is done asynchronously, and in this case it *does* matter.
5. The second subchannel now transitions to TRANSIENT_FAILURE. The
ring_hash policy will now report TRANSIENT_FAILURE, but before it can
finish that...
6. ...In another thread, the channel now tries to re-process the queued
pick using the CONNECTING picker from step 4. However, because the
ring_hash policy has already seen the TRANSIENT_FAILURE report from the
second subchannel, that picker will now fail the pick instead of queuing
it.
After discussion with @ejona86 and @dfawley (since this bug actually
exists in Java and Go as well), we agreed that the right solution is to
change the ring_hash picker to contain its own copy of the subchannel
connectivity state information, rather than sharing that information
with the LB policy using synchronization.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This applies to all wrapped languages.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Removing a number of unused variables. This has no behaviour change.
These types are not considered "unused variables" by normal
`-Wunused-variable` flags because they have nontrivial destructors, but
these types' destructors are not used for their side effects, so unused
variables of these types should be considered bug-prone.
This PR removes all unused `absl::Status` and `absl::StatusOr<>`
variables I could find in grpc.
Currently, the peer name is returned with the completion of the
send_initial_metadata op, which does not make sense, because with
retries, we don't actually know the peer name until we complete the
recv_initial_metadata op. This PR changes our code to return the peer
string as an attribute of the recv_initial_metadata op, so that it is
not available to the application until that point. This change may be
user-visible, but since our API docs don't seem to guarantee exactly
when this data will be available, it's not technically a breaking
change.
Note that in the promise-based stack, we were already assuming that the
peer string would be returned as part of the recv_initial_metadata
batch, so this PR helps reduce risk for the promise conversion by making
this semantic change now, thus decoupling it from the promise
conversion.
I have also changed the representation of the string in the metadata
batch to be a `grpc_core::Slice` instead of a `std::string`, so that we
can just take a ref to the string held in the transport instead of
having to copy the whole string for every call.
A handful of problems were identified while writing the
WindowsEventEngine Listener. To make the listener review easier, these
fixes can be landed separately.
This is built upon https://github.com/grpc/grpc/pull/32376
Problems that are fixed in this PR:
* `OnConnectCompleted` held a Mutex while calling the user callback,
which can deadlock.
* The WinSocket and some associated data needs to remain alive after the
Endpoint destroyed, since Windows IOCP still needs to use some of that
data. Endpoint destruction and socket shutdown are now decoupled, with
the socket managed by a shared_ptr.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
There were some rollback conflicts, so this isn't a pure rollback.
This reverts commit ba0e55f539.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Let's see if this fixes the "Bus error" flakes that have been happening
in CI.
If it does, then we can narrow things down a bit. If flakes continue,
then we can revert this PR.
The set of trace flags is now:
* event_engine
* event_engine_endpoint
* event_engine_endpoint_data: additionally log all sent/received data,
similar to what the shims do.
* event_engine_poller
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Cleanup and remove ios cpp test cronet
To test manually:
./tools/bazel test //src/objective-c/tests:CppCronetTests
@sampajano
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The shims assumed that all platforms with Posix socket support would use
the PosixEventEngine. This is not the case for iOS.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Based on a discussion with @yashykt , I outlined the intended use of
`GetDefaultEventEngine`, with a bit of explanation around why we chose
to make it work the way it does.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This test has a race and is flaky, see
[b/268379869](https://b.corp.google.com/issues/268379869). It sends an
RPC to the interop server with 0s `keepaliveTimeout` and expects the
keepalive watchdog timer to fire immediately before the server acks the
ping.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Rollforward #32346 with some fixes in
1e88193edd
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This is a prerequisite for upcoming Abseil change using a monotonic
clock for `WaitWithTimeout`. This change allows gRPC gpr_cv_wait to use
`WaitWithTimeout` when it's given monotonic clock or timespan.
Caveat: This won't change the actual gRPC behavior until Abseil gets new
change regarding this.
Fixes the banned function checker to include header files. Some things
had slipped through, such as `absl::make_unique`
033d55ffd3/tools/run_tests/sanity/core_banned_functions.py (L64-L65)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
To be merged after #31448#32110#32094
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
The pooled allocator currently has an ABA issue in the allocation path.
This change should fix that - algorithm is described reasonably well in
the PR.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Reverts grpc/grpc#32240
Breaks `grpc/core/master/macos/grpc_objc_bazel_test`, sample run:
https://source.cloud.google.com/results/invocations/04e5a95b-5f06-4b3e-95fe-5e1895068475/targets
Seems like the `testKeepaliveWithV2API` test case failed:
```
Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' started.
<unknown>:0: error: -[InteropTestsLocalCleartext testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive".
Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' failed (5.547 seconds).
......
Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' started.
<unknown>:0: error: -[InteropTestsLocalSSL testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive".
Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' failed (5.011 seconds).
```
The previous implementation assumed that shutdown and fork events could
not occur at the same time, but that's not the case. This change adds
separate tracking for fork and shutdown bits.
cc @gnossen
* initial commit, bdp timer
* migrate keepalive_ping_timer and keepalive_watchdog_timer
* still has some issue with refcount
* fix refcount
with the help from refcount_solver.py :^)
* migrate delayed_ping_timer
* add some GPR_ASSERTs
* comment
* review
* try mutex/lock
* fix build deps
* use GRPC_UNUSED
* review
* WIP
* use the same lock for queue and picker -- still need to figure out how to drain queue safely
* working!
* always unref pickers in WorkSerializer
* simplify tracking of queued calls
* fix sanity
* simplify resolver queueing code
* fix tsan failure
* simplify queued LB pick reprocessing
* minor cleanups
* remove unnecessary wrap-around checks
* clang-format
* generate_projects
* fix pollset bug
* use absl::flat_hash_set<> instead of std::set<>
* fix use-after-free in retry code
* add missing BUILD dep
* Add info about ca cert used to verify chain.
The tsi_peer object will now contain the subject of the root/ca cert
that was used to verify the peer's chain during a handshake.
* temp investigation
* Fix issues relating to overlapping CRL callback
* formatting on ssl_transport_security.cc
* Swap ca_cert naming
* Use preverify_ok instead of numbers
* Continue some renaming, addressing pr comments
* Removed early return if peer property setting fails
* Continue renaming
* clang-tidy
* Fix clang problem
* clang fixes
* Add null check in tests
* More PR changes. Behavior change to include root cert extract when TSI_REQUEST_CLIENT_CERTIFICATE_AND_VERIFY
* Add intermediate ca, leaf cert, and test with them
* clang-tidy
* Basic formatting
* Add new keys to build for export
* Add new cert files to test BUILD
* build file style fix
* changes for chain test
* clang-format
* build clean
* Add $ to lines of code in README
* Add directive about X509_STORE_CTX_get0_chain
* formatting