Modeled after mutexes in the Rust ecosystem: the mutex owns the data
provided, and acquisition of the mutex returns a handle with which to
manipulate that data.
This fits in nicely with the execution environment we've established
whereby we may want to pass the lock from lambda to lambda for some
time.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Add a config to experiments & rollouts to allow dependent experiments to
be flagged.
We're getting past the point where it's possible to reason about which
experiments need to be turned off if we disable some other experiment,
and so this provides some additional rollout safety.
Can be specified in both experiments and rollouts: experiments.yaml
makes the most sense and we should default to it, but rollouts.yaml lets
us put dependencies between internal & external dependencies internally
and that's gonna be a little useful.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This is a follow-up PR of #34191, which handles the error condition of
endpoints failed to write/read in chaotic-good client transport.
This PR needs to be merged after #34191.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: Bradley Hess <bdhess@google.com>
Co-authored-by: AJ Heller <hork@google.com>
- Fix deadlock in load reporting tests.
- Add timeout to `WaitForLoadReport()`. (Note: this required changing
from `grpc::internal::Mutex` and friends to `grpc_core::Mutex` and
friends.)
- Fix balancer stream shutdown machinery.
- Change `ServerThread` to be a class instead of a struct.
Changes to fake resolver:
- Add `WaitForReresolutionRequest()` method to fake resolver response
generator to allow tests to tell when re-resolution has been requested.
- Change fake resolver response generator API to have only one mechanism
for injecting results, regardless of whether the result is an error or
whether it's triggered by a re-resolution.
Changes to grpclb_end2end_test:
- Change balancer interface such that instead of setting a list of
responses with fixed delays, the test can control exactly when each
response is set.
- Change balancer impl to always send the initial LB response, as
expected by the grpclb protocol.
- Change balancer impl to always read load reports, even if load
reporting is not expected to be enabled. (The latter case will still
cause the test to fail.) Reads are done in a different thread than
writes.
- Allow each test to directly control how many backends and balancers
are started and the client load reporting interval, so that (a) we don't
waste resources starting servers we don't need and (b) there is no need
to arbitrarily split tests across different test classes.
- Add timeouts to `WaitForAllBackends()` functionality, so that tests
will fail with a useful error rather than timing out.
- Improved ergonomics of various helper functions in the test framework.
In the process of making these changes, I found a couple of bugs:
- A bug in pick_first, which I fixed in #34885.
- A bug in grpclb, in which we were using the wrong condition to decide
whether to propagate a re-resolution request from the child policy,
which I've fixed in this PR. (This bug probably originated way back in
#18344.)
This should address a lot of the flakes seen in grpclb_e2e_test
recently.
Roll forward #34657, which was reverted in #34761.
Previous error in CMake:
```
[ RUN ] ClientTransportTest.AddOneStreamMultipleMessages
unknown file: Failure
Unexpected mock function call - returning directly.
Function call: Call(CANCELLED: )
Google Mock tried the following 1 expectation, but it didn't match:
/[var/local/git/grpc/test/core/transport/chaotic_good/client_transport_test.cc:484](https://cs.corp.google.com/piper///depot/google3/var/local/git/grpc/test/core/transport/chaotic_good/client_transport_test.cc?l=484): EXPECT_CALL(on_done, Call(absl::OkStatus()))...
Expected arg #0: is equal to OK
Actual: CANCELLED:
Expected: to be called once
Actual: never called - unsatisfied and active
/[var/local/git/grpc/test/core/transport/chaotic_good/client_transport_test.cc:484](https://cs.corp.google.com/piper///depot/google3/var/local/git/grpc/test/core/transport/chaotic_good/client_transport_test.cc?l=484): Failure
Actual function call count doesn't match EXPECT_CALL(on_done, Call(absl::OkStatus()))...
Expected: to be called once
Actual: never called - unsatisfied and active
real 0.24
user 0.00
sys 0.00
2023-10-20 01:50:32,776 FAILED: cmake/build/client_transport_test --gtest_filter=ClientTransportTest.AddOneStreamMultipleMessages GRPC_POLL_STRATEGY=epoll1 [ret=139, pid=1663532, time=0.3sec]
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This fixes a bug accidentally introduced in #33753. The symptom is that
if we exit idle and then get a new address list before any of the
subchannels in the old list can report their initial connectivity state,
we will incorrectly ignore the new address list.
EventEngine experiments, especially with `work_serializer_dispatch` tend
to cause callbacks to occur later than we've previously seen, so tests
that verify global data structures tend to become flakier when these are
introduced.
Here, the fix is waiting for EventEngine to be closed before starting
the new test.
Whilst here, make some adjustments to the test for better readability on
what's going on:
- if we fail a request to an echo service, we do not actually expect the
messages to match, so don't report that
- if we expect a value of 1 or 2, AnyOf is a better tool: it will report
the actual value too
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This adds the directory reloader implementation of the CrlProvider. This
will periodically reload CRL files in a directory per [gRFC
A69](https://github.com/grpc/proposal/pull/382)
Included in this is the following:
* A public API to create the `DirectoryReloaderCrlProvider`
* A basic directory interface in gprpp and platform specific impls for
getting the list of files in a directory (unfortunately prior C++17,
there is no std::filesystem, so we have to have platform specific impls)
* The implementation of `DirectoryReloaderCrlProvider` takes an
event_engine and a directory interface. This allows us to test using the
fuzzing event engine for time mocking, and to implement a test directory
interface so we avoid having to make temporary directories and files in
the tests. This is notably not in `include`, and the
`CreateDirectoryReloaderCrlProvider` is the only way to construct one
from the public API, so we don't expose the event engine and directory
details to the user.
---------
Co-authored-by: gtcooke94 <gtcooke94@users.noreply.github.com>
This PR fixes a bug identified in #29667, where the TLS channel
credentials still require a trust bundle even if the user has explicitly
opted to not verify the server certificate. This PR is based on #29810.
Address https://github.com/grpc/grpc/issues/12554
The API for `duplicate_without_call_credentials` says
```
// Creates a version of the channel credentials without any attached call
// credentials. This can be used in order to open a channel to a non-trusted
// gRPC load balancer.
```
As the impl stands right now, because of that description, in the case
of layered composite creds, I think the right behavior would be to call
down until you get the base cred with no call cred.
In discussing with the team, we do wonder if the use-case of layered
composite creds is really something that should be a feature, or if we
should be checking during the creation of composite creds to make sure
we aren't layering composite creds? @markdroth can you give your
thoughts?
Built upon @Vignesh2208 's work in #33156
This adds ref counting to the http_proxy fixture object, fixing test
flakes identified by the introduction of EventEngine listeners. Proxy
objects were either being deleted twice, or sometimes not at all,
resulting in two different sorts of flakes.
Add the flag `enable_csm_observability` to the c++ PSM interop testing
image, such that when enabled from the PSM interop testing framework,
the C++ client/server app will enable the CSM Observability plugin.
Relands #34785, which was reverted in #34818.
The first commit is the revert. The second commit removes the gtest
dependency from the xds_server library, which should address the
testonly problem internally.
As such, `alts_zero_copy_grpc_protector_create` will take a
`GsecKeyFactoryInterface` to create `GsecKeyInterface` objects for the
underlying crypter to use.
This enables the caller to control how all the key related buffers are
prepared and protected.
`gsec_aes_gcm_aead_crypter` holds the raw pointer to `GsecKeyInterface`
instead of a `unique_ptr` possibly because somewhere in the test (and
maybe production code as well), the structure is getting copied. A SEGV
error would be caused with `unique_ptr` which doesn't support copy
operations.
Fixes two issue with the `event_engine_listener` experiment:
* a race in `on_connect` with the read/write path
* a long delay due to pollset_work being effectively a `sleep` in the
EventEngine shim implementation
Currently it is very easy to use the `TlsCredentialsOptions` in such a
way that it produces a memory leak. For example, the code block
```
{
TlsCredentialsOptions options;
}
```
produces a memory leak. This PR fixes up the ownership bugs in this
class and its `grpc_tls_credentials_options`, the C-core analogue.
The original logic from #34615 was incorrect in cases where one address
family has a different number of addresses than the other(s).
Fixes b/307937051.
Also break the filter stack and promise based versions apart so that I
can re-understand this code.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This fixes a bug which could happen if multiple WorkStealingThreadPools
existed (wstpA and wstpB), and a thread in wstpA called
`wstpB->Run(closure)`. Previously, this would have scheduled the closure
on the current wstpA thread worker's local queue.
`bm_thread_pool` results look unchanged on RBE.
Earlier, the grpc message-length prefix for outgoing data messages was
incorrectly being counted towards `data_bytes` instead of
`framing_bytes`. This PR fixes it.
Note that the incoming stats collection properly attributes the grpc
message-length prefix to `framing_bytes`.
This change will affect all stats plugins (OpenCensus and OpenTelemetry)
that make use of this information for metrics.