Design is documented at
[go/windows-dns-resolver-issue](http://go/windows-dns-resolver-issue)
(note that the design doc is slightly outdated regarding the shared
ownership model of the virtual socket that was implemented in
13bd2b404e).
Passed `//test/cpp/naming:resolver_component_tests_runner_invoker` and
`//test/cpp/naming:cancel_ares_query_test`:
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel6 test --dynamic_mode=off --verbose_failures --test_env=GRPC_EXPERIMENTS=event_engine_dns --test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=cares_resolver --enable_runfiles=yes --nocache_test_results //test/cpp/naming:resolver_component_tests_runner_invoker
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker (1 packages loaded, 8 targets configured).
INFO: Found 1 test target...
INFO: From Compiling src/core/lib/event_engine/windows/windows_engine.cc:
C:\bazel6\execroot\com_github_grpc_grpc\src/core/lib/channel/channel_args.h(287): warning C4312: 'reinterpret_cast': conversion from 'int' to 'void *' of greater size
Target //test/cpp/naming:resolver_component_tests_runner_invoker up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker.exe
INFO: Elapsed time: 230.374s, Critical Path: 228.54s
INFO: 9 processes: 2 internal, 7 local.
INFO: Build completed successfully, 9 total actions
//test/cpp/naming:resolver_component_tests_runner_invoker PASSED in 221.2s
Executed 1 out of 1 test: 1 test passes.
```
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel6 test --dynamic_mode=off --verbose_failures --test_env=GRPC_EXPERIMENTS=event_engine_dns --test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=cares_resolver --enable_runfiles=yes --nocache_test_results //test/cpp/naming:cancel_ares_query_test
INFO: Analyzed target //test/cpp/naming:cancel_ares_query_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:cancel_ares_query_test up-to-date:
bazel-bin/test/cpp/naming/cancel_ares_query_test.exe
INFO: Elapsed time: 49.656s, Critical Path: 48.00s
INFO: 6 processes: 2 internal, 4 local.
INFO: Build completed successfully, 6 total actions
//test/cpp/naming:cancel_ares_query_test PASSED in 43.0s
Executed 1 out of 1 test: 1 test passes.
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: Bradley Hess <bdhess@google.com>
Co-authored-by: AJ Heller <hork@google.com>
- Fix deadlock in load reporting tests.
- Add timeout to `WaitForLoadReport()`. (Note: this required changing
from `grpc::internal::Mutex` and friends to `grpc_core::Mutex` and
friends.)
- Fix balancer stream shutdown machinery.
- Change `ServerThread` to be a class instead of a struct.
Changes to fake resolver:
- Add `WaitForReresolutionRequest()` method to fake resolver response
generator to allow tests to tell when re-resolution has been requested.
- Change fake resolver response generator API to have only one mechanism
for injecting results, regardless of whether the result is an error or
whether it's triggered by a re-resolution.
Changes to grpclb_end2end_test:
- Change balancer interface such that instead of setting a list of
responses with fixed delays, the test can control exactly when each
response is set.
- Change balancer impl to always send the initial LB response, as
expected by the grpclb protocol.
- Change balancer impl to always read load reports, even if load
reporting is not expected to be enabled. (The latter case will still
cause the test to fail.) Reads are done in a different thread than
writes.
- Allow each test to directly control how many backends and balancers
are started and the client load reporting interval, so that (a) we don't
waste resources starting servers we don't need and (b) there is no need
to arbitrarily split tests across different test classes.
- Add timeouts to `WaitForAllBackends()` functionality, so that tests
will fail with a useful error rather than timing out.
- Improved ergonomics of various helper functions in the test framework.
In the process of making these changes, I found a couple of bugs:
- A bug in pick_first, which I fixed in #34885.
- A bug in grpclb, in which we were using the wrong condition to decide
whether to propagate a re-resolution request from the child policy,
which I've fixed in this PR. (This bug probably originated way back in
#18344.)
This should address a lot of the flakes seen in grpclb_e2e_test
recently.
EventEngine experiments, especially with `work_serializer_dispatch` tend
to cause callbacks to occur later than we've previously seen, so tests
that verify global data structures tend to become flakier when these are
introduced.
Here, the fix is waiting for EventEngine to be closed before starting
the new test.
Whilst here, make some adjustments to the test for better readability on
what's going on:
- if we fail a request to an echo service, we do not actually expect the
messages to match, so don't report that
- if we expect a value of 1 or 2, AnyOf is a better tool: it will report
the actual value too
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This PR fixes a bug identified in #29667, where the TLS channel
credentials still require a trust bundle even if the user has explicitly
opted to not verify the server certificate. This PR is based on #29810.
Add the flag `enable_csm_observability` to the c++ PSM interop testing
image, such that when enabled from the PSM interop testing framework,
the C++ client/server app will enable the CSM Observability plugin.
Relands #34785, which was reverted in #34818.
The first commit is the revert. The second commit removes the gtest
dependency from the xds_server library, which should address the
testonly problem internally.
Currently it is very easy to use the `TlsCredentialsOptions` in such a
way that it produces a memory leak. For example, the code block
```
{
TlsCredentialsOptions options;
}
```
produces a memory leak. This PR fixes up the ownership bugs in this
class and its `grpc_tls_credentials_options`, the C-core analogue.
Earlier, the grpc message-length prefix for outgoing data messages was
incorrectly being counted towards `data_bytes` instead of
`framing_bytes`. This PR fixes it.
Note that the incoming stats collection properly attributes the grpc
message-length prefix to `framing_bytes`.
This change will affect all stats plugins (OpenCensus and OpenTelemetry)
that make use of this information for metrics.
Add TcpTracer interface for TCP instrumentation. It takes no gRPC
dependencies for use in external TCP implementations. Also add
HttpAnnotation for HTTP transport instrumentation using CallTracer.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This reverts commit 601aaf80b7, which
results in rolling forward #34408.
The 601aaf reversion happened because of a deadlock found in Python. The
root cause ended up being an issue with the Python wrapper and was fixed
in #34712 , so this can be rolled forward again
Ditch the old priority scheme for ordering filters, instead explicitly
mark up before/after constraints.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
The basic APIs for the CRL Reloading features.
This adds external types to represent CRL Providers, CRLs, and
CertificateInfo.
Internally we will use `CrlImpl` - this layer is needed to hide OpenSSL
details from the user.
GRFC - https://github.com/grpc/proposal/pull/382
Things Done
* Add external API for `CrlProvider`, `Crl`, `CertInfo` (`CertInfo` is
used during CRL lookup rather than passing the entire certificate).
* Add code paths in `ssl_transport_security` to utilize CRL providers
* Add `StaticCrlProvider`
* Refactor `crl_ssl_transport_security_test.cc` so it is more extensible
and can be used with providers
@stanley-cheung noticed a bug where CSM labels were not being added on
metrics if the peer was not also CSM Observability enabled.
This PR fixes the behavior to add in the local labels in this case, as
well as add the remote workload type label with the value of unknown.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This behavior is dangerous because we will crash when the cache is
created, which is not necessarily on application startup and is likely
when you first try to establish an SSL connection. Instead, we log an
error. If the SSL library attempts to put a session ticket in the cache
it will fail to do so, but everything else will continue as normal. In
particular, we will always seamlessly fall back to a full SSL handshake.
Along the way, we also ensure that you cannot put a null `SSL_SESSION`
into the cache, which would lead to a segfault when it is fetched from
the cache.
Fix chttp2 too_many_pings test to use only one of IPv4 or IPv6,
depending on test environment.
Also fix dumb reversed conditional bug in some other tests that was
accidentally introduced in #34426.
More changes as part of the dualstack design:
- Change resolver and LB policy APIs to support multiple addresses per
endpoint. Specifically, replace `ServerAddress` with
`EndpointAddresses`, which encodes more than one address. Per-address
channel args are retained at the same level, so they are now
per-endpoint. For now, `EndpointAddress` provides a single-address ctor
and a single-address accessor for backward compatibility, so
`ServerAdress` is an alias for `EndpointAddresses`; eventually, this
alias and the single-address methods will be removed.
- Add an `EndpointAddressSet` class, which represents an unordered set
of addresses to be used as a map key. This will be used in a number of
LB policies that need to store per-endpoint state.
- Change the LB policy API's `ChannelControlHelper::CreateSubchannel()`
method to take the address and per-endpoint channel args as separate
parameters, so that we don't need to construct a legacy `ServerAddress`
object as we create a new subchannel for each address in the endpoint.
- Change pick_first to flatten the address list.
- Change ring_hash to use `EndpointAddressSet` as the key for its
endpoint map, and to use the first address of the endpoint as the hash
key.
- Change WRR to use `EndpointAddressSet` as the key for its endpoint
weight map.
Note that support for multiple addresses per endpoint is guarded in RR
by the existing `round_robin_delegate_to_pick_fist` experiment and in
WRR by the existing `wrr_delegate_to_pick_first` experiment.
This PR does *not* include support for multiple addresses per endpoint
for the outlier_detection or xds_override_host LB policies; those will
come in subsequent PRs.
Expand our fuzzing capabilities by allowing fuzzers to choose the bits
that go into random number distribution generators.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Summary -
On the server-side, we are changing the point at which we decide whether
a method is registered or not from the surface to the transport at the
point where we are done receiving initial metadata and before we invoke
the recv_initial_metadata_ready closures from the filters. The main
motivation for this is to allow filters to check whether the incoming
method is a registered or not. The exact use-case is for observability
where we only want to record the method if it is registered. We store
the information about the registered method in the initial metadata.
On the client-side, we also set information about whether the method is
registered or not in the outgoing initial metadata.
Since we are effectively changing the lookup point of the registered
method, there are slight concerns of this being a potentially breaking
change, so we are guarding this with an experiment to be safe.
Changes -
* Transport API changes -
* Along with `accept_stream_fn`, a new callback
`registered_method_matcher_cb` will be sent down as a transport op on
the server side. When initial metadata is received on the server side,
this callback is invoked. This happens before invoking the
`recv_initial_metadata_ready` closure.
* Metadata changes -
* We add a new non-serializable metadata trait `GrpcRegisteredMethod()`.
On the client-side, the value is a uintptr_t with a value of 1 if the
call has a registered/known method, or 0, if it's not known. On the
server side, the value is a (ChannelRegisteredMethod*). This metadata
information can be used throughout the stack to check whether a call is
registered or not.
* Server Changes -
* When a new transport connection is accepted, the server sets
`registered_method_matcher_cb` along with `accept_stream_fn`. This
function checks whether the method is registered or not and sets the
RegisteredMethod matcher in the metadata for use later.
* Client Changes -
* Set the metadata on call creation on whether the method is registered
or not.
Lets us sever the dependency between stats & exec ctx (finally).
More work likely needs to go into the *mechanism* used here (I'm not a
fan of the per thread index), but that's also something we can address
later.