As discussed, this change adds scoping to `CsmObservability` such that when that object goes out of scope, new channels and servers don't record metrics. In the documentation, I've talked about how existing channels/servers are going to continue to record metrics but i've left room for us to change that behavior in the future.
The current way of doing this is through a global bool since there can only be one plugin right now, but we'll change this to use the global stats plugin registry in the future.
Closes#35835
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35835 from yashykt:DisableCsmObsOnScope 33a7c2f7bc
PiperOrigin-RevId: 605468117
This new directory combines code from the following locations:
- src/core/ext/filters/client_channel/resolver
- src/core/lib/resolver
Closes#35804
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35804 from markdroth:client_channel_resolver_reorg2 30660e6b00
PiperOrigin-RevId: 604665835
This new directory combines code from the following locations:
- src/core/ext/filters/client_channel/lb_policy
- src/core/lib/load_balancing
Closes#35786
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35786 from markdroth:client_channel_resolver_reorg 98554efb98
PiperOrigin-RevId: 604351832
Changes -
* `CsmObservability` API will now use the `CsmOpenTelemetryPluginOption` internally. After this change, `CsmObservability` will enable observability for all channels and servers. (Earlier, `CsmObservability` only enabled observability for CSM-enabled channels and servers.) CSM labels will still be added just for CSM-enabled channels and servers.
* Also, we no longer need the ability to set `LabelInjector` on the `OpenTelemetryPluginBuilder` directly. Instead, we always use `PluginOption` to inject the `LabelInjector`. This simplifies the code as well.
Note that `SetTargetSelector` and `SetServerSelector` APIs on the `OpenTelemetryPluginBuilderImpl` are not being deleted yet since we might need them shortly. This is also why `OpenTelemetryPluginBuilderImpl` is not being deleted right now.
Closes#35803
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35803 from yashykt:CsmO11yApisUsePluginOption cf3d65900d
PiperOrigin-RevId: 604323898
When testing CSM Observability, we discovered that the c++ xds interop client is not sending any payload with the `UnaryCall` RPCs so most of the metrics will have a value of 0.
Adding a payload to the xds interop client here.
We need this fix so that we can verify that the metrics are recording the right number of bytes being sent / received. So we need a non-trivial payload to be sent with the `UnaryCall` RPC between the xds interop client and server.
Closes#35545
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35545 from stanley-cheung:xds-client-payload 11be4e6f4f
PiperOrigin-RevId: 601596246
This reverts commit 6318e9e7e9.
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#35667
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35667 from ctiller:a 032999b51e
PiperOrigin-RevId: 601495207
Just to be future-proof, I'm amending the `void` return status of `BuildAndRegisterGlobal` in `OpenTelemetryPluginBuilder` to absl::Status.
This will be backported to 1.61 as well.
Closes#35659
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35659 from yashykt:UpdateOtelApiToAddStatus 07d3f41b8a
PiperOrigin-RevId: 601458408
We are no longer sure about this API, so re-experimentalizing it.
This PR will be backported to 1.61 as well.
Closes#35660
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35660 from yashykt:ReexperimentalizeCsmPluginOption 4f114a54d9
PiperOrigin-RevId: 601378856
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#35573
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35573 from yijiem:enable-oss-ee-dns-posix-real 017b99312f
PiperOrigin-RevId: 601245249
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#35633
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35633 from yijiem:labels-injector-patch 6876243943
PiperOrigin-RevId: 600931754
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#35499
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35499 from dawidcha:cred_opts_copy_constr 330165930f
PiperOrigin-RevId: 599977221
Add support for GCE resources in CSM Observability.
Additionally, fix a bug where we were not adding the remote workload's canonical service label for unknown resource types.
Also, if zone and region are both specified, zone takes precedence.
Closes#35371
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35371 from yashykt:GceSupportToCsm e3064d8c3c
PiperOrigin-RevId: 597989825
It looks like this ended up getting deleted in https://github.com/grpc/grpc/pull/34350 probably when merging.
Also, the `Init` method in the otel test library is getting unwieldy. I'm going to send out a follow-up PR to convert this into a builder instead.
Closes#35532
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35532 from yashykt:OTelPluginBuilderFix 372bf26338
PiperOrigin-RevId: 597846622
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#35210
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35210 from yijiem:csm-service-label 6a6a7d1774
PiperOrigin-RevId: 597641393
Make sure there is no unnecessary delays when there are multiple reports in the queue.
This change also adds a test for the custom LB policy.
Closes#35467
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35467 from eugeneo:tasks/orca-test-timeout-316026521 4aab50a118
PiperOrigin-RevId: 597007131
It's not clear to me that this one unit test of very marginal importance warrants 8 bytes per channel.
Closes#35465
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35465 from ctiller:we-dont-need-this-really e7ee62ccb2
PiperOrigin-RevId: 596091614
There are a select few tests that are failing when building with OpenSSL102 - disable them until we can fix.
Closes#35354
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35354 from gtcooke94:fix_ossl_102 8708d6ce86
PiperOrigin-RevId: 595761932
Provide a public experimental API and bazel compatible build target for OpenTelemetry metrics.
Details -
* New `OpenTelemetryPluginBuilder` class that provides the API specified in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md
* The existing `grpc::internal::OpenTelemetryPluginBuilder` class is moved to `grpc::internal::OpenTelemetryPluginBuilderImpl` for disambiguation.
* Renamed `OTel` in some instances to `OpenTelemetry` for consistency.
Closes#35348
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35348 from yashykt:OTelPublicApi e32328825e
PiperOrigin-RevId: 594271246
This is a prerequisite change to start supporting Bazel 7. Changes are
- Disabled bzlmod which Bazel 7 begins to enable by default. This eventually needs to be done to support bzlmod but not now.
- Upgraded some bazel rule dependencies which are required to support Bazel 7.
- Using Python 3 explcitly as Bazel 7 begins to reject Python 2.
Note that this isn't enough to enable Bazel 7 by default and another PR will follow for that.
Closes#35374
PiperOrigin-RevId: 592931675
Previously, `RefCountedPtr<>` and `WeakRefCountedPtr<>` incorrectly allowed
implicit casting of any type to any other type. This hadn't caused a
problem until recently, but now that it has, we need to fix it. I have
fixed this by changing these smart pointer types to allow type
conversions only when the type used is convertible to the type of the
smart pointer. This means that if `Subclass` inherits from `Base`, then
we can set a `RefCountedPtr<BaseClass>` to a value of type
`RefCountedPtr<Subclass>`, but we cannot do the reverse.
We had been (ab)using this bug to make it more convenient to deal with
down-casting in subclasses of ref-counted types. For example, because
`Resolver` inherits from `InternallyRefCounted<Resolver>`, calling
`Ref()` on a subclass of `Resolver` will return `RefCountedPtr<Resolver>`
rather than returning the subclass's type. The ability to implicitly
convert to the subclass type made this a bit easier to deal with. Now
that that ability is gone, we need a different way of dealing with that
problem.
I considered several ways of dealing with this, but none of them are
quite as ergonomic as I would ideally like. For now, I've settled on
requiring callers to explicitly down-cast as needed, although I have
provided some utility functions to make this slightly easier:
- `RefCounted<>`, `InternallyRefCounted<>`, and `DualRefCounted<>` all
provide a templated `RefAsSubclass<>()` method that will return a new
ref as a subclass. The type used with `RefAsSubclass()` must be a
subclass of the type passed to `RefCounted<>`, `InternallyRefCounted<>`,
or `DualRefCounted<>`.
- In addition, `DualRefCounted<>` provides a templated `WeakRefAsSubclass<T>()`
method. This is the same as `RefAsSubclass()`, except that it returns
a weak ref instead of a strong ref.
- In `RefCountedPtr<>`, I have added a new `Ref()` method that takes
debug tracing parameters. This can be used instead of calling `Ref()`
on the underlying object in cases where the caller already has a
`RefCountedPtr<>` and is calling `Ref()` only to specify the debug
tracing parameters. Using this method on `RefCountedPtr<>` is more
ergonomic, because the smart pointer is already using the right
subclass, so no down-casting is needed.
- In `WeakRefCountedPtr<>`, I have added a new `WeakRef()` method that
takes debug tracing parameters. This is the same as the new `Ref()`
method on `RefCountedPtr<>`.
- In both `RefCountedPtr<>` and `WeakRefCountedPtr<>`, I have added a
templated `TakeAsSubclass<>()` method that takes the ref out of the
smart pointer and returns a new smart pointer of the down-casted type.
Just as with the `RefAsSubclass()` method above, the type used with
`TakeAsSubclass()` must be a subclass of the type passed to
`RefCountedPtr<>` or `WeakRefCountedPtr<>`.
Note that I have *not* provided an `AsSubclass<>()` variant of the
`RefIfNonZero()` methods. Those methods are used relatively rarely, so
it's not as important for them to be quite so ergonomic. Callers of
these methods that need to down-cast can use
`RefIfNonZero().TakeAsSubclass<>()`.
PiperOrigin-RevId: 592327447
The `DirectoryReloaderProvider` currently segfaults on construction if grpc_init() is not called before construction. This is because when creating the `DirectoryReloaderCrlProvider` we [call GetDefaultEventEngine](a58f3f2df5/src/core/lib/security/credentials/tls/grpc_tls_crl_provider.cc (L152)), and getting the default event engine requires that `grpc_init` is called.
This PR adds a test that catches the segfault and adds `grpc_init` and `grpc_shutdown` to the ctor and dtor of `DirectoryReloaderCrlProvider` so that the test passes.
Closes#35247
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35247 from gtcooke94:crl_provider_init_fix 25f3dc7f27
PiperOrigin-RevId: 589885254
Design is documented at
[go/windows-dns-resolver-issue](http://go/windows-dns-resolver-issue)
(note that the design doc is slightly outdated regarding the shared
ownership model of the virtual socket that was implemented in
13bd2b404e).
Passed `//test/cpp/naming:resolver_component_tests_runner_invoker` and
`//test/cpp/naming:cancel_ares_query_test`:
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel6 test --dynamic_mode=off --verbose_failures --test_env=GRPC_EXPERIMENTS=event_engine_dns --test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=cares_resolver --enable_runfiles=yes --nocache_test_results //test/cpp/naming:resolver_component_tests_runner_invoker
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker (1 packages loaded, 8 targets configured).
INFO: Found 1 test target...
INFO: From Compiling src/core/lib/event_engine/windows/windows_engine.cc:
C:\bazel6\execroot\com_github_grpc_grpc\src/core/lib/channel/channel_args.h(287): warning C4312: 'reinterpret_cast': conversion from 'int' to 'void *' of greater size
Target //test/cpp/naming:resolver_component_tests_runner_invoker up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker.exe
INFO: Elapsed time: 230.374s, Critical Path: 228.54s
INFO: 9 processes: 2 internal, 7 local.
INFO: Build completed successfully, 9 total actions
//test/cpp/naming:resolver_component_tests_runner_invoker PASSED in 221.2s
Executed 1 out of 1 test: 1 test passes.
```
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel6 test --dynamic_mode=off --verbose_failures --test_env=GRPC_EXPERIMENTS=event_engine_dns --test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=cares_resolver --enable_runfiles=yes --nocache_test_results //test/cpp/naming:cancel_ares_query_test
INFO: Analyzed target //test/cpp/naming:cancel_ares_query_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:cancel_ares_query_test up-to-date:
bazel-bin/test/cpp/naming/cancel_ares_query_test.exe
INFO: Elapsed time: 49.656s, Critical Path: 48.00s
INFO: 6 processes: 2 internal, 4 local.
INFO: Build completed successfully, 6 total actions
//test/cpp/naming:cancel_ares_query_test PASSED in 43.0s
Executed 1 out of 1 test: 1 test passes.
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: Bradley Hess <bdhess@google.com>
Co-authored-by: AJ Heller <hork@google.com>
- Fix deadlock in load reporting tests.
- Add timeout to `WaitForLoadReport()`. (Note: this required changing
from `grpc::internal::Mutex` and friends to `grpc_core::Mutex` and
friends.)
- Fix balancer stream shutdown machinery.
- Change `ServerThread` to be a class instead of a struct.
Changes to fake resolver:
- Add `WaitForReresolutionRequest()` method to fake resolver response
generator to allow tests to tell when re-resolution has been requested.
- Change fake resolver response generator API to have only one mechanism
for injecting results, regardless of whether the result is an error or
whether it's triggered by a re-resolution.
Changes to grpclb_end2end_test:
- Change balancer interface such that instead of setting a list of
responses with fixed delays, the test can control exactly when each
response is set.
- Change balancer impl to always send the initial LB response, as
expected by the grpclb protocol.
- Change balancer impl to always read load reports, even if load
reporting is not expected to be enabled. (The latter case will still
cause the test to fail.) Reads are done in a different thread than
writes.
- Allow each test to directly control how many backends and balancers
are started and the client load reporting interval, so that (a) we don't
waste resources starting servers we don't need and (b) there is no need
to arbitrarily split tests across different test classes.
- Add timeouts to `WaitForAllBackends()` functionality, so that tests
will fail with a useful error rather than timing out.
- Improved ergonomics of various helper functions in the test framework.
In the process of making these changes, I found a couple of bugs:
- A bug in pick_first, which I fixed in #34885.
- A bug in grpclb, in which we were using the wrong condition to decide
whether to propagate a re-resolution request from the child policy,
which I've fixed in this PR. (This bug probably originated way back in
#18344.)
This should address a lot of the flakes seen in grpclb_e2e_test
recently.
EventEngine experiments, especially with `work_serializer_dispatch` tend
to cause callbacks to occur later than we've previously seen, so tests
that verify global data structures tend to become flakier when these are
introduced.
Here, the fix is waiting for EventEngine to be closed before starting
the new test.
Whilst here, make some adjustments to the test for better readability on
what's going on:
- if we fail a request to an echo service, we do not actually expect the
messages to match, so don't report that
- if we expect a value of 1 or 2, AnyOf is a better tool: it will report
the actual value too
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This PR fixes a bug identified in #29667, where the TLS channel
credentials still require a trust bundle even if the user has explicitly
opted to not verify the server certificate. This PR is based on #29810.