Add support for GCE resources in CSM Observability.
Additionally, fix a bug where we were not adding the remote workload's canonical service label for unknown resource types.
Also, if zone and region are both specified, zone takes precedence.
Closes#35371
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35371 from yashykt:GceSupportToCsm e3064d8c3c
PiperOrigin-RevId: 597989825
It looks like this ended up getting deleted in https://github.com/grpc/grpc/pull/34350 probably when merging.
Also, the `Init` method in the otel test library is getting unwieldy. I'm going to send out a follow-up PR to convert this into a builder instead.
Closes#35532
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35532 from yashykt:OTelPluginBuilderFix 372bf26338
PiperOrigin-RevId: 597846622
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#35210
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35210 from yijiem:csm-service-label 6a6a7d1774
PiperOrigin-RevId: 597641393
Make sure there is no unnecessary delays when there are multiple reports in the queue.
This change also adds a test for the custom LB policy.
Closes#35467
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35467 from eugeneo:tasks/orca-test-timeout-316026521 4aab50a118
PiperOrigin-RevId: 597007131
It's not clear to me that this one unit test of very marginal importance warrants 8 bytes per channel.
Closes#35465
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35465 from ctiller:we-dont-need-this-really e7ee62ccb2
PiperOrigin-RevId: 596091614
There are a select few tests that are failing when building with OpenSSL102 - disable them until we can fix.
Closes#35354
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35354 from gtcooke94:fix_ossl_102 8708d6ce86
PiperOrigin-RevId: 595761932
Provide a public experimental API and bazel compatible build target for OpenTelemetry metrics.
Details -
* New `OpenTelemetryPluginBuilder` class that provides the API specified in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md
* The existing `grpc::internal::OpenTelemetryPluginBuilder` class is moved to `grpc::internal::OpenTelemetryPluginBuilderImpl` for disambiguation.
* Renamed `OTel` in some instances to `OpenTelemetry` for consistency.
Closes#35348
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35348 from yashykt:OTelPublicApi e32328825e
PiperOrigin-RevId: 594271246
This is a prerequisite change to start supporting Bazel 7. Changes are
- Disabled bzlmod which Bazel 7 begins to enable by default. This eventually needs to be done to support bzlmod but not now.
- Upgraded some bazel rule dependencies which are required to support Bazel 7.
- Using Python 3 explcitly as Bazel 7 begins to reject Python 2.
Note that this isn't enough to enable Bazel 7 by default and another PR will follow for that.
Closes#35374
PiperOrigin-RevId: 592931675
Previously, `RefCountedPtr<>` and `WeakRefCountedPtr<>` incorrectly allowed
implicit casting of any type to any other type. This hadn't caused a
problem until recently, but now that it has, we need to fix it. I have
fixed this by changing these smart pointer types to allow type
conversions only when the type used is convertible to the type of the
smart pointer. This means that if `Subclass` inherits from `Base`, then
we can set a `RefCountedPtr<BaseClass>` to a value of type
`RefCountedPtr<Subclass>`, but we cannot do the reverse.
We had been (ab)using this bug to make it more convenient to deal with
down-casting in subclasses of ref-counted types. For example, because
`Resolver` inherits from `InternallyRefCounted<Resolver>`, calling
`Ref()` on a subclass of `Resolver` will return `RefCountedPtr<Resolver>`
rather than returning the subclass's type. The ability to implicitly
convert to the subclass type made this a bit easier to deal with. Now
that that ability is gone, we need a different way of dealing with that
problem.
I considered several ways of dealing with this, but none of them are
quite as ergonomic as I would ideally like. For now, I've settled on
requiring callers to explicitly down-cast as needed, although I have
provided some utility functions to make this slightly easier:
- `RefCounted<>`, `InternallyRefCounted<>`, and `DualRefCounted<>` all
provide a templated `RefAsSubclass<>()` method that will return a new
ref as a subclass. The type used with `RefAsSubclass()` must be a
subclass of the type passed to `RefCounted<>`, `InternallyRefCounted<>`,
or `DualRefCounted<>`.
- In addition, `DualRefCounted<>` provides a templated `WeakRefAsSubclass<T>()`
method. This is the same as `RefAsSubclass()`, except that it returns
a weak ref instead of a strong ref.
- In `RefCountedPtr<>`, I have added a new `Ref()` method that takes
debug tracing parameters. This can be used instead of calling `Ref()`
on the underlying object in cases where the caller already has a
`RefCountedPtr<>` and is calling `Ref()` only to specify the debug
tracing parameters. Using this method on `RefCountedPtr<>` is more
ergonomic, because the smart pointer is already using the right
subclass, so no down-casting is needed.
- In `WeakRefCountedPtr<>`, I have added a new `WeakRef()` method that
takes debug tracing parameters. This is the same as the new `Ref()`
method on `RefCountedPtr<>`.
- In both `RefCountedPtr<>` and `WeakRefCountedPtr<>`, I have added a
templated `TakeAsSubclass<>()` method that takes the ref out of the
smart pointer and returns a new smart pointer of the down-casted type.
Just as with the `RefAsSubclass()` method above, the type used with
`TakeAsSubclass()` must be a subclass of the type passed to
`RefCountedPtr<>` or `WeakRefCountedPtr<>`.
Note that I have *not* provided an `AsSubclass<>()` variant of the
`RefIfNonZero()` methods. Those methods are used relatively rarely, so
it's not as important for them to be quite so ergonomic. Callers of
these methods that need to down-cast can use
`RefIfNonZero().TakeAsSubclass<>()`.
PiperOrigin-RevId: 592327447
The `DirectoryReloaderProvider` currently segfaults on construction if grpc_init() is not called before construction. This is because when creating the `DirectoryReloaderCrlProvider` we [call GetDefaultEventEngine](a58f3f2df5/src/core/lib/security/credentials/tls/grpc_tls_crl_provider.cc (L152)), and getting the default event engine requires that `grpc_init` is called.
This PR adds a test that catches the segfault and adds `grpc_init` and `grpc_shutdown` to the ctor and dtor of `DirectoryReloaderCrlProvider` so that the test passes.
Closes#35247
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35247 from gtcooke94:crl_provider_init_fix 25f3dc7f27
PiperOrigin-RevId: 589885254
Design is documented at
[go/windows-dns-resolver-issue](http://go/windows-dns-resolver-issue)
(note that the design doc is slightly outdated regarding the shared
ownership model of the virtual socket that was implemented in
13bd2b404e).
Passed `//test/cpp/naming:resolver_component_tests_runner_invoker` and
`//test/cpp/naming:cancel_ares_query_test`:
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel6 test --dynamic_mode=off --verbose_failures --test_env=GRPC_EXPERIMENTS=event_engine_dns --test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=cares_resolver --enable_runfiles=yes --nocache_test_results //test/cpp/naming:resolver_component_tests_runner_invoker
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker (1 packages loaded, 8 targets configured).
INFO: Found 1 test target...
INFO: From Compiling src/core/lib/event_engine/windows/windows_engine.cc:
C:\bazel6\execroot\com_github_grpc_grpc\src/core/lib/channel/channel_args.h(287): warning C4312: 'reinterpret_cast': conversion from 'int' to 'void *' of greater size
Target //test/cpp/naming:resolver_component_tests_runner_invoker up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker.exe
INFO: Elapsed time: 230.374s, Critical Path: 228.54s
INFO: 9 processes: 2 internal, 7 local.
INFO: Build completed successfully, 9 total actions
//test/cpp/naming:resolver_component_tests_runner_invoker PASSED in 221.2s
Executed 1 out of 1 test: 1 test passes.
```
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel6 test --dynamic_mode=off --verbose_failures --test_env=GRPC_EXPERIMENTS=event_engine_dns --test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=cares_resolver --enable_runfiles=yes --nocache_test_results //test/cpp/naming:cancel_ares_query_test
INFO: Analyzed target //test/cpp/naming:cancel_ares_query_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:cancel_ares_query_test up-to-date:
bazel-bin/test/cpp/naming/cancel_ares_query_test.exe
INFO: Elapsed time: 49.656s, Critical Path: 48.00s
INFO: 6 processes: 2 internal, 4 local.
INFO: Build completed successfully, 6 total actions
//test/cpp/naming:cancel_ares_query_test PASSED in 43.0s
Executed 1 out of 1 test: 1 test passes.
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: Bradley Hess <bdhess@google.com>
Co-authored-by: AJ Heller <hork@google.com>
- Fix deadlock in load reporting tests.
- Add timeout to `WaitForLoadReport()`. (Note: this required changing
from `grpc::internal::Mutex` and friends to `grpc_core::Mutex` and
friends.)
- Fix balancer stream shutdown machinery.
- Change `ServerThread` to be a class instead of a struct.
Changes to fake resolver:
- Add `WaitForReresolutionRequest()` method to fake resolver response
generator to allow tests to tell when re-resolution has been requested.
- Change fake resolver response generator API to have only one mechanism
for injecting results, regardless of whether the result is an error or
whether it's triggered by a re-resolution.
Changes to grpclb_end2end_test:
- Change balancer interface such that instead of setting a list of
responses with fixed delays, the test can control exactly when each
response is set.
- Change balancer impl to always send the initial LB response, as
expected by the grpclb protocol.
- Change balancer impl to always read load reports, even if load
reporting is not expected to be enabled. (The latter case will still
cause the test to fail.) Reads are done in a different thread than
writes.
- Allow each test to directly control how many backends and balancers
are started and the client load reporting interval, so that (a) we don't
waste resources starting servers we don't need and (b) there is no need
to arbitrarily split tests across different test classes.
- Add timeouts to `WaitForAllBackends()` functionality, so that tests
will fail with a useful error rather than timing out.
- Improved ergonomics of various helper functions in the test framework.
In the process of making these changes, I found a couple of bugs:
- A bug in pick_first, which I fixed in #34885.
- A bug in grpclb, in which we were using the wrong condition to decide
whether to propagate a re-resolution request from the child policy,
which I've fixed in this PR. (This bug probably originated way back in
#18344.)
This should address a lot of the flakes seen in grpclb_e2e_test
recently.
EventEngine experiments, especially with `work_serializer_dispatch` tend
to cause callbacks to occur later than we've previously seen, so tests
that verify global data structures tend to become flakier when these are
introduced.
Here, the fix is waiting for EventEngine to be closed before starting
the new test.
Whilst here, make some adjustments to the test for better readability on
what's going on:
- if we fail a request to an echo service, we do not actually expect the
messages to match, so don't report that
- if we expect a value of 1 or 2, AnyOf is a better tool: it will report
the actual value too
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This PR fixes a bug identified in #29667, where the TLS channel
credentials still require a trust bundle even if the user has explicitly
opted to not verify the server certificate. This PR is based on #29810.
Add the flag `enable_csm_observability` to the c++ PSM interop testing
image, such that when enabled from the PSM interop testing framework,
the C++ client/server app will enable the CSM Observability plugin.
Relands #34785, which was reverted in #34818.
The first commit is the revert. The second commit removes the gtest
dependency from the xds_server library, which should address the
testonly problem internally.
Currently it is very easy to use the `TlsCredentialsOptions` in such a
way that it produces a memory leak. For example, the code block
```
{
TlsCredentialsOptions options;
}
```
produces a memory leak. This PR fixes up the ownership bugs in this
class and its `grpc_tls_credentials_options`, the C-core analogue.
Earlier, the grpc message-length prefix for outgoing data messages was
incorrectly being counted towards `data_bytes` instead of
`framing_bytes`. This PR fixes it.
Note that the incoming stats collection properly attributes the grpc
message-length prefix to `framing_bytes`.
This change will affect all stats plugins (OpenCensus and OpenTelemetry)
that make use of this information for metrics.
Add TcpTracer interface for TCP instrumentation. It takes no gRPC
dependencies for use in external TCP implementations. Also add
HttpAnnotation for HTTP transport instrumentation using CallTracer.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This reverts commit 601aaf80b7, which
results in rolling forward #34408.
The 601aaf reversion happened because of a deadlock found in Python. The
root cause ended up being an issue with the Python wrapper and was fixed
in #34712 , so this can be rolled forward again
Ditch the old priority scheme for ordering filters, instead explicitly
mark up before/after constraints.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
The basic APIs for the CRL Reloading features.
This adds external types to represent CRL Providers, CRLs, and
CertificateInfo.
Internally we will use `CrlImpl` - this layer is needed to hide OpenSSL
details from the user.
GRFC - https://github.com/grpc/proposal/pull/382
Things Done
* Add external API for `CrlProvider`, `Crl`, `CertInfo` (`CertInfo` is
used during CRL lookup rather than passing the entire certificate).
* Add code paths in `ssl_transport_security` to utilize CRL providers
* Add `StaticCrlProvider`
* Refactor `crl_ssl_transport_security_test.cc` so it is more extensible
and can be used with providers