Based on [OpenTelemetry Metrics gRFC](https://github.com/grpc/proposal/blob/master/A66-otel-stats.md#opentelemetry-metrics), we should record unregistered RPC method names as `other`. This PR adds the ability to pass registered-method information when creating a call.
We consider calls created through generated stubs to be registered. Note that this does not prevent users from setting `registered_method=True` when creating calls manually.
This is also enabled for the simple stub flow but is **NOT enabled for AsyncIO**; we'll add that later when we start working on AsyncIO Observability.
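To illustrate the labeling rule, here is a minimal sketch. It assumes a hypothetical helper `recorded_method_name` (not part of the gRPC API) that decides which method name is attached to a metric:

```python
def recorded_method_name(method: str, registered_method: bool = False) -> str:
    """Return the method name to attach to a metric.

    Hypothetical helper for illustration only: registered methods keep
    their full name, everything else is collapsed into "other".
    """
    return method if registered_method else "other"


# A call created by a generated stub is treated as registered.
stub_label = recorded_method_name("/helloworld.Greeter/SayHello", registered_method=True)

# A manually created call is unregistered unless the caller says otherwise.
manual_label = recorded_method_name("/some.Dynamic/Method")
```

Collapsing unregistered names into one value keeps the set of distinct metric label values bounded even when method names are generated dynamically.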
Closes #35002
PiperOrigin-RevId: 596719121
I appreciate the elegance of using `\` and `/` to create ASCII art, but trailing backslashes in a `//` comment generate a lot of warnings with some compilers:
```
INFO: From Compiling src/core/lib/channel/promise_based_filter.cc:
In file included from external/com_github_grpc_grpc/src/core/lib/surface/server.h:48,
from external/com_github_grpc_grpc/src/core/lib/surface/call.h:53,
from external/com_github_grpc_grpc/src/core/lib/channel/promise_based_filter.h:65,
from external/com_github_grpc_grpc/src/core/lib/channel/promise_based_filter.cc:17:
external/com_github_grpc_grpc/src/core/lib/channel/call_tracer.h:47:1: warning: multi-line comment [-Wcomment]
47 | // / \
| ^
external/com_github_grpc_grpc/src/core/lib/channel/call_tracer.h:49:1: warning: multi-line comment [-Wcomment]
49 | // / \
| ^
```
Closes #35464
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35464 from coryan:coryan-patch-1 18a7f6b8e6
PiperOrigin-RevId: 596103770
It's not clear to me that this one unit test of very marginal importance warrants 8 bytes per channel.
Closes #35465
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35465 from ctiller:we-dont-need-this-really e7ee62ccb2
PiperOrigin-RevId: 596091614
Eventually, for call-v3, we want filter registration to generate the appropriate glue in the channel runtime to execute a call.
Begin that process now by introducing the new syntax and allowing a piecemeal migration to it. By the time we finish converting filters to the v3 APIs, the registration piece will also be done.
PiperOrigin-RevId: 596013927
This should slightly increase per-channel memory but will eliminate some O(n^2) loops with large numbers of endpoints or addresses.
Closes #35445
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35445 from markdroth:lb_policies_remove_unnecessary_loop 94465f44ec
PiperOrigin-RevId: 596007480
Mirrors what we had with combiner, but allows it to occur at arbitrary points.
We'll use this in chaotic-good to:
1. combine fragments into a single frame
2. combine writes from different calls into a single syscall
Closes #35413
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35413 from ctiller:group-on 9f20f34523
PiperOrigin-RevId: 596004767
We encountered an api_fuzzer test case that adds a huge number of addresses that all immediately fail to connect, but it set max_backoff to 0, so there was a giant busy loop where pick_first was constantly trying to connect to subchannels with no delay. The FuzzingEventEngine was getting stuck in a tick loop, always accumulating more tasks that needed to be executed immediately, so it could never make forward progress on the test case.
This PR fixes the problem by adding a fixed 1us delay if the task's delay is 0 and the test case has not provided any more fixed delays.
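The rule can be sketched as follows. This is an illustration under stated assumptions, not the FuzzingEventEngine code: `effective_delay_us` and `MIN_DELAY_US` are hypothetical names, and how the remaining fixed delays are otherwise consumed is left out.

```python
from typing import Sequence

MIN_DELAY_US = 1  # the fixed fallback delay described above


def effective_delay_us(requested_us: int, remaining_fixed_delays_us: Sequence[int]) -> int:
    """Return the delay to use for a task, in microseconds.

    Hypothetical helper: a zero delay is bumped to MIN_DELAY_US only when
    the test case has no fixed delays left to supply.
    """
    if requested_us == 0 and not remaining_fixed_delays_us:
        return MIN_DELAY_US
    return requested_us


# Simulated clock: a task that keeps rescheduling itself with zero delay
# now moves time forward instead of spinning at the same instant.
now_us = 0
for _ in range(1000):
    now_us += effective_delay_us(0, [])
```

With the fallback in place, every rescheduled task advances the simulated clock by at least one microsecond, so the tick loop can eventually drain.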
(Unfortunately, I cannot include the test case that triggered the problem in this PR, because it winds up exceeding the RBE stdout limit.)
Fixes b/310664846.
Closes #35447
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35447 from markdroth:api_fuzzer_busy_loop_fix 90055d3d92
PiperOrigin-RevId: 595853516
A select few tests fail when building with OpenSSL 1.0.2; disable them until we can fix them.
Closes #35354
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35354 from gtcooke94:fix_ossl_102 8708d6ce86
PiperOrigin-RevId: 595761932
This reverts commit 96b9e8d3e3.
[Implement OpenTelemetry PR](https://github.com/grpc/grpc/pull/35292) was [reverted](96b9e8d3e3) because some tests started failing after the changes were imported into g3.
After investigating, we found the root cause. It can be fixed both on our side and on the gapic API side; we opened an issue with the [gapic API team](https://github.com/googleapis/python-api-core/issues/579), and this PR includes the fixes on our side.
Closes #35439
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35439 from XuanWang-Amos:reapply_otel 0133564438
PiperOrigin-RevId: 595746222
This is a pretty common occurrence (e.g. if the peer has a SPIFFE cert) and is causing lots of log spam, see e.g. b/316690986.
Closes #35410
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35410 from matthewstevenson88:decrease-log-level e74f802114
PiperOrigin-RevId: 595531452
Continue supporting the current grpc-testing project, which I suppose is used
inside of Google, but also allow configuring a different project to
upload results to.
The format "project_id.dataset_id.table_id" is common for BigQuery so it
seems idiomatic to do it in this way. Adding a separate command line
option would be more complicated because it would require changes all
the way down the chain (at least in the entry point for the test driver
and in the LoadTest controller).
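A minimal sketch of parsing such a table reference follows. It assumes that a two-part `dataset_id.table_id` value falls back to the `grpc-testing` project; that fallback, and the names `TableRef` and `parse_table_ref`, are assumptions for illustration rather than the code in this change.

```python
from typing import NamedTuple

DEFAULT_PROJECT = "grpc-testing"  # the project named in the description above


class TableRef(NamedTuple):
    project_id: str
    dataset_id: str
    table_id: str


def parse_table_ref(spec: str, default_project: str = DEFAULT_PROJECT) -> TableRef:
    """Parse "project_id.dataset_id.table_id" or "dataset_id.table_id".

    Hypothetical helper: the two-part form uses default_project.
    """
    parts = spec.split(".")
    if any(not part for part in parts):
        raise ValueError(f"empty component in table reference: {spec!r}")
    if len(parts) == 3:
        return TableRef(*parts)
    if len(parts) == 2:
        return TableRef(default_project, *parts)
    raise ValueError(
        f"expected [project_id.]dataset_id.table_id, got: {spec!r}"
    )
```

Encoding the project in the existing table argument means callers further down the chain keep passing a single string.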
Closes #35384
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35384 from lepistone:choose_bigquery_project 2355fea28c
PiperOrigin-RevId: 595523944
This PR adds CSM Observability testing capability in the PSM Interop testing framework. This PR mostly changes the framework Python code.
This adds a flag `enable_csm_observability` to the client / server deployment yaml file such that, when enabled, we will create a GMP `PodMonitoring` resource and pass `--enable_csm_observability` to each language's client / server container (for them to actually enable the Prometheus endpoint).
I added a new test under `tests/csm/csm_observability_test.py`. This is basically a copy of `tests/baseline_test.py` but with `enable_csm_observability=True`.
Other PRs for this whole thing to work:
- https://github.com/grpc/grpc/pull/34752: The `PodMonitoring` resource yaml template
- https://github.com/grpc/grpc/pull/34832: Support for the `--enable_csm_observability` flag in the C++ client/server image
Closes #34835
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/34835 from stanley-cheung:csm-o11y-framework-changes 0b3d0eb7ed
PiperOrigin-RevId: 595502496
- `memory_pressure_controller` finally - allows deletion of pid_controller throughout the codebase
- `overload_protection` - one of the http2 rapid reset mitigations
- `red_max_concurrent_streams` - another http2 rapid reset mitigation
Closes #35426
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35426 from ctiller:new-years-cleanse 4651672e7e
PiperOrigin-RevId: 595205029
Closes #35292
PiperOrigin-RevId: 595188404
Remove the old `switch` library - this used to be an implementation detail of `Seq`, `TrySeq` - but has become unused.
Add a new user facing primitive `Switch` that fills a similar role to `switch` in C++ - selecting a promise to execute based on a primitive discriminator - much like `If` allows selection based on a boolean discriminator now.
A future change will optimize this to actually lower the `Switch` into an actual `switch` statement, but for right now I want to get the functionality in.
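As a rough analogy only, the selection behavior can be sketched in Python. The names `switch`, `cases`, and `default` are hypothetical, and this is not the C++ promise API: it only shows choosing one deferred computation by a discriminator and running just that one.

```python
from typing import Callable, Dict, Hashable, TypeVar

T = TypeVar("T")


def switch(
    discriminator: Hashable,
    cases: Dict[Hashable, Callable[[], T]],
    default: Callable[[], T],
) -> T:
    """Run the callable selected by discriminator, or default if none matches."""
    return cases.get(discriminator, default)()


executed = []


def make_case(name: str) -> Callable[[], str]:
    """Build a deferred computation that records when it actually runs."""
    def run() -> str:
        executed.append(name)
        return name
    return run


result = switch(
    2,
    {1: make_case("one"), 2: make_case("two")},
    default=make_case("many"),
)
```

Only the selected branch runs; the other branches stay unevaluated, in the same way that `If` only executes the branch chosen by its boolean.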
Closes #35424
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35424 from ctiller:switchy 5308a914c6
PiperOrigin-RevId: 595140965
Whilst here, eliminate unnecessary mutexes and streamline some complexity in the read variants.
Closes #35409
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35409 from ctiller:pbe 4f9588101a
PiperOrigin-RevId: 595006455
Provide a public experimental API and bazel compatible build target for OpenTelemetry metrics.
Details -
* New `OpenTelemetryPluginBuilder` class that provides the API specified in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md
* The existing `grpc::internal::OpenTelemetryPluginBuilder` class is moved to `grpc::internal::OpenTelemetryPluginBuilderImpl` for disambiguation.
* Renamed `OTel` in some instances to `OpenTelemetry` for consistency.
Closes #35348
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35348 from yashykt:OTelPublicApi e32328825e
PiperOrigin-RevId: 594271246
Adds temporary `call.cc` and `connected_channel.cc` scaffolding to run `CallInterceptor`/`CallHandler` style calls.
This will get ripped out as soon as the v3 transition is completed.
Closes #35312
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35312 from ctiller:v3-accept ae0bf81f8b
PiperOrigin-RevId: 594128029
It turned out that the previous change missed two things, which this PR
addresses:
- Fixed the function `_dockerized_genrule` so that it does not have `timeout` and `flaky`,
which Bazel doesn't have. (Bazel 7 may have either dropped these arguments or
become stricter about passing unrecognized ones.)
- Disabled the Python Bazel distribution tests with Bazel 7. This needs to
be addressed by https://github.com/grpc/grpc/issues/35391.
Currently, each subchannel wrapper stores a ref to the policy and its key in the policy's subchannel map, and it looks up its entry in the map whenever it needs to modify that entry. There's some complexity due to the need to avoid deadlocks in the case where we remove the last strong ref to a subchannel wrapper from a map entry. This approach has a number of problems:
- The subchannel wrapper is dropping its key when it gets orphaned, meaning that it will *never* actually remove itself from the map entry when it is destroyed, which is not what we want. (This isn't actually causing a bug, but it does mean that we'll never delete the subchannel wrapper, even when it is really unused.)
- Having the subchannel wrapper look up its key in the map every time it needs to modify its entry is fairly inefficient, especially if there are a large number of endpoints.
- There is a race condition that was accidentally introduced in #34472. The subchannel wrapper's key is being modified when the subchannel wrapper is orphaned, but that PR changed the picker to read the same value without any synchronization between the two, and we didn't notice the bug or catch it in any tests.
- The code is fairly hard to understand, with a bunch of special cases that are not obvious to the reader.
This PR addresses those problems by making the entries in the subchannel map be ref-counted, where a ref is held both by the map and by each subchannel wrapper. Specific changes:
- Because the wrapper holds a ref directly to the map entry, there is no longer any need for a map lookup every time the subchannel wrapper needs to access its map entry.
- We now avoid deadlocks by waiting until after we've released the lock to drop refs to subchannel wrappers, so there is no more need to modify the internal state of a subchannel wrapper.
- We now remove subchannel wrappers from the map entry when they are orphaned, so there is no longer any need to hold a weak ref in the map entry; instead, we now just use a raw pointer.
- The connectivity state is now stored in the map entry instead of in each individual subchannel wrapper. And we no longer need to use an atomic for it, since we are always holding the lock when it is accessed.
- All state guarded by the mutex (other than the subchannel map itself) is now in the subchannel entry, and I have added lock annotations so that the compiler can enforce the lock semantics.
This PR paves the way for subsequent work that will make SSA work across priorities (see in-progress [gRFC A75](https://github.com/grpc/proposal/pull/405)), where we will need to generalize the behavior such that we hold strong refs to subchannels in any state (not just DRAINING) when the child policy is not holding its own refs.
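The shape of the new arrangement can be sketched in Python as a loose analogy. All class and method names here are hypothetical, and Python's own reference handling stands in for the explicit ref-counting in the C++ code: the wrapper keeps a direct reference to its map entry, the entry keeps a plain back-pointer to the wrapper, and state lives in the entry under the policy's lock.

```python
import threading
from typing import Dict, List, Optional


class SubchannelEntry:
    """Map entry shared by the map and by the wrapper that refers to it."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.connectivity_state = "IDLE"  # only accessed with the lock held
        self.wrapper: Optional["SubchannelWrapper"] = None  # plain back-pointer


class SubchannelWrapper:
    def __init__(self, entry: SubchannelEntry) -> None:
        self.entry = entry  # direct reference, so no map lookup is needed later


class Policy:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[str, SubchannelEntry] = {}

    def create_wrapper(self, address: str) -> SubchannelWrapper:
        with self._lock:
            entry = self._entries.get(address)
            if entry is None:
                entry = self._entries[address] = SubchannelEntry(address)
            wrapper = SubchannelWrapper(entry)
            entry.wrapper = wrapper
            return wrapper

    def set_state(self, wrapper: SubchannelWrapper, state: str) -> None:
        with self._lock:
            # Reaches the entry through the wrapper instead of a key lookup.
            wrapper.entry.connectivity_state = state

    def orphan(self, wrapper: SubchannelWrapper) -> None:
        released: List[SubchannelWrapper] = []
        with self._lock:
            entry = wrapper.entry
            if entry.wrapper is wrapper:
                entry.wrapper = None  # remove the wrapper from its entry
                released.append(wrapper)
        # References collected under the lock are dropped only here,
        # after the lock has been released.
        released.clear()
```

Deferring the release until after the lock is dropped is what avoids re-entering the lock from a destructor in the C++ version; in Python the step is shown only to mirror that ordering.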
Closes #35379
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35379 from markdroth:xds_ssa_tsan_fix 4927e04eb1
PiperOrigin-RevId: 594015497
Rename `saved_errno` to `connect_errno`.
Avoid relying on `errno` being zero if `connect(2)` does not fail.
Slightly linearize control flow.
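The idea behind the second point can be sketched in Python, with the caveat that Python has no global `errno`, so this only mirrors the shape of the fix. The helper `connect_and_capture_errno` and the `EIO` fallback are assumptions for illustration: the error code is captured where the failure happens, and success is reported as an explicit 0 instead of being inferred from leftover error state.

```python
import errno
from typing import Callable


def connect_and_capture_errno(do_connect: Callable[[], None]) -> int:
    """Run do_connect and return 0 on success or the error number on failure.

    Hypothetical helper: do_connect would typically wrap sock.connect(addr).
    """
    try:
        do_connect()
    except OSError as exc:
        # Capture the code at the point of failure. The EIO fallback for an
        # OSError without an errno is an assumption made for this sketch.
        return exc.errno if exc.errno is not None else errno.EIO
    # Success: set the result explicitly rather than reading stale state.
    return 0


def refused() -> None:
    """Stand-in for a connect attempt that is refused."""
    raise OSError(errno.ECONNREFUSED, "connection refused")


connect_errno = connect_and_capture_errno(refused)
```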
Closes #35356
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35356 from benjaminp:connect-errno 0dabdf0562
PiperOrigin-RevId: 593819124
- Fixed the bazel distrib tests with Bazel 7 by disabling bzlmod option.
- Added a new note for bzlmod to the doc.
Closes #35390
PiperOrigin-RevId: 593816700
- Added Bazel 7 to the supported Bazel versions.
- Changed the default Bazel version to 7.
- Fixed Android Binder build issue.
Closes #35362
PiperOrigin-RevId: 592946781
This is a prerequisite change to start supporting Bazel 7. Changes:
- Disabled bzlmod, which Bazel 7 enables by default. bzlmod will eventually need to be supported, but not now.
- Upgraded some Bazel rule dependencies that are required to support Bazel 7.
- Used Python 3 explicitly, as Bazel 7 begins to reject Python 2.
Note that this isn't enough to enable Bazel 7 by default; another PR will follow for that.
Closes #35374
PiperOrigin-RevId: 592931675
Simple `assert` statements don't say much about what needs to be done. Explicit error messages tell us what's wrong and where to look.
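A minimal sketch of the difference, using hypothetical names (`CheckError`, `check_equal`) that are not taken from the script changed here:

```python
class CheckError(Exception):
    """Raised when a consistency check fails, with an actionable message."""


def check_equal(name: str, expected: object, actual: object) -> None:
    """Fail with a message that names the item and shows both values.

    A bare `assert expected == actual` would only report that something
    failed, not which item or how the values differ.
    """
    if expected != actual:
        raise CheckError(
            f"{name} is out of date: expected {expected!r} but found {actual!r}"
        )


try:
    check_equal("version file", "1.61.0", "1.60.0")
except CheckError as err:
    message = str(err)
```

Unlike `assert`, the explicit check also keeps working when Python runs with `-O`, which strips assertions.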
Closes #35375
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35375 from veblush:check-work 0733499c31
PiperOrigin-RevId: 592920747