We encountered an api_fuzzer test case that adds a huge number of addresses that all immediately fail to connect and that sets max_backoff to 0, which produced a giant busy loop in which pick_first constantly tried to connect to subchannels with no delay. The FuzzingEventEngine got stuck in a tick loop, continually accumulating tasks that needed to be executed immediately, so it could never make forward progress on the test case.
This PR fixes the problem by adding a fixed 1us delay if the task's delay is 0 and the test case has not provided any more fixed delays.
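A minimal sketch of the delay rule described above, written in Python for brevity (the real FuzzingEventEngine is C++). The function name `effective_delay_us` and the deque of test-supplied delays are hypothetical, and the assumption that an available fixed delay is added to the requested delay is mine, not taken from the PR.

```python
from collections import deque

MIN_DELAY_US = 1  # the fixed fallback delay this PR introduces


def effective_delay_us(requested_delay_us, fixed_delays_us):
    """Return the delay (in microseconds) to apply to a scheduled task.

    `fixed_delays_us` is a hypothetical deque of delays supplied by the
    fuzzer test case; one entry is consumed per scheduled task.
    """
    if fixed_delays_us:
        # Assumption: a test-supplied delay is added to the requested one.
        return requested_delay_us + fixed_delays_us.popleft()
    if requested_delay_us == 0:
        # No test-supplied delays remain and the task wants to run
        # immediately: force a tiny delay so the tick loop can advance.
        return MIN_DELAY_US
    return requested_delay_us
```

With this rule, a task scheduled with a zero delay can no longer be re-queued for the same instant once the test case's delays are exhausted, which is what lets the tick loop make progress.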
(Unfortunately, I cannot include the test case that triggered the problem in this PR, because it winds up exceeding the RBE stdout limit.)
Fixes b/310664846.
Closes #35447
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35447 from markdroth:api_fuzzer_busy_loop_fix 90055d3d92
PiperOrigin-RevId: 595853516
A select few tests are failing when building with OpenSSL 1.0.2. Disable them until we can fix them.
Closes #35354
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35354 from gtcooke94:fix_ossl_102 8708d6ce86
PiperOrigin-RevId: 595761932
This reverts commit 96b9e8d3e3.
The [Implement OpenTelemetry PR](https://github.com/grpc/grpc/pull/35292) was [reverted](96b9e8d3e3) because some tests started failing after the changes were imported into g3.
After investigating, we found the root cause. It can be fixed both on our side and on the gapic API side. We opened an issue with the [gapic API team](https://github.com/googleapis/python-api-core/issues/579), and this PR includes the fixes on our side.
Closes #35439
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35439 from XuanWang-Amos:reapply_otel 0133564438
PiperOrigin-RevId: 595746222
This is a pretty common occurrence (e.g. if the peer has a SPIFFE cert) and is causing lots of log spam; see e.g. b/316690986.
Closes #35410
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35410 from matthewstevenson88:decrease-log-level e74f802114
PiperOrigin-RevId: 595531452
Continue supporting the current grpc-testing project, which I suppose is used
inside of Google, but also allow configuring a different project to
upload results to.
The format "project_id.dataset_id.table_id" is common for BigQuery so it
seems idiomatic to do it in this way. Adding a separate command line
option would be more complicated because it would require changes all
the way down the chain (at least in the entry point for the test driver
and in the LoadTest controller).
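A rough sketch of how such a table spec could be parsed while keeping the existing default. The function name `parse_bq_table` and the assumption that the older form is `dataset_id.table_id` (falling back to the grpc-testing project) are illustrative only and may not match the actual implementation.

```python
DEFAULT_PROJECT_ID = "grpc-testing"  # the project used when none is given


def parse_bq_table(spec):
    """Split a BigQuery table spec into (project_id, dataset_id, table_id).

    Accepts "project_id.dataset_id.table_id", or (assumed legacy form)
    "dataset_id.table_id", which falls back to DEFAULT_PROJECT_ID.
    """
    parts = spec.split(".")
    if any(not part for part in parts):
        raise ValueError(f"empty component in BigQuery table spec: {spec!r}")
    if len(parts) == 2:
        return (DEFAULT_PROJECT_ID, parts[0], parts[1])
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    raise ValueError(
        f"expected [project_id.]dataset_id.table_id, got: {spec!r}"
    )
```

Encoding the project in the existing argument means no new option has to be threaded through the test driver entry point or the LoadTest controller.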
Closes #35384
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35384 from lepistone:choose_bigquery_project 2355fea28c
PiperOrigin-RevId: 595523944
This PR adds CSM Observability testing capability in the PSM Interop testing framework. This PR mostly changes the framework Python code.
This adds a flag `enable_csm_observability` to the client / server deployment YAML file so that, when enabled, we create a GMP `PodMonitoring` resource and pass `--enable_csm_observability` to each language's client / server container (for them to actually enable the Prometheus endpoint).
I added a new test under `tests/csm/csm_observability_test.py`. This is basically a copy of the `tests/baseline_test.py` but with the `enable_csm_observability=True`.
Other PRs for this whole thing to work:
- https://github.com/grpc/grpc/pull/34752: The `PodMonitoring` resource yaml template
- https://github.com/grpc/grpc/pull/34832: Support for the `--enable_csm_observability` flag in the C++ client/server image
Closes #34835
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/34835 from stanley-cheung:csm-o11y-framework-changes 0b3d0eb7ed
PiperOrigin-RevId: 595502496
- `memory_pressure_controller` finally - allows deletion of pid_controller throughout the codebase
- `overload_protection` - one of the http2 rapid reset mitigations
- `red_max_concurrent_streams` - another http2 rapid reset mitigation
Closes #35426
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35426 from ctiller:new-years-cleanse 4651672e7e
PiperOrigin-RevId: 595205029
Closes #35292
PiperOrigin-RevId: 595188404
Remove the old `switch` library - this used to be an implementation detail of `Seq`, `TrySeq` - but has become unused.
Add a new user facing primitive `Switch` that fills a similar role to `switch` in C++ - selecting a promise to execute based on a primitive discriminator - much like `If` allows selection based on a boolean discriminator now.
A future change will optimize this to actually lower the `Switch` into an actual `switch` statement, but for right now I want to get the functionality in.
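As a purely conceptual illustration of selecting one piece of work by a discriminator (not the C++ promise API itself), here is a Python analogue. The function `switch`, the use of zero-argument callables to stand in for promises, and the explicit `default` parameter are all hypothetical.

```python
def switch(discriminator, cases, default):
    """Run exactly one factory, chosen by `discriminator`.

    `cases` maps discriminator values to zero-argument callables that
    stand in for promise factories; `default` is used when no case
    matches. Only the selected callable is ever invoked.
    """
    factory = cases.get(discriminator, default)
    return factory()


# Usage: pick a branch based on an integer, much as a boolean picks
# between two branches in an if/else.
result = switch(
    2,
    {1: lambda: "one", 2: lambda: "two"},
    default=lambda: "other",
)
```

The point of the analogy is that the branches are described lazily and only the chosen one runs, which mirrors selecting a promise to execute rather than evaluating all of them.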
Closes #35424
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35424 from ctiller:switchy 5308a914c6
PiperOrigin-RevId: 595140965
Whilst here, eliminate unnecessary mutexes and streamline some complexity in the read variants.
Closes #35409
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35409 from ctiller:pbe 4f9588101a
PiperOrigin-RevId: 595006455
Provide a public experimental API and bazel compatible build target for OpenTelemetry metrics.
Details -
* New `OpenTelemetryPluginBuilder` class that provides the API specified in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md
* The existing `grpc::internal::OpenTelemetryPluginBuilder` class is moved to `grpc::internal::OpenTelemetryPluginBuilderImpl` for disambiguation.
* Renamed `OTel` in some instances to `OpenTelemetry` for consistency.
Closes #35348
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35348 from yashykt:OTelPublicApi e32328825e
PiperOrigin-RevId: 594271246
Adds temporary `call.cc` and `connected_channel.cc` scaffolding to run `CallInterceptor`/`CallHandler` style calls.
This will get ripped out as soon as the v3 transition is completed.
Closes #35312
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35312 from ctiller:v3-accept ae0bf81f8b
PiperOrigin-RevId: 594128029
It turned out that the previous change missed two things, which this PR
addresses:
- Fixed the `_dockerized_genrule` function so that it no longer passes `timeout` and `flaky`,
which Bazel doesn't have. (Bazel 7 may either drop these arguments or
become more strict about passing unrecognized ones.)
- Disabled Python Bazel distribution tests with Bazel 7. This needs to
be addressed by https://github.com/grpc/grpc/issues/35391.
Currently, each subchannel wrapper stores a ref to the policy and its key in the policy's subchannel map, and it looks up its entry in the map whenever it needs to modify that entry. There's some complexity due to the need to avoid deadlocks in the case where we remove the last strong ref to a subchannel wrapper from a map entry. This approach has a number of problems:
- The subchannel wrapper is dropping its key when it gets orphaned, meaning that it will *never* actually remove itself from the map entry when it is destroyed, which is not what we want. (This isn't actually causing a bug, but it does mean that we'll never delete the subchannel wrapper, even when it is really unused.)
- Having the subchannel wrapper look up its key in the map every time it needs to modify its entry is fairly inefficient, especially if there are a large number of endpoints.
- There is a race condition that was accidentally introduced in #34472. The subchannel wrapper's key is being modified when the subchannel wrapper is orphaned, but that PR changed the picker to read the same value without any synchronization between the two, and we didn't notice the bug or catch it in any tests.
- The code is fairly hard to understand, with a bunch of special cases that are not obvious to the reader.
This PR addresses those problems by making the entries in the subchannel map be ref-counted, where a ref is held both by the map and by each subchannel wrapper. Specific changes:
- Because the wrapper holds a ref directly to the map entry, there is no longer any need for a map lookup every time the subchannel wrapper needs to access its map entry.
- We now avoid deadlocks by waiting until after we've released the lock to drop refs to subchannel wrappers, so there is no more need to modify the internal state of a subchannel wrapper.
- We now remove subchannel wrappers from the map entry when they are orphaned, so there is no longer any need to hold a weak ref in the map entry; instead, we now just use a raw pointer.
- The connectivity state is now stored in the map entry instead of in each individual subchannel wrapper. And we no longer need to use an atomic for it, since we are always holding the lock when it is accessed.
- All state guarded by the mutex (other than the subchannel map itself) is now in the subchannel entry, and I have added lock annotations so that the compiler can enforce the lock semantics.
This PR paves the way for subsequent work that will make SSA work across priorities (see in-progress [gRFC A75](https://github.com/grpc/proposal/pull/405)), where we will need to generalize the behavior such that we hold strong refs to subchannels in any state (not just DRAINING) when the child policy is not holding its own refs.
Closes #35379
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35379 from markdroth:xds_ssa_tsan_fix 4927e04eb1
PiperOrigin-RevId: 594015497
Rename `saved_errno` to `connect_errno`.
Avoid relying on `errno` being zero if `connect(2)` does not fail.
Slightly linearize control flow.
Closes #35356
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35356 from benjaminp:connect-errno 0dabdf0562
PiperOrigin-RevId: 593819124
- Fixed the bazel distrib tests with Bazel 7 by disabling bzlmod option.
- Added a new note for bzlmod to the doc.
Closes #35390
PiperOrigin-RevId: 593816700
- Added Bazel 7 to the supported Bazel versions.
- Changed the default Bazel version to 7.
- Fixed Android Binder build issue.
Closes #35362
PiperOrigin-RevId: 592946781
This is a prerequisite change to start supporting Bazel 7. Changes are:
- Disabled bzlmod, which Bazel 7 begins to enable by default. Supporting bzlmod will eventually be needed, but not now.
- Upgraded some Bazel rule dependencies that are required to support Bazel 7.
- Use Python 3 explicitly, as Bazel 7 begins to reject Python 2.
Note that this isn't enough to enable Bazel 7 by default and another PR will follow for that.
Closes #35374
PiperOrigin-RevId: 592931675
Simple `assert` statements don't do much to tell us what needs to be done. Explicit error messages instead tell us what went wrong, which helps us know where to look.
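To illustrate the difference, here is a small sketch. The helper name `check_or_explain` and the example messages are hypothetical and are not taken from the scripts changed in this PR.

```python
def check_or_explain(condition, problem, remedy):
    """Raise an error that says what went wrong and what to do about it.

    A bare `assert condition` would only report an AssertionError with a
    line number; this version carries an actionable message instead.
    """
    if not condition:
        raise RuntimeError(f"{problem}. To fix this, {remedy}.")


# Usage with made-up values: the failure text names both the problem
# and the next step.
try:
    check_or_explain(
        False,
        "generated files are out of date",
        "re-run the generation step and commit the result",
    )
except RuntimeError as error:
    message = str(error)
```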
Closes #35375
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35375 from veblush:check-work 0733499c31
PiperOrigin-RevId: 592920747
Due to an internal issue, some code from the objective-c folder was copied
into objective_c.
This PR tries to undo that.
Fix: https://github.com/grpc/grpc/issues/35085
Closes #35325
PiperOrigin-RevId: 592635611
Fixes #34929.
This PR hash-pins all Actions used in workflows and sets up dependabot
to keep them up-to-date.
Dependabot will send at most one PR per month. That PR will update the
hashes and version comments of all Actions with new versions.
I also suggest you enable Dependabot Security Updates in the repo's
[Code security &
analysis](https://github.com/grpc/grpc/settings/security_analysis)
settings (if you haven't already). This will make Dependabot send a PR
as soon as a dependency is found to have a vulnerability.
---------
Signed-off-by: Pedro Kaj Kjellerup Nacht <pnacht@google.com>
The Server Reflection Protocol v1 is already released, so I think the link to the protocol should now point to v1 rather than v1alpha.
Closes #35330
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35330 from y-yagi:patch-1 a89cb60b1e
PiperOrigin-RevId: 592356694
`grpc_tcp_client_create_from_prepared_fd` distinguishes "in-progress" `connect(2)` errors from fatal errors. However, it makes a bunch of external calls between calling `connect(2)` and checking `errno`. These calls may not preserve `errno`.
This change parallels the defensive `errno`-saving pattern in the event_engine.
Closes #35064
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35064 from benjaminp:save-errno a01f6b4309
PiperOrigin-RevId: 592350454