This test is flaky, example failure: https://btx.cloud.google.com/invocations/ea6d91f6-655a-41a4-bcbd-96c1a75118e1/targets, error message:
```
[91mtests_aio.unit.call_test.TestStreamStreamCall.test_cancel_after_done_writing[0m
[1mtraceback:[0m
Traceback (most recent call last):
File "/usr/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/lib/python3.9/unittest/case.py", line 593, in run
self._callTestMethod(testMethod)
File "/usr/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
method()
File "/var/local/git/grpc/src/python/grpcio_tests/tests_aio/unit/_test_base.py", line 31, in wrapper
return loop.run_until_complete(f(*args, **kwargs))
File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/var/local/git/grpc/src/python/grpcio_tests/tests_aio/unit/call_test.py", line 821, in test_cancel_after_done_writing
self.assertTrue(call.cancel())
File "/usr/lib/python3.9/unittest/case.py", line 682, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true
```
The test is trying to cancel the RPC after calling `done_writing()`, but it's possible that the RPC will finish before we checks the status and thus `call.cancel()` will return false.
We're changing this test to add some delays in server side so that we can properly cancel the RPC before it ends.
Tested locally and flake rate decreased from about 5/10000 to 0/10000.
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#38051
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38051 from XuanWang-Amos:fix_test_cancel_after_done_writing d72170c3de
PiperOrigin-RevId: 692266861
Protobuf 6.30.0 will change the return types of Descriptor::name() and other methods to absl::string_view. This makes the code work both before and after such a change.
PiperOrigin-RevId: 692216341
gRPC requires at least CMake 3.16 or later so the test script no longer needs to install CMake 3.16.
Closes#38048
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38048 from veblush:cmake-no 3e364d9770
PiperOrigin-RevId: 691976945
Certificate verification can fail for more than 50 different reasons. Simplify troubleshooting by including the reason in the error message.
Closes#37207
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37207 from csapuntz:master a8c0855f36
PiperOrigin-RevId: 691962134
The ring buffer was inconsistently removing items making it hard to reason about the state in general.
Additionally adjust API such that we'll guarantee concurrent one flush and fail to collect others. This ultimately makes it a lot easier to limit the impact of running this system.
PiperOrigin-RevId: 691657279
tl;dr: If data is received on the underlying endpoint, it must never be discarded.
Closes#38036
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38036 from drfloob:endpoint-cb-xor ab763cac30
PiperOrigin-RevId: 691508829
Fix for b/365993761.
Noticed that XdsClient metrics were not being reported due to authority not being properly set.
This solution is not perfect since channels created later can possibly use a different authority, so preferring to use the default authority from the first channel.
Closes#38009
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38009 from yashykt:AddAuthorityToXdsClientMetricsScope 00071efa23
PiperOrigin-RevId: 691149703
There was an edge case in which a socket or endpoint was shut down, a socket `read` call returned zero bytes, and there was unread in the read buffer from a previous read operation. The endpoint callbacks were called with an error status to indicate the end of the stream, and the callbacks did not consume that final chunk of data.
My current hunch is that something inside gRPC is violating the EventEngine Endpoint::Read contract, but I'm not certain what, yet. 88b5c9e3ab/include/grpc/event_engine/event_engine.h (L197-L199)
However, by modifying WindowsEndpoint to return an `absl::OkStatus()` if there's any data in the buffer, tests appear to pass.
Closes#38014
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38014 from drfloob:win-endpoint-data-leak b24b2d9f8a
PiperOrigin-RevId: 691063044
In EventEngineClientChannelDNSResolver, it needs to acquire `on_resolved_mu_` lock when calling dns resolver APIs as well as in on_resolve callback.
In DNSServiceResolverImpl::LookupHostname, it acquires `request_mu_` lock, so the lock order is:
EventEngineClientChannelDNSResolver::on_resolved_mu_ -> DNSServiceResolverImpl::request_mu_
Upon the resolution successful or failed, DNSServiceResolverImpl calls the on_resolved callback without holding any locks, so only one lock here:
EventEngineClientChannelDNSResolver::on_resolved_mu_
However when DNSServiceResolver was deleted, in DNSServiceResolverImpl::Shutdown, it calls the on_resolve callbacks while holding the `request_mu_` lock, as a result the lock order becomes:
DNSServiceResolverImpl::request_mu_ -> EventEngineClientChannelDNSResolver::on_resolved_mu_
which triggers the deadlock check in absl::Mutex as these two locks are acquired in different orders.
This PR release the on_resolved_mu_ first before calling on_resolve, so abls won't complain.
Added a new test which fails with GPR_ABSEIL_SYNC=1 (not enabled by default) without this PR.
Closes#38010
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38010 from HannahShiSFB:fix-lock-invert-in-dns-service-resolver-shutdown 5adc8f32b3
PiperOrigin-RevId: 690781049
`//test/core/end2end/...` tests stopped running back in March 2024, when the linking process for test binaries changed. See https://github.com/grpc/grpc/pull/36197.
The test targets ran on Windows, but zero tests were found inside those targets, so the tests succeeded instantly. This fix results in longer linking steps, and more disk space consumed, but tests are getting discovered now. To illustrate the problem, run `bazel test //test/core/end2end:ping_pong_streaming_test --test_output=all --test_arg=--gtest_list_tests`. Before this PR, zero tests were found. Now:
```
CoreEnd2endTest.
PingPongStreaming1/Inproc
PingPongStreaming1/Chttp2FakeSecurityFullstack
PingPongStreaming1/Chttp2Fullstack
PingPongStreaming1/Chttp2FullstackCompression
PingPongStreaming1/Chttp2FullstackLocalIpv4
...
```
Closes#37918
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37918 from drfloob:fix-win-core-e2e e60f832a5d
PiperOrigin-RevId: 690741492
Also remove a now-unnecessary dependency from the dump_args target that
previously had been needed just to get the right includes to be used.
PiperOrigin-RevId: 690619490
This will resolve a bunch of warnings we're seeing in Envoy, of the form:
```
INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/reflection/v1alpha/reflection.grpc.pb.h:
bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
INFO: From Action external/envoy_api/envoy/service/ext_proc/v3/external_processor.grpc.pb.h:
bazel-out/k8-opt/bin/external/envoy_api/external/envoy_api: warning: directory does not exist.
INFO: From Action external/opencensus_proto/opencensus/proto/agent/trace/v1/trace_service.grpc.pb.h:
bazel-out/k8-opt/bin/external/opencensus_proto/external/opencensus_proto: warning: directory does not exist.
INFO: From Action external/com_google_googleapis/google/devtools/cloudtrace/v2/trace.grpc.pb.h:
bazel-out/k8-opt/bin/external/com_google_googleapis/external/com_google_googleapis: warning: directory does not exist.
INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/reflection/v1/reflection.grpc.pb.h:
bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
```
```
bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
|----copy-1-from-dir_out-----||---copy-2-from-proto_root---|
```
Closes#37990
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37990 from asedeno:warnings 801da4c5cd
PiperOrigin-RevId: 689517449
It's not obvious to me what the reason for the increase in test time is, but it seems that we were already at ~3hr45min earlier and now we are timing out at 4 hours. Increasing to 6 hours.
Closes#37989
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37989 from yashykt:IncreaseWindowsTimeout a0daa97486
PiperOrigin-RevId: 689416681
This will disable the jobs in both bazel and cmake builds, which is necessary for our CI to remain happy. These end2end tests were enabled on Windows at some point without any notice (they had been intentionally disabled for a while), but the RBE jobs have been silently failing for 7 months, so it's unclear when that happened.
These failures still need to be examined so we can re-enable these tests.
Closes#37983
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37983 from drfloob:winguh 64a62fd5b9
PiperOrigin-RevId: 689118231
Possible fix for sporadic flakes like `detected corruption in /Volumes/BuildData/tmpfs/altsrc/github/grpc/workspace_ruby_macos_dbg_native/src/ruby/lib/grpc/grpc_c.bundle` we've been seeing in ruby.
`rake` (invoked by run_ruby.sh) "shouldn't" be modifying the binary if it's already been built (which is the case in these tests) but isn't really guaranteed not to. Running rspec directly ensures that we don't accidentally write to any pre-built artifacts while tests are possibly using them.
A side benefit here is to get better test reporting granularity.
Closes#37975
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37975 from apolcyn:ruby_flake_attempt c03931dfa4
PiperOrigin-RevId: 689064491
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#37973
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37973 from yijiem:core-end2end-windows-hack a2da4ae4eb
PiperOrigin-RevId: 688727248
The default versions to run tests on master using `run_tests.py` are usually set to min and max supported Python versions. Looks like I missed updating the max version to Python 3.13 when adding support recently
Closes#37945
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37945 from sreenithi:add_missing_py313_test_config 708aa70f2b
PiperOrigin-RevId: 688702425
Three problems:
1. We have an owning waker, but on the `Expire` path we never wake it, leading to calls being stranded until the pending timer runs out - instead we now call Finish and have it always wake things up (slightly more expensive in shutdown case, but not on the fast path)
2. Avoid a race condition whereby two threads could wake the same waker
3. Don't add new requests to the pending queue after we've removed all requests
Closes#37972
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37972 from ctiller:flake-fightas-21 2bbd1cf667
PiperOrigin-RevId: 688310530
I hit this crash working on fork support but there's a chance it happens if the file descriptor becomes bad for some other reason.
Closes#37952
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37952 from eugeneo:listener-no-address-crash 1e2e82f8e6
PiperOrigin-RevId: 688238823
Skipping the `tests_aio.unit.channel_ready_test.TestChannelReady.channel_ready_blocked` unit test due to a flake of the test not timing out and raising TimeoutError as expected
Can be reverted once #37949 is investigated and fixed
Closes#37948
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37948 from sreenithi:check_aio_channel_ready_test f48edea5f2
PiperOrigin-RevId: 688061272