Fix for b/365993761.
Noticed that XdsClient metrics were not being reported due to authority not being properly set.
This solution is not perfect, since channels created later can use a different authority; the fix prefers the default authority from the first channel.
Closes #38009
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38009 from yashykt:AddAuthorityToXdsClientMetricsScope 00071efa23
PiperOrigin-RevId: 691149703
There was an edge case in which a socket or endpoint was shut down, a socket `read` call returned zero bytes, and there was unread data in the read buffer from a previous read operation. The endpoint callbacks were called with an error status to indicate the end of the stream, and the callbacks did not consume that final chunk of data.
My current hunch is that something inside gRPC is violating the EventEngine Endpoint::Read contract, but I'm not certain what, yet. 88b5c9e3ab/include/grpc/event_engine/event_engine.h (L197-L199)
However, by modifying WindowsEndpoint to return an `absl::OkStatus()` if there's any data in the buffer, tests appear to pass.
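The behavior change can be pictured with a small, self-contained sketch. This is not the WindowsEndpoint code: `ReadResult`, `FinishRead`, and the plain `std::string` buffer are hypothetical stand-ins, and a `bool` takes the place of `absl::Status` to keep the example dependency-free.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the status and data handed to a read callback.
struct ReadResult {
  bool ok;           // true plays the role of absl::OkStatus()
  std::string data;  // bytes delivered to the callback
};

// Decides what to report after a socket read that produced `bytes_read` bytes
// of `fresh` data. `buffered` holds data from an earlier read that has not
// been delivered yet.
ReadResult FinishRead(std::size_t bytes_read, std::string& buffered,
                      const std::string& fresh) {
  buffered.append(fresh, 0, bytes_read);
  if (!buffered.empty()) {
    // Report success whenever there is data, even if this read hit the end
    // of the stream, so the final chunk is not dropped.
    ReadResult result{true, std::string()};
    result.data.swap(buffered);  // leaves `buffered` empty
    return result;
  }
  // Nothing left to deliver: only now report the end-of-stream error.
  return ReadResult{false, std::string()};
}
```

In this sketch, a zero-byte read with leftover buffered data first yields an OK result carrying that data, and only the following call reports the error.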
Closes #38014
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38014 from drfloob:win-endpoint-data-leak b24b2d9f8a
PiperOrigin-RevId: 691063044
EventEngineClientChannelDNSResolver needs to acquire the `on_resolved_mu_` lock both when calling DNS resolver APIs and in the on_resolve callback.
DNSServiceResolverImpl::LookupHostname acquires the `request_mu_` lock, so the lock order is:
EventEngineClientChannelDNSResolver::on_resolved_mu_ -> DNSServiceResolverImpl::request_mu_
When resolution succeeds or fails, DNSServiceResolverImpl calls the on_resolve callback without holding any locks, so only one lock is involved here:
EventEngineClientChannelDNSResolver::on_resolved_mu_
However, when the DNSServiceResolver is deleted, DNSServiceResolverImpl::Shutdown calls the on_resolve callbacks while holding the `request_mu_` lock. As a result, the lock order becomes:
DNSServiceResolverImpl::request_mu_ -> EventEngineClientChannelDNSResolver::on_resolved_mu_
which triggers the deadlock check in absl::Mutex as these two locks are acquired in different orders.
This PR releases the on_resolved_mu_ lock before calling on_resolve, so absl won't complain.
Added a new test which fails with GPR_ABSEIL_SYNC=1 (not enabled by default) without this PR.
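A common way to avoid this kind of lock-order inversion is to move the pending callbacks out of the protected container while holding the lock, then invoke them after the lock is released. The sketch below illustrates that general pattern only; it is not the code in this PR. `PendingRequests`, `mu_`, and the `bool` argument are hypothetical, and `std::mutex` stands in for the gRPC mutex types.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical holder of pending resolution callbacks.
class PendingRequests {
 public:
  using Callback = std::function<void(bool /*ok*/)>;

  void Add(Callback on_resolve) {
    std::lock_guard<std::mutex> lock(mu_);
    callbacks_.push_back(std::move(on_resolve));
  }

  // Cancels every pending request and returns how many were cancelled.
  // Callbacks run after mu_ is released, so a callback that takes its own
  // lock never nests that lock under mu_.
  std::size_t Shutdown() {
    std::vector<Callback> to_cancel;
    {
      std::lock_guard<std::mutex> lock(mu_);
      to_cancel.swap(callbacks_);
    }
    for (auto& callback : to_cancel) callback(/*ok=*/false);
    return to_cancel.size();
  }

 private:
  std::mutex mu_;
  std::vector<Callback> callbacks_;
};
```

Because no callback runs while `mu_` is held, the only ordering a lock-order checker can observe is caller lock first, then `mu_`.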
Closes #38010
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38010 from HannahShiSFB:fix-lock-invert-in-dns-service-resolver-shutdown 5adc8f32b3
PiperOrigin-RevId: 690781049
`//test/core/end2end/...` tests stopped running back in March 2024, when the linking process for test binaries changed. See https://github.com/grpc/grpc/pull/36197.
The test targets ran on Windows, but zero tests were found inside those targets, so the tests succeeded instantly. This fix results in longer linking steps, and more disk space consumed, but tests are getting discovered now. To illustrate the problem, run `bazel test //test/core/end2end:ping_pong_streaming_test --test_output=all --test_arg=--gtest_list_tests`. Before this PR, zero tests were found. Now:
```
CoreEnd2endTest.
PingPongStreaming1/Inproc
PingPongStreaming1/Chttp2FakeSecurityFullstack
PingPongStreaming1/Chttp2Fullstack
PingPongStreaming1/Chttp2FullstackCompression
PingPongStreaming1/Chttp2FullstackLocalIpv4
...
```
Closes #37918
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37918 from drfloob:fix-win-core-e2e e60f832a5d
PiperOrigin-RevId: 690741492
This will disable the jobs in both bazel and cmake builds, which is necessary for our CI to remain happy. These end2end tests were enabled on Windows at some point without any notice (they had been intentionally disabled for a while), but the RBE jobs have been silently failing for 7 months, so it's unclear when that happened.
These failures still need to be examined so we can re-enable these tests.
Closes #37983
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37983 from drfloob:winguh 64a62fd5b9
PiperOrigin-RevId: 689118231
Closes #37973
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37973 from yijiem:core-end2end-windows-hack a2da4ae4eb
PiperOrigin-RevId: 688727248
Porting from #37829.
This ensures that we wait to create the stream to the handshaker service until handshake frames arrive from the client. Without this change, a TCP connection to the ALTS server triggers the stream to the handshaker service to be created, even if no handshake frames have arrived from the client. This wastes resources and can potentially trigger the ALTS server to freeze up, because there is a cap on the number of concurrent ALTS handshakes that a server can perform.
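The idea of deferring stream creation can be sketched as lazy initialization. Everything here is hypothetical and unrelated to the real ALTS handshaker types: `HandshakerStream`, `LazyHandshaker`, and the method names are invented for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for the stream to the handshaker service.
struct HandshakerStream {
  std::size_t frames_forwarded = 0;
};

class LazyHandshaker {
 public:
  // Called when a TCP connection is accepted. Deliberately allocates nothing,
  // so an idle connection does not count against the handshake cap.
  void OnConnectionAccepted() {}

  // Called with bytes received from the client. The stream is created only
  // when the first non-empty handshake frame arrives.
  void OnBytesReceived(const std::string& bytes) {
    if (bytes.empty()) return;
    if (stream_ == nullptr) stream_ = std::make_unique<HandshakerStream>();
    ++stream_->frames_forwarded;
  }

  bool HasStream() const { return stream_ != nullptr; }
  std::size_t FramesForwarded() const {
    return stream_ == nullptr ? 0 : stream_->frames_forwarded;
  }

 private:
  std::unique_ptr<HandshakerStream> stream_;
};
```

With this shape, a client that connects and never sends a frame never causes a stream to be created.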
Closes #37961
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37961 from matthewstevenson88:alts-fix f8f07e59bb
PiperOrigin-RevId: 687977457
Don't complete writes of messages until they make it to the transport's outbound loop. Since payloads could be large, this introduces just enough pushback that, once #37868 also goes in, we should be able to sense when a transport is busy writing and stop sending at higher layers.
Closes #37894
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37894 from ctiller:send-acked 0cb3d7f8ad
PiperOrigin-RevId: 686689473
This is missing in v3 compared to v2: in v2, Pipe was set up so that multiple Pipe stages could be chained and would only complete when the last stage had passed flow control, whereas in v3 the top stage starts accepting requests as soon as the first stage in the pipeline takes the message.
Closes #37868
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37868 from ctiller:drizzling 69209da8a7
PiperOrigin-RevId: 686652402
To prepare for the upcoming upgrade to C++17, the following changes were made:
Increased minimum supported operating system versions:
- iOS: 11 (previously 10)
- macOS: 10.14 (previously 10.12)
- tvOS: 13.0 (previously 12.0)
In addition to this, version requirements across different projects were updated to use these for consistency.
Closes #37931
PiperOrigin-RevId: 686519641
This migrates all of the xDS unit tests except for the fuzzer, which I'll get in a subsequent PR.
This also does not include the xDS e2e tests, which I will also do separately.
Closes #37896
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37896 from markdroth:xds_tests_use_real_protos2 de568b4e53
PiperOrigin-RevId: 686197812
This eliminates the need for the `grpc_cc_proto_library` bazel BUILD rule introduced in #37863.
To make this work, I had to upgrade several bazel dependencies and apply a patch to rules_go to work around https://github.com/bazelbuild/bazel/issues/11636.
Closes #37902
PiperOrigin-RevId: 685868647
Instead of getting the value of `csm_mesh_id` from the bootstrap file, get it from the env var `CSM_MESH_ID`.
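Reading the value from the environment can look roughly like the sketch below. It uses only `std::getenv`; the helper names `GetEnvOr` and `GetMeshId` and the `"unknown"` fallback are assumptions made for illustration, not what the gRPC code does.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of environment variable `name`, or `fallback` if unset.
std::string GetEnvOr(const char* name, const std::string& fallback) {
  const char* value = std::getenv(name);
  return value != nullptr ? std::string(value) : fallback;
}

// Hypothetical helper: reads the mesh ID from CSM_MESH_ID. The "unknown"
// fallback is an assumption for this sketch.
std::string GetMeshId() { return GetEnvOr("CSM_MESH_ID", "unknown"); }
```

For example, running a binary with `CSM_MESH_ID=my-mesh` set in its environment would make `GetMeshId()` return `my-mesh` in this sketch.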
Closes #37801
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37801 from yashykt:CsmMeshIdChange d0f149e023
PiperOrigin-RevId: 685864223
There's a timeout flake when running Windows tests for the Static CRL Provider. Let's see if increasing the timeout helps.
Closes #37915
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37915 from gtcooke94:increase_test_deadline 0386aab786
PiperOrigin-RevId: 685752852
These look too large for the configured timeouts internally. Will revisit later once the system is closer to being ready.
Closes #37905
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37905 from ctiller:lol-nope 6d44e515b3
PiperOrigin-RevId: 685223450
We'll probably disable some next week :)
But I want to watch a good selection and refine criteria for acceptance.
Closes #37903
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37903 from ctiller:ALL-the-things 5f829db870
PiperOrigin-RevId: 685010911
A primitive that has so far been missing for HTTP/2-style flow control is a way to query whether there's a receiver for flow control data at the other end of the message pipes.
Here I'm updating the state machine accessors to accommodate that functionality.
No new states were needed.
Whilst here, document the current member functions on `CallState`.
Closes #37867
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37867 from ctiller:like-the-river c9814c737d
PiperOrigin-RevId: 684972125
This is a trial balloon to see if we can actually make this work. If it does, I'll change the remaining xDS tests to use the real xDS protos and completely remove our local copies.
Closes #37863
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37863 from markdroth:xds_tests_use_real_protos 3ad2fe12be
PiperOrigin-RevId: 684877750
I'm continuing to look into some flakes here, but in the meantime these shouldn't halt submissions. Marking them flaky.
Closes #37880
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37880 from ctiller:mark-flaky 27427c7978
PiperOrigin-RevId: 684526341
I'm unable to reproduce some of the flakiness here. Enabling tracers to get more information.
Closes #37875
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37875 from yashykt:MoreLogsInXdsE2ETest 4a7eb67202
PiperOrigin-RevId: 684201791
With the CL-first approach, the docker test configs for Binder need to be deleted before the Binder code and tests themselves can be deleted in the next step. Sanity checks fail otherwise.
Closes #37862
PiperOrigin-RevId: 683691175
On rare occasions on Windows machines (0.3-0.4%), the tests get stuck while executing the loop of 10 DoRpc calls, and we receive Deadline Exceeded. This PR bumps the deadline from 10s to 60s (no flakes with --runs_per_test=10000).
Closes #37844
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37844 from erm-g:seqFix 8644db8194
PiperOrigin-RevId: 681891281