Speculative attempt to fix a failing test. Hypothesis: UB on destroyed buffer when the read callbacks were executed.
Closes #38085
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38085 from drfloob:iocp-test-cleanup ae89836762
PiperOrigin-RevId: 694187408
This removes all xDS protos except for 5 of them that have services. We still have some limitations in our internal build system that make it hard to use the real xDS protos for those files, but we're now using the real xDS protos for the rest.
(Note: discovery.proto is actually a special case. While it does have services, we don't actually use those services, so that's not the reason we need a copy of this file. Unfortunately, the xDS BUILD files group discovery.proto into the same build target as ads.proto, which has services that we actually use, thus requiring us to have our own copy. This means that depending on the real discovery.proto causes us to also depend on the real ads.proto, which causes a conflict in the protobuf registry by linking two copies of ads.proto. However, we *are* using the real discovery.proto in unit tests, which do not depend on ads.proto.)
PiperOrigin-RevId: 693907782
The target `:default_event_engine_factory` currently uses gRPC-specific config_settings which bundle the CPU with the OS (e.g. `cpu: windows_x86_64`). Use Bazel's OS constraint as one of the select cases so the correct target is used when setting `os: windows` as a constraint.
PiperOrigin-RevId: 693890511
We've got a customer that's seeing some failures right now, and we're stuck debugging because we don't have sufficient log visibility.
Closes #38065
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38065 from ctiller:loggy 8df1d8a4bb
PiperOrigin-RevId: 693879687
ResolvedAddrToUnixPathIfPossible is only called when GRPC_HAVE_UNIX_SOCKET is defined, so there's no need to define that function when it isn't.
Closes #38016
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38016 from hferreiro:master 3d00260104
PiperOrigin-RevId: 693833757
This change upgrades the sanity test to use Clang 19, including clang-format and clang-tidy. (It's a partial implementation of the changes proposed in #38038)
Key updates:
- Docker images now utilize Clang 19.
- Code has been reformatted using the updated clang-format.
- Resolved `readability-math-missing-parentheses` warnings raised by clang-tidy.
Note that the other part of the Clang 19 upgrade, using Clang 19 for the C++ tests, will be done once opentelemetry-cpp fixes the Clang 19 build error.
Closes #38070
PiperOrigin-RevId: 693833548
`//src/python/grpcio_tests/tests/unit:_contextvars_propagation_test` is very flaky, mainly in two ways:
1. Failing with error `Error in bind for address '/tmp/grpc_fullstack_test.sock': Address already in use`.
2. Failing with timeout without any error.
#### Address already in use error
This is because we're reusing the same path for all test cases: 5011420f16/src/python/grpcio_tests/tests/unit/_contextvars_propagation_test.py (L31)
#### Timeout error
We're deleting the tmp file after the test is done:
5011420f16/src/python/grpcio_tests/tests/unit/_contextvars_propagation_test.py (L64-L66)
This can cause Core to fail to connect to the channel with the error `connect failed: addr: unix:/tmp/grpc_fullstack_test.sock error: No such file or directory`. Core keeps retrying, which causes the test to time out.
To make things worse, we're using multiple threads in one of the test cases, leading to an even higher rate of flakiness.
This PR fixes the issue by using a different address for each test run.
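A minimal sketch of the per-run address idea, assuming a hypothetical helper named `unique_uds_address` (it is not taken from the PR, which may generate the address differently). Each call creates a fresh temporary directory, so two test runs never bind the same socket path:

```python
import os
import tempfile


def unique_uds_address(prefix: str = "grpc_fullstack_test") -> str:
    """Return a unix: address whose socket path is unique to this call.

    Hypothetical helper for illustration only.
    """
    # mkdtemp creates a new, uniquely named directory on every call.
    sock_dir = tempfile.mkdtemp(prefix=prefix + "_")
    return "unix:" + os.path.join(sock_dir, "test.sock")


first = unique_uds_address()
second = unique_uds_address()
```

Because each address lives in its own directory, cleaning up after one test case cannot remove a socket file that another test case is still connecting to.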
Closes #38076
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38076 from XuanWang-Amos:fix_contextvar_test 93ab2b350f
PiperOrigin-RevId: 693812629
This reverts commit 574b19ec31.
Closes #38074
PiperOrigin-RevId: 693803071
This log can be hit under normal circumstances (e.g. a client has an expired cert and authenticates to the server), so this should be an INFO-level log rather than an ERROR-level log.
Closes #38058
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38058 from matthewstevenson88:downgrade23 1cbdd5a3e7
PiperOrigin-RevId: 693375018
Closes #38056
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38056 from yijiem:ee-dns-non-client-channel-chaotic 74c3b20731
PiperOrigin-RevId: 693112346
This might not be needed? (From https://github.com/grpc/grpc/issues/37976)
- Boringssl commit requiring -msse2: 56d3ad9d23 (diff-f628d148f94bbab9e22a1ad426ccd94311f1fb87bbe5cc533cb85aee18b07a20R248)
- gRPC change to add -msse2 https://github.com/grpc/grpc/pull/36089
---
The answer to the question above is yes: the -msse2 option remains necessary for gRPC on i686 due to BoringSSL's requirements. However, the existing CMake condition for this option was too broad, potentially including ARM architectures where SSE2 isn't supported, leading to compilation errors. I've refined the condition to specifically target 32-bit x86 architectures.
Furthermore, to ensure accurate architecture detection within our dockerized tests, I've configured x86 tests to utilize the linux32 command. This ensures that uname -a correctly reports i686, allowing gRPC's CMake to identify the architecture and apply the -msse2 option as needed.
It's important to note that RBE overrides the default entrypoint, so RBE-based tests must explicitly invoke linux32 even if the Docker image already has it set.
Fixes https://github.com/grpc/grpc/issues/37976
Closes #38024
PiperOrigin-RevId: 693026079
We're about to completely change the wire format here... land one additional copy of the transport and tests as a hedge against bugs. Enable the hedge with an experiment.
Closes #38026
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38026 from ctiller:legacy-admission 5a32bb105d
PiperOrigin-RevId: 692984545
Improve metadata redaction comment to help people who are seeing the redaction statement in their logs.
Closes #38033
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38033 from tanvi-jagtap:improve_redaction_comment 18ba7e18c9
PiperOrigin-RevId: 692906001
[PH2][NewFile][ClassStructure][Important] Add client and server class
1. New classes Http2ServerTransport and Http2ClientTransport
2. Similar to the classes in [Chaotic Good Client Transport](https://github.com/grpc/grpc/blob/master/src/core/ext/transport/chaotic_good/client_transport.h) and [Chaotic Good Server Transport](https://github.com/grpc/grpc/blob/master/src/core/ext/transport/chaotic_good/server_transport.h)
3. Added new Test files. For now, the 2 new tests just call the constructor of Http2ServerTransport and Http2ClientTransport.
Tested locally using
```
CC=cc bazel test --test_output=all -c dbg --config=asan --verbose_failures //test/core/transport/chttp2:http2_client_transport_test
```
```
CC=cc bazel test --test_output=all -c dbg --config=asan --verbose_failures //test/core/transport/chttp2:http2_server_transport_test
```
Closes #37840
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37840 from tanvi-jagtap:ph2_add_client_server_class c6c3a0d5fb
PiperOrigin-RevId: 692824127
The Windows RBE test has been using Bazel 7.3.1 for running tests.
However, the RBE configuration itself was built with an older Bazel
version (6.3.2). While this hasn't caused any issues so far, it's best
to use the same Bazel version (7.3.1) for both building the RBE
configuration and running tests to ensure consistency and avoid
potential problems in the future.
Related to https://github.com/grpc/grpc/pull/37987
This test is flaky, example failure: https://btx.cloud.google.com/invocations/ea6d91f6-655a-41a4-bcbd-96c1a75118e1/targets, error message:
```
tests_aio.unit.call_test.TestStreamStreamCall.test_cancel_after_done_writing
traceback:
Traceback (most recent call last):
  File "/usr/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.9/unittest/case.py", line 593, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
    method()
  File "/var/local/git/grpc/src/python/grpcio_tests/tests_aio/unit/_test_base.py", line 31, in wrapper
    return loop.run_until_complete(f(*args, **kwargs))
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/var/local/git/grpc/src/python/grpcio_tests/tests_aio/unit/call_test.py", line 821, in test_cancel_after_done_writing
    self.assertTrue(call.cancel())
  File "/usr/lib/python3.9/unittest/case.py", line 682, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true
```
The test tries to cancel the RPC after calling `done_writing()`, but the RPC may finish before we check the status, in which case `call.cancel()` returns false.
We're changing this test to add a delay on the server side so that we can reliably cancel the RPC before it ends.
Tested locally and flake rate decreased from about 5/10000 to 0/10000.
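The race can be modeled with plain asyncio tasks, whose `cancel()` also returns `False` once the task has finished. This is only a sketch: `fake_rpc` and `cancel_after_done_writing` are hypothetical stand-ins, not the grpc.aio test code, and the delays are arbitrary.

```python
import asyncio


async def fake_rpc(server_delay: float) -> str:
    """Stand-in for an RPC whose server takes `server_delay` seconds."""
    await asyncio.sleep(server_delay)
    return "OK"


async def cancel_after_done_writing(server_delay: float) -> bool:
    call = asyncio.ensure_future(fake_rpc(server_delay))
    # Let the event loop run briefly, like the gap between
    # done_writing() and cancel() in the real test.
    await asyncio.sleep(0.05)
    cancelled = call.cancel()  # False if the call has already finished.
    # Wait for the task to settle without raising CancelledError here.
    await asyncio.gather(call, return_exceptions=True)
    return cancelled


fast_server = asyncio.run(cancel_after_done_writing(0.0))
slow_server = asyncio.run(cancel_after_done_writing(1.0))
```

With no server delay the call completes before `cancel()` runs, which is the flaky case; with a delay the call is still in flight and the cancellation succeeds.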
Closes #38051
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38051 from XuanWang-Amos:fix_test_cancel_after_done_writing d72170c3de
PiperOrigin-RevId: 692266861
Protobuf 6.30.0 will change the return types of Descriptor::name() and other methods to absl::string_view. This makes the code work both before and after such a change.
PiperOrigin-RevId: 692216341
gRPC requires CMake 3.16 or later, so the test script no longer needs to install CMake 3.16.
Closes #38048
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38048 from veblush:cmake-no 3e364d9770
PiperOrigin-RevId: 691976945
Certificate verification can fail for more than 50 different reasons. Simplify troubleshooting by including the reason in the error message.
Closes #37207
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37207 from csapuntz:master a8c0855f36
PiperOrigin-RevId: 691962134
The ring buffer was inconsistently removing items, making it hard to reason about the state in general.
Additionally, adjust the API so that we guarantee at most one concurrent flush and fail any other concurrent attempt to collect. This ultimately makes it a lot easier to limit the impact of running this system.
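One way to picture the "single flush, others fail" guarantee is the sketch below. It is illustrative only and not gRPC's implementation: the `RingBuffer` class, its `try_flush` method, and the choice to drop the oldest item when full are all assumptions made for the example.

```python
import threading
from collections import deque
from typing import List, Optional


class RingBuffer:
    """Fixed-capacity buffer; illustrative only, not gRPC's implementation."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest item when it is full.
        self._items: deque = deque(maxlen=capacity)
        self._items_lock = threading.Lock()
        self._flush_lock = threading.Lock()

    def append(self, item: int) -> None:
        with self._items_lock:
            self._items.append(item)

    def try_flush(self) -> Optional[List[int]]:
        """Drain all items, or return None if another flush is running."""
        if not self._flush_lock.acquire(blocking=False):
            return None  # A flush is already in progress: fail fast.
        try:
            with self._items_lock:
                drained = list(self._items)
                self._items.clear()
            return drained
        finally:
            self._flush_lock.release()


buf = RingBuffer(capacity=3)
for i in range(5):
    buf.append(i)
flushed = buf.try_flush()
```

Items leave the buffer in only two well-defined ways here (overwritten when full, or drained by a flush), and a second collector never blocks behind the first.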
PiperOrigin-RevId: 691657279
tl;dr: If data is received on the underlying endpoint, it must never be discarded.
Closes #38036
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38036 from drfloob:endpoint-cb-xor ab763cac30
PiperOrigin-RevId: 691508829
Fix for b/365993761.
Noticed that XdsClient metrics were not being reported due to authority not being properly set.
This solution is not perfect, since channels created later can use a different authority; for now we prefer to use the default authority from the first channel.
Closes #38009
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38009 from yashykt:AddAuthorityToXdsClientMetricsScope 00071efa23
PiperOrigin-RevId: 691149703
There was an edge case in which a socket or endpoint was shut down, a socket `read` call returned zero bytes, and there was unread data in the read buffer from a previous read operation. The endpoint callbacks were called with an error status to indicate the end of the stream, and the callbacks did not consume that final chunk of data.
My current hunch is that something inside gRPC is violating the EventEngine Endpoint::Read contract, but I'm not certain what, yet. 88b5c9e3ab/include/grpc/event_engine/event_engine.h (L197-L199)
However, by modifying WindowsEndpoint to return an `absl::OkStatus()` if there's any data in the buffer, tests appear to pass.
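The decision described above can be modeled roughly as follows. This is a Python sketch of the idea, not the C++ WindowsEndpoint code: the `Status` type and the `read_completion_status` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Tiny stand-in for absl::Status (hypothetical, for illustration)."""
    ok: bool
    message: str = ""


def read_completion_status(pending_bytes: int, end_of_stream: bool) -> Status:
    """Pick the status passed to the read callback.

    Pending data always wins: it is reported as OK so the callback
    consumes it, even if the socket has already hit end of stream.
    """
    if pending_bytes > 0:
        return Status(ok=True)
    if end_of_stream:
        return Status(ok=False, message="end of stream")
    return Status(ok=True)


leftover_data = read_completion_status(pending_bytes=4, end_of_stream=True)
empty_at_eof = read_completion_status(pending_bytes=0, end_of_stream=True)
```

In this model the end-of-stream error is only surfaced on a read that has nothing left to deliver, so the final chunk is never dropped.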
Closes #38014
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38014 from drfloob:win-endpoint-data-leak b24b2d9f8a
PiperOrigin-RevId: 691063044