Builds upon #37765 to support arbitrary connection counts in the transport.
(note: at this point the number of connections is determined at connection establishment - future work will be autotuning this)
Closes #38032
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38032 from ctiller:tiefling-buffer c7520fd7a9
PiperOrigin-RevId: 698952890
These corpora entries helped isolate a number of bugs in cancel_after_invoke.
By themselves right now I don't expect them to do much, but I want to seed our upstream fuzzers with this data so that we can find new examples in the future.
Closes #38132
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38132 from ctiller:flake-fightas-23 9f6ae727f8
PiperOrigin-RevId: 697064247
Update the chaotic-good wire format with some learnings from the past year, and set up things for the next round of changes we'd like to make:
* Instead of a composite FRAGMENT frame, split out CLIENT_INITIAL_METADATA, CLIENT_END_OF_STREAM, MESSAGE, SERVER_INITIAL_METADATA, SERVER_TRAILING_METADATA as separate frame types - this eliminates a ton of complexity in the transport, and corresponds to how we used the wire format in practice anyway.
* Switch the frame payload for metadata, settings to be protobuf instead of HPACK - this eliminates the ordering requirements on interpreting these frames between streams, which I expect to open up some flexibility with head of line avoidance in the future. It's a heck of a lot easier to read and reason about the code. It's also easier to predict the size of the frame at encode time, which lets us treat metadata and payloads more uniformly in the protocol.
* Add a connection id field to our header, in preparation for allowing multiple data connections
* Allow payloads to be shipped on the control channel ('connection id 0') and use this for sending small messages
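As a rough illustration of the points above, the sketch below models the split frame types and a header that carries a connection id. All names, field widths, enum values, the `inline_threshold` parameter, and the round-robin choice of data connection are assumptions made for illustration; they are not the actual chaotic-good wire format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical frame types mirroring the list above. The numeric values are
// illustrative only and are NOT the values used on the wire.
enum class FrameType : uint8_t {
  kSettings,
  kClientInitialMetadata,
  kClientEndOfStream,
  kMessage,
  kServerInitialMetadata,
  kServerTrailingMetadata,
};

// Hypothetical frame header; field names and widths are assumptions.
struct FrameHeader {
  FrameType type;
  uint16_t connection_id;  // 0 means the control channel
  uint32_t stream_id;
  uint32_t payload_length;
};

constexpr uint16_t kControlConnectionId = 0;

// Picks the connection that should carry a payload: small payloads ride on
// the control channel, larger ones rotate over data connections 1..n.
// The threshold and the round-robin policy are assumptions of this sketch.
inline uint16_t ChooseConnection(uint32_t payload_length,
                                 uint32_t inline_threshold,
                                 uint16_t num_data_connections,
                                 uint32_t& next_data_connection) {
  if (num_data_connections == 0 || payload_length <= inline_threshold) {
    return kControlConnectionId;
  }
  const uint16_t id = static_cast<uint16_t>(
      1 + next_data_connection % num_data_connections);
  ++next_data_connection;
  assert(id != kControlConnectionId);  // large payloads never use id 0 here
  return id;
}
```

With two data connections and a 64-byte threshold, a 10-byte message would be sent with connection id 0, while successive 1000-byte messages would alternate between ids 1 and 2.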
Closes #37765
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37765 from ctiller:tiefling 7b57f72367
PiperOrigin-RevId: 695766541
There was an edge case in which a socket or endpoint was shut down, a socket `read` call returned zero bytes, and there was unread data in the read buffer from a previous read operation. The endpoint callbacks were called with an error status to indicate the end of the stream, and the callbacks did not consume that final chunk of data.
My current hunch is that something inside gRPC is violating the EventEngine Endpoint::Read contract, but I'm not certain what, yet. 88b5c9e3ab/include/grpc/event_engine/event_engine.h (L197-L199)
However, by modifying WindowsEndpoint to return an `absl::OkStatus()` if there's any data in the buffer, tests appear to pass.
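The decision described above can be sketched as a small helper. This is a simplified model, not the `WindowsEndpoint` code: `ReadStatus` stands in for `absl::Status` so the snippet compiles without Abseil, and the function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for absl::Status so the sketch has no external dependencies.
enum class ReadStatus { kOk, kEndOfStream };

// Hypothetical helper: chooses the status handed to the read callback when a
// socket read completes.
//   bytes_from_socket: bytes returned by the latest read (0 = peer closed)
//   buffered: data left over from a previous read, not yet delivered
inline ReadStatus StatusForCompletedRead(std::size_t bytes_from_socket,
                                         const std::string& buffered) {
  // Report OK whenever there is anything to deliver, so the callback
  // consumes the final chunk instead of dropping it on an error status.
  if (bytes_from_socket > 0 || !buffered.empty()) {
    return ReadStatus::kOk;
  }
  return ReadStatus::kEndOfStream;
}
```

In this model the end-of-stream status is only surfaced on a later read, once the buffer has been drained.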
Closes #38014
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38014 from drfloob:win-endpoint-data-leak b24b2d9f8a
PiperOrigin-RevId: 691063044
`//test/core/end2end/...` tests stopped running back in March 2024, when the linking process for test binaries changed. See https://github.com/grpc/grpc/pull/36197.
The test targets ran on Windows, but zero tests were found inside those targets, so the tests succeeded instantly. This fix results in longer linking steps, and more disk space consumed, but tests are getting discovered now. To illustrate the problem, run `bazel test //test/core/end2end:ping_pong_streaming_test --test_output=all --test_arg=--gtest_list_tests`. Before this PR, zero tests were found. Now:
```
CoreEnd2endTest.
PingPongStreaming1/Inproc
PingPongStreaming1/Chttp2FakeSecurityFullstack
PingPongStreaming1/Chttp2Fullstack
PingPongStreaming1/Chttp2FullstackCompression
PingPongStreaming1/Chttp2FullstackLocalIpv4
...
```
Closes #37918
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37918 from drfloob:fix-win-core-e2e e60f832a5d
PiperOrigin-RevId: 690741492
This will disable the jobs in both bazel and cmake builds, which is necessary for our CI to remain happy. These end2end tests were enabled on Windows at some point without any notice (they had been intentionally disabled for a while), but the RBE jobs have been silently failing for 7 months, so it's unclear when that happened.
These failures still need to be examined so we can re-enable these tests.
Closes #37983
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37983 from drfloob:winguh 64a62fd5b9
PiperOrigin-RevId: 689118231
Closes #37973
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37973 from yijiem:core-end2end-windows-hack a2da4ae4eb
PiperOrigin-RevId: 688727248
I'm continuing to look into some flakes here, but in the meantime these shouldn't halt submissions. Marking them flaky.
Closes #37880
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37880 from ctiller:mark-flaky 27427c7978
PiperOrigin-RevId: 684526341
The following files have been moved:
- src/core/lib/avl/*
- src/core/lib/backoff/*
- src/core/lib/debug/event_log*
- src/core/lib/iomgr/gethostname*
- src/core/lib/iomgr/grpc_if_nametoindex*
- src/core/lib/matchers/*
- src/core/lib/uri/* (renamed from uri_parser.* to uri.*)
- src/core/lib/gprpp/* (existing src/core/util/time.cc was renamed to gpr_time.cc to avoid conflict)
Closes #36792
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/36792 from markdroth:reorg_util d4e8996f48
PiperOrigin-RevId: 676947640
On rare occasions on Windows machines (0.2-0.5% of runs), the tests get stuck before the handshake when we execute `grpc_call_start_batch`. In such cases we receive OP_COMPLETE with `Deadline Exceeded {grpc_status:4}`. This PR bumps the deadline from 5s to 30s (no flakes for --runs_per_test=1000).
Closes #37767
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37767 from erm-g:h2test 9d1208ee1f
PiperOrigin-RevId: 676531975
The gRPC Core API currently requires callers to provide initial metadata before trailing metadata. You can see the C++ Callback API do this bookkeeping, for example. There is an eventual goal to be able to provide these in any order, and have gRPC do the right thing, but core is not there yet.
The proxy fixture in our end2end tests had a rare scenario in which trailing metadata from the server would show up at the proxy before initial metadata. This is part of the proxy's job: to split up batches into single operations that can complete in any order. There was, however, a rare flake wherein trailing metadata would complete before initial metadata, and the result was both client and server waiting on each other to respond.
This change adds a way for the proxy to defer sending trailing metadata back to the client, until after initial metadata has been sent to the client. In my testing, this eliminates the flake I had been able to reproduce 1 in 10k times using a single test. It happened more frequently across the full set of tests in our CI test suites.
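The deferral can be pictured with a small state holder. This is a minimal sketch, not the proxy fixture's code: the class, its method names, and the use of a string to represent metadata are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical per-call state for the proxy. Metadata is modelled as a
// plain string to keep the sketch self-contained.
class ProxyCallState {
 public:
  explicit ProxyCallState(
      std::function<void(const std::string&)> send_trailing_to_client)
      : send_trailing_to_client_(std::move(send_trailing_to_client)) {}

  // Called once initial metadata has been sent to the client.
  void OnInitialMetadataSent() {
    initial_metadata_sent_ = true;
    if (has_deferred_trailing_) {
      has_deferred_trailing_ = false;
      send_trailing_to_client_(deferred_trailing_);
    }
  }

  // Called when trailing metadata arrives from the server.
  void OnTrailingMetadataFromServer(std::string trailing) {
    if (!initial_metadata_sent_) {
      // Too early: hold on to it until initial metadata has gone out.
      assert(!has_deferred_trailing_);  // trailing metadata arrives once
      deferred_trailing_ = std::move(trailing);
      has_deferred_trailing_ = true;
      return;
    }
    send_trailing_to_client_(trailing);
  }

 private:
  std::function<void(const std::string&)> send_trailing_to_client_;
  bool initial_metadata_sent_ = false;
  bool has_deferred_trailing_ = false;
  std::string deferred_trailing_;
};
```

Whichever event arrives first, the client-facing order in this model is always initial metadata followed by trailing metadata.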
Closes #37738
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37738 from drfloob:fix-proxy-fixture 6e0d7b7e6f
PiperOrigin-RevId: 676026493
Without this, we see GOAWAYs with "enter idle" irrespective of the reason being idleness or max connection age.
Closes #37709
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37709 from yashykt:ChannelIdleFilterMessage 236072e7e2
PiperOrigin-RevId: 675762380
Looks like there are some odd interactions, but call-v3 doesn't (and will never) handle wakeup sets, so disable for now until iomgr is removed.
Closes #37630
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37630 from ctiller:cgg 7c37893667
PiperOrigin-RevId: 671104484
Fixes a bug in the backoff implementation whereby we were incorrectly failing to apply jitter to the initial backoff.
Also change the API to return `Duration` instead of `Timestamp`. The only caller that actually wants to count the backoff from the start of the previous attempt instead of the end of the previous attempt is the subchannel code, and it handles that on its end.
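To make the two points concrete, here is a minimal sketch of an exponential backoff that jitters every delay, including the first, and returns a duration rather than an absolute time. It is not gRPC's `BackOff` class: the class name, the `Options` fields, and the use of `std::chrono` and `std::mt19937` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical backoff helper, for illustration only.
class BackoffSketch {
 public:
  using Duration = std::chrono::milliseconds;

  struct Options {
    Duration initial_backoff;
    double multiplier;
    double jitter;  // fraction in [0, 1); 0.2 means +/-20%
    Duration max_backoff;
  };

  BackoffSketch(Options options, uint32_t seed)
      : options_(options), rng_(seed), current_(options.initial_backoff) {
    assert(options.jitter >= 0.0 && options.jitter < 1.0);
    assert(options.multiplier >= 1.0);
  }

  // Returns how long to wait before the next attempt. The caller decides
  // which point in time to count from (start or end of the last attempt).
  Duration NextAttemptDelay() {
    if (first_) {
      first_ = false;
    } else {
      const auto grown = static_cast<Duration::rep>(current_.count() *
                                                    options_.multiplier);
      current_ = std::min(Duration(grown), options_.max_backoff);
    }
    // Jitter is applied on every call, so the initial delay is jittered too.
    // Note that in this sketch the jittered value may exceed max_backoff.
    std::uniform_real_distribution<double> factor(1.0 - options_.jitter,
                                                  1.0 + options_.jitter);
    return Duration(
        static_cast<Duration::rep>(current_.count() * factor(rng_)));
  }

 private:
  Options options_;
  std::mt19937 rng_;
  Duration current_;
  bool first_ = true;
};
```

A caller that wants to count from the start of the previous attempt, as the subchannel code does, could add the returned duration to its own recorded start time.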
Closes #37595
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37595 from markdroth:backoff_fixes_and_api_improvement 39d083c0f4
PiperOrigin-RevId: 669112557
[Gpr_To_Absl_Logging] Remove gpr logging header include from other headers
Some of the .cc files use gpr_log_verbosity_init().
So I removed the #include <grpc/support/log.h> from all headers and added it only to the .cc files that use gpr_log_verbosity_init().
Closes #37513
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37513 from tanvi-jagtap:remove_gpr_headers_from_headers_01 612ca6d0f7
PiperOrigin-RevId: 663811895
[Gpr_To_Absl_Logging] Remove logging header from example and test/core/t folder
Closes #37491
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37491 from tanvi-jagtap:remove_header_test_core_02 e524a2da80
PiperOrigin-RevId: 663595237
Includes a few changes to pollset stuff to make it easier to not use pollsets (which I think is going to be generally helpful in the coming months)
Closes #37397
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37397 from ctiller:client-chicken 1099cd500a
PiperOrigin-RevId: 660014128
1. Fix the unit test that flags log noise.
2. This test was broken for many months, and a lot of log noise was added as a result. This PR removes that noise.
3. If you want to retain any log line as `INFO` instead of `VLOG(2)`, please let me know and I will add it to the allow list.
4. This PR replaces the old `gpr_set_log_function` mechanism with an absl `LogSink`. The `Send` function now does everything that `NoLog` used to do.
Closes #37177
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37177 from tanvi-jagtap:fix_nologging_tests ad58e2fb79
PiperOrigin-RevId: 655209718
1. Function gpr_default_log has been deprecated. This function will be deleted in a few weeks.
1. This entire unit test is being rewritten as part of another PR, but that PR will take a while to merge. In the meantime I want to delete all instances of this function to prevent further backsliding.
https://github.com/grpc/grpc/pull/37177
PiperOrigin-RevId: 654998772
1. Function gpr_set_log_verbosity has been deprecated. This function will be deleted in a few weeks.
1. gRPC now internally uses absl logging. Earlier gRPC was using its own custom logging mechanism called gpr which had a whole set of functions beginning with gpr_.
1. This entire unit test is being rewritten as part of another PR, but that PR will take a while to merge. In the meantime I want to delete all instances of this function to prevent further backsliding.
https://github.com/grpc/grpc/pull/37177
PiperOrigin-RevId: 654639955
Instead of passing the transport byte counts back up through the filter
stack to be reported to the `CallTracer`, we now have the transport
pass the transport byte counts directly to the `CallTracer` itself.
This will eventually allow us to avoid unnecessarily storing these byte
counts in cases where no `CallTracer` actually cares about the data, which
will reduce per-call memory. (In the short term, it actually increases
memory usage, but we can separately do some work to avoid the memory
usage in the transport by removing the `grpc_transport_stream_stats`
struct from the legacy filter API.)
This is a prereq for supporting `CallTracer` in the new call v3 stack,
which does not include the transport byte counts as part of the
receive-trailing-metadata hook, unlike the legacy filter stack.
This change is controlled by the `call_tracer_in_transport` experiment,
which is enabled by default.
As part of this experiment, we also fix a couple of related bugs:
- On the client side, the chttp2 transport was incorrectly adding
annotations to the parent `ClientCallTracer` instead of the
`CallAttemptTracer`.
- The OpenCensus `ServerCallTracer` was incorrectly swapping the values
of sent and received bytes.
PiperOrigin-RevId: 650728181
### Problem 1
Context:
gpr_log() internally calls gpr_log_message().
gpr_log_message() may call either gpr_default_log() or a user-provided logging function.
gpr_default_log() uses absl LOG. This is the default, and most users will log this way.
For the small percentage of users who have customized the logging function, gpr_log() will log to this custom function.
Problem:
We have converted half the instances of gpr_log() to absl LOG().
For users who use the defaults, there will be no issue.
For users who use a customized logging function:
1. All the absl LOGs will log to the absl log sink.
2. All the gpr_log statements will log via the user-provided function.
This is inconsistent behaviour and will cause confusion and difficulty in debugging.
Solution:
All logs should go to the same sink.
So we decided to make gpr_set_log_function a no-op in this release.
The function will be deleted in the next release.
https://github.com/grpc/proposal/pull/425
### Problem 2
Context:
gpr_should_log is used to avoid computing expensive values for logging if the log is not going to be visible.
Problem:
gpr_should_log was referencing the GRPC_VERBOSITY flag and values set by gpr_set_log_verbosity.
However, actual logging happens based on the absl settings.
This is incorrect: if the old settings are not honoured, they should not be checked, and no decision in code should be based on settings that are not going to be used.
Solution:
Given the changes in Problem 1, all custom logging is disabled, so all logging from gRPC will honour the absl LOG settings. Hence we modified the gpr_should_log function to refer to the absl settings.
### Problem 3
We still have the issue of php using a custom log sink. We will address this in a separate PR.
### Problem 4
Tests related to test/core/end2end/tests/no_logging.cc are broken. These will be fixed in another PR.
Closes #36961
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/36961 from tanvi-jagtap:fix_gpr_should_log 70c3224af1
PiperOrigin-RevId: 645096418
In the client fuzzer, some valid fuzzing scenarios would close the transport (thus deleting the endpoint), while the fuzzer mechanics still attempted to read/write to that endpoint. There was an inherent ownership problem, where both the transport and the fuzzer logic expected to own the endpoint lifetime.
This PR ensures that the transport owns the endpoint, and the fuzzer logic owns an object that can write to some shared endpoint state. This shared object can outlive the endpoint.
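The ownership split can be sketched with a shared state object that outlives the endpoint. This is a simplified model, not the fuzzer's code: `SharedEndpointState`, `FakeEndpoint`, and `FuzzerWriter` are hypothetical names, and the "transport" is represented only by whoever holds the `std::unique_ptr<FakeEndpoint>`.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical state shared between the endpoint and the fuzzer logic.
struct SharedEndpointState {
  std::mutex mu;
  bool endpoint_alive = true;
  std::string pending_reads;  // bytes queued for the endpoint to read
};

// Owned exclusively by the transport (via std::unique_ptr in this sketch).
class FakeEndpoint {
 public:
  explicit FakeEndpoint(std::shared_ptr<SharedEndpointState> state)
      : state_(std::move(state)) {}

  ~FakeEndpoint() {
    std::lock_guard<std::mutex> lock(state_->mu);
    state_->endpoint_alive = false;  // tell the writer we are gone
  }

  // Drains whatever the fuzzer has queued so far.
  std::string TakePendingReads() {
    std::lock_guard<std::mutex> lock(state_->mu);
    std::string out;
    out.swap(state_->pending_reads);
    return out;
  }

 private:
  std::shared_ptr<SharedEndpointState> state_;
};

// Owned by the fuzzer logic; never touches the endpoint object directly.
class FuzzerWriter {
 public:
  explicit FuzzerWriter(std::shared_ptr<SharedEndpointState> state)
      : state_(std::move(state)) {}

  // Returns false (and drops the bytes) once the endpoint is destroyed.
  bool Write(const std::string& bytes) {
    std::lock_guard<std::mutex> lock(state_->mu);
    if (!state_->endpoint_alive) return false;
    state_->pending_reads += bytes;
    return true;
  }

 private:
  std::shared_ptr<SharedEndpointState> state_;
};
```

Because the writer only holds the shared state, a write attempted after the transport has destroyed the endpoint becomes a harmless no-op instead of a use-after-free.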
Closes #36966
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/36966 from drfloob:fuzzer/4908841560506368 a9ea2e795d
PiperOrigin-RevId: 645081665