Builds upon #37765 to support arbitrary connection counts in the transport.
(Note: for now the number of connections is fixed at connection establishment; future work will autotune it.)
Closes #38032
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38032 from ctiller:tiefling-buffer c7520fd7a9
PiperOrigin-RevId: 698952890
The bug occurs in the following fairly specific sequence of events:
1. PF gets a resolver update with two or more addresses. It starts connecting to the first address and starts a Happy Eyeballs timer for 250ms.
- Note that the timer holds a ref to the `SubchannelList`, which is necessary to trigger the bug below. If there was only one address, there would be no Happy Eyeballs timer holding a ref here, so the bug would not occur.
2. The first subchannel reports CONNECTING and is seen by the LB policy.
3. The first subchannel reports READY, and the notification hops into the WorkSerializer but has not yet been executed.
4. The timer fires, and the timer callback hops into the WorkSerializer but has not yet been executed.
5. The LB policy gets shut down. This shuts down the `SubchannelList`, but we fail to actually shut down the underlying `SubchannelState`.
- This is the bug! We *should* be shutting down the `SubchannelState` here.
- Note that if the pending timer callback were not holding a ref to the `SubchannelList`, then the bug would not occur: the `SubchannelList` would have been immediately destroyed, which *would* have shut down the `SubchannelState`. In particular, note that if the timer had not yet fired, shutting down the `SubchannelList` would cancel the timer, thus releasing the ref immediately and shutting down the `SubchannelState`. Similarly, if the timer callback had already been seen by the LB policy, then the ref would also no longer be held.
6. The LB policy now sees the READY notification. This should be a no-op, since PF has already been shut down. However, because the `SubchannelState` was not shut down, it selects the subchannel instead.
7. The LB policy now sees the timer fire. This becomes a no-op, but it releases the ref to the `SubchannelList`, thus causing the `SubchannelList` to be destroyed. However, the `SubchannelState` for the selected subchannel from the previous step is no longer owned by the `SubchannelList`, so it is not shut down.
8. The selected subchannel now reports IDLE. This causes PF to call `GoIdle()`, and at this point we are holding the last ref to the LB policy, which we try to access after giving up that ref, thus causing a crash.
- Note that we're not actually holding this ref in order to keep the LB policy alive at this point; the ref actually exists only due to some [tech debt](14e077f9bd/src/core/load_balancing/pick_first/pick_first.cc (L196)). We should never be executing this code path to begin with after PF has been shut down, so we shouldn't need that ref.
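The sequence above can be modeled in a few lines. This is a minimal sketch, not the actual pick_first code: `SubchannelState`, `SubchannelList`, and `SelectedAfterShutdown` here are simplified, hypothetical stand-ins, a `std::queue` of closures stands in for the WorkSerializer, and `std::shared_ptr` stands in for gRPC's ref-counting. It only illustrates why a pending callback that holds a ref defers the destructor-driven shutdown, and why shutting down the state eagerly avoids the late selection.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <vector>

// Hypothetical stand-in for the per-subchannel state.
struct SubchannelState {
  bool shut_down = false;
  bool selected = false;
  // Step 6: a READY notification selects the subchannel unless the state
  // has already been shut down.
  void OnReady() {
    if (shut_down) return;
    selected = true;
  }
};

// Hypothetical stand-in for the list that owns the states.
class SubchannelList {
 public:
  explicit SubchannelList(bool shutdown_states_eagerly)
      : shutdown_states_eagerly_(shutdown_states_eagerly) {
    states_.push_back(std::make_shared<SubchannelState>());
  }
  // The destructor always shuts the states down, but it only runs once the
  // last ref to the list is released.
  ~SubchannelList() { ShutdownStates(); }

  void Shutdown() {
    // The buggy variant skips this and relies on the destructor alone.
    if (shutdown_states_eagerly_) ShutdownStates();
  }

  std::shared_ptr<SubchannelState> first() const { return states_[0]; }

 private:
  void ShutdownStates() {
    for (auto& state : states_) state->shut_down = true;
  }

  bool shutdown_states_eagerly_;
  std::vector<std::shared_ptr<SubchannelState>> states_;
};

// Replays steps 3-7 and reports whether the subchannel ended up selected
// even though the policy had already been shut down.
bool SelectedAfterShutdown(bool shutdown_states_eagerly) {
  std::queue<std::function<void()>> serializer;  // WorkSerializer stand-in
  auto list = std::make_shared<SubchannelList>(shutdown_states_eagerly);
  auto state = list->first();
  serializer.push([state] { state->OnReady(); });  // step 3: READY queued
  serializer.push([ref = list] { (void)ref; });    // step 4: timer holds a ref
  list->Shutdown();                                // step 5: policy shutdown
  list.reset();  // the queued timer callback still keeps the list alive
  while (!serializer.empty()) {                    // steps 6 and 7
    serializer.front()();
    serializer.pop();
  }
  return state->selected;
}
```

With `shutdown_states_eagerly` set to `false` the READY notification still selects the subchannel; with `true` it becomes the intended no-op.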
Closes #38144
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38144 from markdroth:pick_first_new_fix 4ec9f9ea1d
PiperOrigin-RevId: 698807898
Increase timeout for Python basic tests on Windows
The tests are timing out on Kokoro Windows.
We added support for Python 3.13 without dropping support for any older version, so the overall number of tests has increased.
Closes #38162
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38162 from sourabhsinghs:bugfix/kokoro-windows-basictests-python 999fd351ed
PiperOrigin-RevId: 698551254
This pulls in a patch that increases the max iteration limit, which is useful for extra-small microbenchmarks.
Closes #38163
PiperOrigin-RevId: 698524219
This PR addresses the frequent timeouts encountered by the Windows portability build-only test suite. Currently, the suite includes two tests: one using MSBuild and another using Ninja, both with MSVC 2022. To mitigate the timeout issue, this PR disables the MSBuild test while retaining the Ninja test.
Closes #38159
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38159 from veblush:win-port 7507298b9a
PiperOrigin-RevId: 698412694
This is to unblock https://github.com/grpc/grpc/pull/38038. gRPC needs to use a released version, though, so this should be updated once a new release is available.
Closes #38140
PiperOrigin-RevId: 697743397
By itself this is a no-op, but a future change will leverage this to allow fuzzers to inject thread hops into party activations (a technique that has helped find multiple long-lived bugs in the past 24 hours).
Closes #38139
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38139 from ctiller:flake-fightas-30 e19a1af694
PiperOrigin-RevId: 697620027
VLOG is probably the wrong thing here (considering it's been requested explicitly via a trace)
Closes #38135
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38135 from ctiller:flake-fightas-26 52a78995d2
PiperOrigin-RevId: 697067177
These corpora entries helped isolate a number of bugs in cancel_after_invoke.
By themselves right now I don't expect them to do much, but I want to seed our upstream fuzzers with this data so that we can find new examples in the future.
Closes #38132
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38132 from ctiller:flake-fightas-23 9f6ae727f8
PiperOrigin-RevId: 697064247
If we close reads on an mpsc then readers should also fail. Not doing so can open the way for some weird stuck bugs.
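As a rough illustration of the intended semantics (not gRPC's actual mpsc, which is promise-based), here is a single-threaded sketch. `TinyMpsc`, `RecvStatus`, and all method names are hypothetical. The point is only that after `CloseReads()` a receive reports closure instead of looking like an empty queue that might still be filled.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Hypothetical result of a receive attempt.
enum class RecvStatus { kOk, kPending, kClosed };

// Hypothetical, single-threaded stand-in for a multi-producer,
// single-consumer queue. Synchronization is deliberately omitted.
template <typename T>
class TinyMpsc {
 public:
  // Returns false once reads are closed, so senders learn nobody will read.
  bool Send(T value) {
    if (reads_closed_) return false;
    queue_.push_back(std::move(value));
    return true;
  }

  // Closes the read side and drops anything still queued.
  void CloseReads() {
    reads_closed_ = true;
    queue_.clear();
  }

  // After CloseReads() this fails with kClosed rather than kPending;
  // returning kPending here would leave a reader waiting forever.
  RecvStatus Receive(T* out) {
    if (reads_closed_) return RecvStatus::kClosed;
    if (queue_.empty()) return RecvStatus::kPending;
    *out = std::move(queue_.front());
    queue_.pop_front();
    return RecvStatus::kOk;
  }

 private:
  bool reads_closed_ = false;
  std::deque<T> queue_;
};
```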
Closes #38138
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38138 from ctiller:flake-fightas-29 8bc61601be
PiperOrigin-RevId: 697023583
Closes #38128
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38128 from yashykt:TestMetricstestFlakiness 4ac65b2d80
PiperOrigin-RevId: 697000317
Fix https://github.com/grpc/grpc/issues/37969.
There is an inverted length check in `GrpcPolledFdWindows` before the memcpy from gRPC's `recv_from_source_addr_` into c-ares' socket address structure. Newer c-ares versions use `struct sockaddr_storage` (128 bytes) for the socket address, which exposed this issue.
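For illustration, the direction of the check matters as follows. This is a generic sketch, not the code in `GrpcPolledFdWindows`; `CopySockaddr` and its parameters are hypothetical. The copy is allowed only when the destination capacity is at least the source length; an inverted comparison would reject a larger (valid) destination and accept a smaller one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies a source address of src_len bytes into a
// destination buffer of dst_capacity bytes. On success, *dst_len is set to
// the number of bytes written.
bool CopySockaddr(const void* src, std::size_t src_len, void* dst,
                  std::size_t dst_capacity, std::size_t* dst_len) {
  // Correct direction: the destination must be able to hold the source.
  // Writing this as `src_len < dst_capacity -> fail` is the inverted form.
  if (src_len > dst_capacity) return false;
  std::memcpy(dst, src, src_len);
  *dst_len = src_len;
  return true;
}
```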
Closes #38101
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38101 from yijiem:37969 282fc8269e
PiperOrigin-RevId: 696607100
Just used this to find out that we always do a TCP write for client initial metadata prior to the payload.
Closes #38053
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38053 from ctiller:party-see 6b5a2ba6cf
PiperOrigin-RevId: 696371772
This has been timing out recently.
It looks like many passing runs of this job take between ~55 minutes and 1.5 hours.
PiperOrigin-RevId: 696262722
Update the chaotic-good wire format with some learnings from the past year, and set up things for the next round of changes we'd like to make:
* Instead of a composite FRAGMENT frame, split out CLIENT_INITIAL_METADATA, CLIENT_END_OF_STREAM, MESSAGE, SERVER_INITIAL_METADATA, SERVER_TRAILING_METADATA as separate frame types - this eliminates a ton of complexity in the transport, and corresponds to how we used the wire format in practice anyway.
* Switch the frame payload for metadata and settings to be protobuf instead of HPACK - this eliminates the ordering requirements on interpreting these frames between streams, which I expect to open up some flexibility with head of line avoidance in the future. It also makes the code a heck of a lot easier to read and reason about, and makes it easier to predict the size of the frame at encode time, which lets us treat metadata and payloads more uniformly in the protocol.
* Add a connection id field to our header, in preparation for allowing multiple data connections
* Allow payloads to be shipped on the control channel ('connection id 0') and use this for sending small messages
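A rough sketch of how the pieces above fit together. The frame type names come from the list above, but everything else is an assumption made for illustration: the enum values, the `FrameHeader` field widths and order, `PickConnection`, and the size threshold are hypothetical and do not describe the actual chaotic-good wire layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Frame types named in the list above (other frame types omitted).
// The numeric values are placeholders, not the real wire encoding.
enum class FrameType : std::uint8_t {
  kClientInitialMetadata,
  kClientEndOfStream,
  kMessage,
  kServerInitialMetadata,
  kServerTrailingMetadata,
};

// Hypothetical header shape: the only point illustrated is that every frame
// names the connection that carries its payload.
struct FrameHeader {
  FrameType type;
  std::uint16_t connection_id;  // 0 means the control channel
  std::uint32_t stream_id;
  std::uint32_t payload_length;
};

constexpr std::uint16_t kControlConnectionId = 0;

// Hypothetical routing rule: payloads at or below the threshold ride on the
// control channel, larger ones go to the given data connection.
std::uint16_t PickConnection(std::size_t payload_length,
                             std::size_t small_message_threshold,
                             std::uint16_t data_connection_id) {
  return payload_length <= small_message_threshold ? kControlConnectionId
                                                   : data_connection_id;
}

// Builds a MESSAGE frame header using the routing rule above.
FrameHeader MakeMessageHeader(std::uint32_t stream_id,
                              std::uint32_t payload_length,
                              std::size_t small_message_threshold,
                              std::uint16_t data_connection_id) {
  return FrameHeader{
      FrameType::kMessage,
      PickConnection(payload_length, small_message_threshold,
                     data_connection_id),
      stream_id, payload_length};
}
```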
Closes #37765
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37765 from ctiller:tiefling 7b57f72367
PiperOrigin-RevId: 695766541
- Add two experiments for the promise-based HTTP2 transport.
- The client and server transport experiments are kept separate to allow smoother rollouts and to help with interop testing.
- The experiments are disabled; we expect this project to take several months.
Closes #38103
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/38103 from tanvi-jagtap:client_server_transport_experiment 53a24bda04
PiperOrigin-RevId: 695606023