Disabling before the 1.62 branch cut.
There is a flake in one specific interop test: cloud_to_prod_auth:c++:*:oauth2_auth_token:tls, where * is default or gateway_v4. There is also an unresolved PHP crash on test shutdown, affecting only debug builds on Mac (not Linux or Windows).
Closes #35819
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35819 from drfloob:disable-posix-ee-client c42e8b8c20
PiperOrigin-RevId: 604474559
The client promise code seems to cause a problem with iomgr pollset shutdown, which is causing flakiness.
Right now I don't think it's likely that we'll get this code rolled out before the event engine client lands, so I'm making the experiment dependent on event engine polling.
Closes #35678
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35678 from ctiller:flake 4512fa81b0
PiperOrigin-RevId: 601926443
This reverts commit 6318e9e7e9.
Closes #35667
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35667 from ctiller:a 032999b51e
PiperOrigin-RevId: 601495207
Closes #35573
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35573 from yijiem:enable-oss-ee-dns-posix-real 017b99312f
PiperOrigin-RevId: 601245249
- `memory_pressure_controller` - finally allows deletion of `pid_controller` throughout the codebase
- `overload_protection` - one of the http2 rapid reset mitigations
- `red_max_concurrent_streams` - another http2 rapid reset mitigation
Closes #35426
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35426 from ctiller:new-years-cleanse 4651672e7e
PiperOrigin-RevId: 595205029
Add a config to experiments & rollouts to allow dependent experiments to
be flagged.
We're getting past the point where it's possible to reason about which
experiments need to be turned off if we disable some other experiment,
and so this provides some additional rollout safety.
Dependencies can be specified in both experiments.yaml and rollouts.yaml:
experiments.yaml makes the most sense and should be the default, but
rollouts.yaml lets us record dependencies between internal and external
experiments internally, which will be a little useful.
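For concreteness, here is a minimal, purely illustrative sketch of the kind of transitive check this dependency metadata enables; the function and data layout are hypothetical, not the generated gRPC experiment code.

```cpp
// Illustrative sketch only: given a map from experiment -> experiments it
// requires, compute everything that must also be disabled when one
// experiment is disabled, by walking the reverse edges to a fixed point.
#include <map>
#include <set>
#include <string>
#include <vector>

std::set<std::string> TransitivelyDisabled(
    const std::map<std::string, std::vector<std::string>>& requires,
    const std::string& disabled) {
  std::set<std::string> out{disabled};
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& [experiment, deps] : requires) {
      if (out.count(experiment)) continue;
      for (const auto& dep : deps) {
        if (out.count(dep)) {
          // This experiment depends on something already disabled, so it
          // must be disabled as well.
          out.insert(experiment);
          changed = true;
          break;
        }
      }
    }
  }
  return out;
}
```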
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Cancel streams if we have too much concurrency on a single channel to
allow that server to recover.
There seems to be convergence in the HTTP/2 community that this is a
reasonable thing to do, so I'd like to try it in some real scenarios.
If this pans out well then I'll likely drop the
`red_max_concurrent_streams` and the `rstpit` experiments in preference
to this.
I'm also considering tying in resource quota so that under high memory
pressure we just default to this path.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Noticed that our behavior conflicts with what's mandated:
https://www.rfc-editor.org/rfc/rfc9113.html#section-5.1.2
So here's a quick fix - I like it because it avoids a disconnection,
which always feels like one outage avoided.
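As a rough illustration of what RFC 9113 §5.1.2 asks for, a minimal sketch follows; the types are made up, not chttp2's. The point is that a stream exceeding the advertised limit gets a stream-level refusal, not a connection error.

```cpp
// Sketch: a stream that would exceed the advertised MAX_CONCURRENT_STREAMS
// limit is rejected with a stream error (RST_STREAM carrying REFUSED_STREAM
// or PROTOCOL_ERROR), leaving the rest of the connection intact.
#include <cstdint>

enum class StreamLimitAction { kAccept, kRefuseStream };

StreamLimitAction OnIncomingStream(uint32_t open_streams,
                                   uint32_t advertised_max_concurrent_streams) {
  if (open_streams >= advertised_max_concurrent_streams) {
    // Reject just this stream; the peer may retry it. No GOAWAY, no
    // disconnection.
    return StreamLimitAction::kRefuseStream;
  }
  return StreamLimitAction::kAccept;
}
```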
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Instead of fixing a target size for writes, try to adapt it a little to
observed bandwidth.
The initial algorithm aims to have large writes complete within a
100-1000ms maximum delay - this range probably wants tuning, but let's
see.
The hope is that on slow connections we avoid back-buffering so much,
so that when we need to send a ping ack we can do so without great
delay.
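A minimal sketch of the sizing idea, assuming a simple bandwidth estimate and illustrative constants (not the actual chttp2 tuning):

```cpp
// Pick a target write size so a single write drains in roughly 100-1000ms
// at the currently observed throughput; slow links therefore get small
// writes, so a ping ack queued behind a write is not delayed for long.
#include <algorithm>
#include <cstddef>

size_t TargetWriteSize(double observed_bytes_per_second) {
  constexpr double kMinDrainSeconds = 0.1;   // lower bound of the window
  constexpr double kMaxDrainSeconds = 1.0;   // upper bound of the window
  constexpr size_t kFloorBytes = 32 * 1024;  // never shrink below one chunk
  constexpr size_t kCeilBytes = 16 * 1024 * 1024;
  // Aim for the middle of the window; a real implementation would also
  // smooth the bandwidth estimate over time.
  const double target =
      observed_bytes_per_second * (kMinDrainSeconds + kMaxDrainSeconds) / 2;
  return std::clamp(static_cast<size_t>(target), kFloorBytes, kCeilBytes);
}
```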
Experiment 1: On RST_STREAM: reduce MAX_CONCURRENT_STREAMS for one round
trip.
Experiment 2: If a settings frame is outstanding with a lower
MAX_CONCURRENT_STREAMS than is configured, and we receive a new incoming
stream that would exceed the new cap, randomly reject it.
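A rough sketch of what the experiment-2 random rejection could look like; the probability ramp here is illustrative, not necessarily what the experiment implements.

```cpp
// While a SETTINGS frame lowering MAX_CONCURRENT_STREAMS is still unacked,
// new streams above the pending (lower) cap are rejected with a probability
// that grows the further past the cap we are, rather than all-or-nothing.
#include <algorithm>
#include <cstdint>
#include <random>

bool ShouldRejectNewStream(uint32_t open_streams, uint32_t pending_lower_cap,
                           std::mt19937& rng) {
  if (open_streams < pending_lower_cap) return false;  // under the new cap
  const double overage =
      static_cast<double>(open_streams - pending_lower_cap + 1);
  const double reject_probability =
      std::min(1.0, overage / static_cast<double>(pending_lower_cap + 1));
  return std::bernoulli_distribution(reject_probability)(rng);
}
```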
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Cap the number of requests and RST_STREAM frames handled per read.
If these caps are exceeded, offload processing of the connection to a
backing thread pool, and allow other connections to make progress.
Previously chttp2 would allow unlimited requests prior to a settings ack,
since the agreed-upon limit for requests in that state is infinite.
Instead, after MAX_CONCURRENT_STREAMS requests have been attempted,
start blanket cancelling requests until the settings ack is received.
This can be done efficiently without allocating request state
structures.
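A simplified sketch of the per-read capping and pre-settings-ack cancellation described above, with illustrative names rather than chttp2's:

```cpp
// Count new streams seen in a single read; past the per-read cap, defer the
// rest of the read to a backing thread pool so other connections keep
// making progress. Before our SETTINGS is acked, anything beyond
// MAX_CONCURRENT_STREAMS is cancelled outright, without allocating
// per-request state.
#include <cstdint>

enum class NewStreamDisposition { kProcess, kOffloadRemainderOfRead, kCancel };

struct ReadState {
  uint32_t streams_started_this_read = 0;
  uint32_t streams_attempted_since_settings = 0;
  bool settings_acked = false;
};

NewStreamDisposition OnNewStreamHeader(ReadState& state,
                                       uint32_t max_requests_per_read,
                                       uint32_t max_concurrent_streams) {
  if (!state.settings_acked &&
      state.streams_attempted_since_settings >= max_concurrent_streams) {
    // The peer has not acked our limit yet; blanket-cancel.
    return NewStreamDisposition::kCancel;
  }
  ++state.streams_attempted_since_settings;
  if (++state.streams_started_this_read > max_requests_per_read) {
    // Too much work from one read: offload the remainder instead of
    // starving other connections on this poller.
    return NewStreamDisposition::kOffloadRemainderOfRead;
  }
  return NewStreamDisposition::kProcess;
}
```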
Isolate ping callback tracking to its own file.
Also takes the opportunity to simplify keepalive code by applying the
ping timeout to all pings.
Adds an experiment to allow multiple pings outstanding too (this was
originally an accidental behavior change of the work, but one that I
think may be useful going forward).
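For illustration, here is a minimal standalone ping tracker with one timeout applied to every outstanding ping and support for multiple pings in flight; this is a sketch, not the chttp2 implementation.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <map>

class PingTracker {
 public:
  using Clock = std::chrono::steady_clock;

  explicit PingTracker(std::chrono::milliseconds timeout) : timeout_(timeout) {}

  // Record a ping we just sent; the callback fires on ack or timeout.
  void OnPingSent(uint64_t id, std::function<void(bool acked)> on_done) {
    inflight_[id] = {Clock::now() + timeout_, std::move(on_done)};
  }

  void OnPingAck(uint64_t id) {
    auto it = inflight_.find(id);
    if (it == inflight_.end()) return;
    it->second.on_done(/*acked=*/true);
    inflight_.erase(it);
  }

  // Called periodically; fires the timeout path for expired pings.
  void CheckTimeouts(Clock::time_point now) {
    for (auto it = inflight_.begin(); it != inflight_.end();) {
      if (it->second.deadline <= now) {
        it->second.on_done(/*acked=*/false);
        it = inflight_.erase(it);
      } else {
        ++it;
      }
    }
  }

 private:
  struct Inflight {
    Clock::time_point deadline;
    std::function<void(bool)> on_done;
  };
  std::chrono::milliseconds timeout_;
  std::map<uint64_t, Inflight> inflight_;  // multiple pings may be in flight
};
```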
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We disabled this a little while ago for lack of CI bandwidth, but #34404
ought to have freed up enough capacity that we can keep running this.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Summary -
On the server side, we are moving the point at which we decide whether a
method is registered from the surface to the transport, at the point
where we have finished receiving initial metadata and before we invoke
the recv_initial_metadata_ready closures from the filters. The main
motivation is to allow filters to check whether the incoming method is
registered or not. The exact use case is for observability
where we only want to record the method if it is registered. We store
the information about the registered method in the initial metadata.
On the client-side, we also set information about whether the method is
registered or not in the outgoing initial metadata.
Since we are effectively changing the lookup point of the registered
method, there is a slight concern that this could be a breaking change,
so we are guarding it with an experiment to be safe.
Changes -
* Transport API changes -
* Along with `accept_stream_fn`, a new callback
`registered_method_matcher_cb` will be sent down as a transport op on
the server side. When initial metadata is received on the server side,
this callback is invoked. This happens before invoking the
`recv_initial_metadata_ready` closure.
* Metadata changes -
* We add a new non-serializable metadata trait `GrpcRegisteredMethod()`.
On the client side, the value is a uintptr_t set to 1 if the call's
method is registered/known and 0 if it is not. On the server side, the
value is a ChannelRegisteredMethod*. This metadata can be used
throughout the stack to check whether a call's method is registered or
not (a simplified sketch follows this list).
* Server Changes -
* When a new transport connection is accepted, the server sets
`registered_method_matcher_cb` along with `accept_stream_fn`. This
function checks whether the method is registered or not and sets the
RegisteredMethod matcher in the metadata for use later.
* Client Changes -
* On call creation, set the metadata indicating whether the method is
registered or not.
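A heavily simplified sketch of the two new pieces, using stand-in types rather than gRPC's metadata and transport classes: a non-serializable metadata entry marking the registered method, and a matcher callback the server installs next to accept_stream_fn so the lookup happens in the transport, before recv_initial_metadata_ready runs.

```cpp
#include <functional>
#include <map>
#include <string>

struct RegisteredMethod {  // stand-in for ChannelRegisteredMethod*
  std::string name;
};

struct IncomingMetadata {
  std::string path;  // ":path" pseudo-header
  const RegisteredMethod* registered_method = nullptr;  // never serialized
};

// Installed by the server alongside accept_stream_fn; the transport calls it
// once initial metadata is complete, before the filter closures run.
using RegisteredMethodMatcherCb = std::function<void(IncomingMetadata&)>;

RegisteredMethodMatcherCb MakeMatcher(
    const std::map<std::string, RegisteredMethod>* registry) {
  return [registry](IncomingMetadata& md) {
    auto it = registry->find(md.path);
    // Filters (e.g. observability) can now read this from the metadata
    // instead of waiting for the surface-level lookup.
    md.registered_method = it == registry->end() ? nullptr : &it->second;
  };
}
```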
Original PR was #34307, reverted in #34318 due to internal test
failures.
The first commit is a revert of the revert. The second commit contains
the fix.
The original idea here was that `SubchannelWrapper::Orphan()`, which is
called when the strong refcount reaches 0, would take a new weak ref and
then hop into the `WorkSerializer` before dropping that weak ref, thus
ensuring that the `SubchannelWrapper` is destroyed inside the
`WorkSerializer` (which is needed because the `SubchannelWrapper` dtor
cleans up some state in the channel related to the subchannel). The
problem is that `DualRefCounted<>::Unref()` itself actually increments
the weak ref count before calling `Orphan()` and then decrements it
afterwards. So in the case where the `SubchannelWrapper` is unreffed
outside of the `WorkSerializer` and no other thread happens to be
holding the `WorkSerializer`, the weak ref that we were taking in
`Orphan()` was unreffed inline, which meant that it wasn't actually the
last weak ref -- the last weak ref was the one taken by
`DualRefCounted<>::Unref()`, and it wasn't released until after the
`WorkSerializer` was released.
To fix this problem, we move the code from the `SubchannelWrapper` dtor
that cleans up the channel's state into the `WorkSerializer` callback
that is scheduled in `Orphan()`. Thus, regardless of whether or not the
last weak ref is released inside of the `WorkSerializer`, we are
definitely doing that cleanup inside the `WorkSerializer`, which is what
we actually care about.
Also adds an experiment to guard this behavior.
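A simplified sketch of the shape of the fix, with stand-in types (not gRPC's WorkSerializer or DualRefCounted):

```cpp
// The channel-state cleanup is moved out of the destructor and into the
// callback that Orphan() schedules on the WorkSerializer, so it runs under
// the WorkSerializer regardless of which thread drops the final weak ref.
#include <functional>
#include <memory>

struct WorkSerializer {
  // Runs `fn` with the serializer's exclusive-execution guarantee.
  void Run(std::function<void()> fn) { fn(); }  // trivial stand-in
};

class SubchannelWrapperLike {
 public:
  explicit SubchannelWrapperLike(std::shared_ptr<WorkSerializer> ws)
      : work_serializer_(std::move(ws)) {}

  // Called when the strong refcount hits zero. The real code also takes a
  // weak ref to keep the object alive until the callback has run.
  void Orphan() {
    work_serializer_->Run([this] { CleanUpChannelStateForSubchannel(); });
  }

  ~SubchannelWrapperLike() = default;  // no longer touches channel state

 private:
  void CleanUpChannelStateForSubchannel() {
    // ... remove this wrapper from the channel's per-subchannel maps ...
  }

  std::shared_ptr<WorkSerializer> work_serializer_;
};
```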
Most recent attempt was #34320, reverted in #34335.
The first commit here is a pure revert. The second commit fixes the
outlier_detection unit test to pass both with and without the
experiment.