This will be used during the transition to run 5-pipe style filters somewhat more natively. Once the code is closer to 5-pipes, we'll drop this method and have the channel stack build an interception map that can be reused across calls, instead of building a new interception map every time a call is created.
Closes #35200
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35200 from ctiller:cg-channel-filter-api 2fc11dd273
PiperOrigin-RevId: 587940947
This hack temporarily quiets the flaky test report for a known race.
This is the only end2end test that shuts down and restarts a server in the same test execution. The PosixEventEngine's Listener implementation shuts down listening ports asynchronously after the Listener is destroyed. One possible fix is to proceed with the server restart only after the `on_shutdown` callback has been called, which would ensure all ports are closed first.
Closes #35149
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35149 from drfloob:hack/max_concurrent_fix_for_posix_ee_listener 9a7b7b53dd
PiperOrigin-RevId: 586471281
Also break the filter stack and promise based versions apart so that I
can re-understand this code.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Earlier, the grpc message-length prefix for outgoing data messages was
incorrectly being counted towards `data_bytes` instead of
`framing_bytes`. This PR fixes it.
Note that the incoming stats collection properly attributes the grpc
message-length prefix to `framing_bytes`.
This change will affect all stats plugins (OpenCensus and OpenTelemetry)
that make use of this information for metrics.
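
As a rough illustration of the intended attribution (not the actual transport code): a gRPC message on the wire carries a 5-byte prefix, a 1-byte compressed flag plus a 4-byte big-endian length, followed by the payload. The function names and the `stats` dict below are hypothetical, and the sketch ignores HTTP/2 frame headers, which are also framing overhead.

```python
import struct

# gRPC length-prefixed message: 1-byte compressed flag + 4-byte big-endian length.
GRPC_PREFIX_LEN = 5


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prepend the gRPC message-length prefix to a payload."""
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload


def account_outgoing(payload: bytes, stats: dict) -> bytes:
    """Hypothetical stats hook: the prefix counts as framing, the payload as data."""
    framed = frame_message(payload)
    stats["framing_bytes"] += GRPC_PREFIX_LEN
    stats["data_bytes"] += len(payload)
    return framed
```

With a 5-byte payload, this records 5 framing bytes and 5 data bytes; the behavior being fixed would have recorded all 10 as data bytes.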
Let's not merge this PR this week, maybe 2023-10-23 at the earliest.
This will allow time for flakes to shake out of the listener experiments
(enabled in https://github.com/grpc/grpc/pull/34700) in isolation of
client problems.
Ditch the old priority scheme for ordering filters, instead explicitly
mark up before/after constraints.
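
A minimal sketch of the general idea, assuming each filter declares which filters it must run before or after; ordering then becomes a topological sort. This is not the gRPC implementation, and the filter names and the dict layout are hypothetical.

```python
from graphlib import TopologicalSorter


def order_filters(filters: dict) -> list:
    """Order filters from explicit constraints.

    `filters` maps a filter name to a dict with optional "after" and
    "before" lists naming other filters. Raises graphlib.CycleError if
    the constraints are contradictory.
    """
    sorter = TopologicalSorter()
    for name, constraints in filters.items():
        # Everything listed in "after" must come earlier than `name`.
        sorter.add(name, *constraints.get("after", []))
        # `name` must come earlier than everything listed in "before".
        for later in constraints.get("before", []):
            sorter.add(later, name)
    return list(sorter.static_order())
```

For example, `order_filters({"b": {"after": ["a"]}, "a": {}, "c": {"after": ["b"]}})` places `a` before `b` and `b` before `c`, regardless of registration order.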
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Just seeing data flowing in after a ping is enough to establish liveness
of a connection, and so we can limit keepalive timeouts to that. Ping
timeouts are necessary for protocol correctness, but may be stuck behind
other traffic, so give them a little more of a grace period.
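
The distinction can be sketched as two separate checks, assuming a keepalive is satisfied by any read after the ping was sent while a ping is only satisfied by its ack. The function and parameter names here are hypothetical and the timeout values are up to the caller; this is not the chttp2 code.

```python
def check_timeouts(now, ping_sent_at, last_read_at, ping_acked,
                   keepalive_timeout, ping_timeout):
    """Return the list of timeouts that have fired at time `now`.

    Any data read at or after the ping was sent proves liveness, so it
    satisfies the keepalive check. The ping check needs the ack itself
    and is expected to use a longer timeout as a grace period.
    """
    failures = []
    saw_data = last_read_at is not None and last_read_at >= ping_sent_at
    if not saw_data and now - ping_sent_at >= keepalive_timeout:
        failures.append("keepalive_timeout")
    if not ping_acked and now - ping_sent_at >= ping_timeout:
        failures.append("ping_timeout")
    return failures
```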
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Isolate ping callback tracking to its own file.
Also takes the opportunity to simplify keepalive code by applying the
ping timeout to all pings.
Adds an experiment to allow multiple pings outstanding too (this was
originally an accidental behavior change of the work, but one that I
think may be useful going forward).
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
---------
Co-authored-by: Mark D. Roth <roth@google.com>
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Fix sticky-TF behavior such that once we enter TRANSIENT_FAILURE, we do
not leave that state if we get a new address list.
Also, fix handling of subchannels in state TRANSIENT_FAILURE.
Previously, if a subchannel was already in state TRANSIENT_FAILURE when
we wanted to start a connection attempt on it (e.g., because the
subchannel already existed from a different channel, or because it
already existed in the previous subchannel list), we would wait for it
to report IDLE before attempting to connect. This PR changes pick_first
to instead immediately skip the subchannel and move on to the next one.
Now, the only time we wait for a subchannel in TRANSIENT_FAILURE is when
we wrap back around to the first subchannel in the list.
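
A highly simplified sketch of the new skipping behavior, assuming the subchannel list is modeled as a list of connectivity states. The `State` enum and `next_connect_candidate` are hypothetical names for illustration, not the pick_first implementation.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECTING = auto()
    READY = auto()
    TRANSIENT_FAILURE = auto()


def next_connect_candidate(states, start=0):
    """Return the index of the next subchannel to try connecting on.

    Subchannels already in TRANSIENT_FAILURE are skipped immediately
    instead of being waited on. Returning None means every remaining
    subchannel is failing, which is the point where the policy wraps
    back to the first subchannel and waits.
    """
    for i in range(start, len(states)):
        if states[i] is not State.TRANSIENT_FAILURE:
            return i
    return None
```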
Could not reproduce (CNR) a WindowsEventEngine listener flake in:
* 10k local Windows development machine runs
* 50k Windows RBE runs
* 10k Windows VM runs
It fails ~5 times per day on the master CI jobs.
This PR adds some logging to try to see if an edge is missed, and
switches the thread pool implementation to see if that makes the flake
go away. If the flakes disappear, I'll try removing one or the other to
see if either independently fixes the problem (hopefully not logging).
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Why: Cleanup for chttp2_transport ahead of promise conversion - lots of
logic has become interleaved throughout chttp2, so some effort to
isolate logic out is warranted ahead of that conversion.
What: Split configuration and policy tracking for each of ping rate
throttling and abuse detection into their own modules. Add tests for
them.
Incidentally: Split channel args into their own header so that we can
split the policy stuff into separate build targets.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>