Original PR was #34307, reverted in #34318 due to internal test
failures.
The first commit is a revert of the revert. The second commit contains
the fix.
The original idea here was that `SubchannelWrapper::Orphan()`, which is
called when the strong refcount reaches 0, would take a new weak ref and
then hop into the `WorkSerializer` before dropping that weak ref, thus
ensuring that the `SubchannelWrapper` is destroyed inside the
`WorkSerializer` (which is needed because the `SubchannelWrapper` dtor
cleans up some state in the channel related to the subchannel). The
problem is that `DualRefCounted<>::Unref()` itself actually increments
the weak ref count before calling `Orphan()` and then decrements it
afterwards. So in the case where the `SubchannelWrapper` is unreffed
outside of the `WorkSerializer` and no other thread happens to be
holding the `WorkSerializer`, the weak ref that we were taking in
`Orphan()` was unreffed inline, which meant that it wasn't actually the
last weak ref -- the last weak ref was the one taken by
`DualRefCounted<>::Unref()`, and it wasn't released until after the
`WorkSerializer` was released.
To this this problem, we move the code from the `SubchannelWrapper` dtor
that cleans up the channel's state into the `WorkSerializer` callback
that is scheduled in `Orphan()`. Thus, regardless of whether or not the
last weak ref is released inside of the `WorkSerializer`, we are
definitely doing that cleanup inside the `WorkSerializer`, which is what
we actually care about.
Also adds an experiment to guard this behavior.
Most recent attempt was #34320, reverted in #34335.
The first commit here is a pure revert. The second commit fixes the
outlier_detection unit test to pass both with and without the
experiment.
In certain situations the current flow control algorithm can result in
sending one flow control update write for every write sent (known
situation: rollout of promise based server calls with qps_test).
Fix things up so that the updates are only sent when truly needed, and
then fix the fallout (turns out our fuzzer had some bugs)
I've placed actual logic changes behind an experiment so that it can be
incrementally & safely rolled out.
We added this as an exploratory measure for a customer that thought they
were using open census (this turned out to be emphatically false).
Remove it since it's probably not how we ultimately want to do this, and
wait for something better to come along.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This turns on the `work_stealing` experiment everywhere, enabling the
work-stealing thread pool in Posix & Windows EventEngine
implementations. This only really has an effect when EventEngine is used
for I/O (currently flag-guarded).
Need the ability to override server-side keepalive permit without calls
default without affecting client-side settings.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Over the past 5 days, this experiment has not introduced any new flakes,
nor increased any flake rates. Let's enable it for debug builds. To
prevent issues over the weekend, I plan to merge it next week, July 31st
(with announcement).
The intuition here is that these strings may end up in the hpack table,
and then unnecessarily extend the lifetime of the read blocks.
Instead, take a copy of these short strings when we need to and allow
the incoming large memory object to be discarded.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
It introduces the following syntax:
The following would mark the experiment as broken on ios, false on
windows and debug on posix. If a platform is un-specified, the default
for that platform will be set to false. Refer to
test/core/experiments/fixtures/test_experiments_rollout.yaml for
examples which are tested.
- name: experiment_1
default:
ios: broken
windows: false
posix: debug
It also supports the already existing syntax and interprets it as just
specifying one default for all platforms.
Supported platform tags: ios, windows, posix
Along with an experiment this time
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We defaulted this on 5 months ago, and it seems to be working... let's
remove the experiment bit!
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This PR implements a work-stealing thread pool for use inside
EventEngine implementations. Because of historical risks here, I've
guarded the new implementation behind an experiment flag:
`GRPC_EXPERIMENTS=work_stealing`. Current default behavior is the
original thread pool implementation.
Benchmarks look very promising:
```
bazel test \
--test_timeout=300 \
--config=opt -c opt \
--test_output=streamed \
--test_arg='--benchmark_format=csv' \
--test_arg='--benchmark_min_time=0.15' \
--test_arg='--benchmark_filter=_FanOut' \
--test_arg='--benchmark_repetitions=15' \
--test_arg='--benchmark_report_aggregates_only=true' \
test/cpp/microbenchmarks:bm_thread_pool
```
2023-05-04: `bm_thread_pool` benchmark results on my local machine (64
core ThreadRipper PRO 3995WX, 256GB memory), comparing this PR to
master:

2023-05-04: `bm_thread_pool` benchmark results in the Linux RBE
environment (unsure of machine configuration, likely small), comparing
this PR to master.

---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
This PR also centralizes the client channel resolver selection. Resolver
selection is still done using the plugin system, but when the Ares and
native client channel resolvers go away, we can consider bootstrapping
this differently.
* see if experiments can lose weight
* test
* test
* test
* test
* contra test
* contra test
* add explainer
* Automated change: Fix sanity tests
* fixes
* fix
* strict-bs
* comments
* fixes
* iwyu
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>