This has been stable for a bit, everywhere that the EventEngine is
enabled. Going forward, I think the event_engine_{client|listener}
experiments can probably be used to regulate thread-pool-specific
issues.
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Proposed alternative to https://github.com/grpc/grpc/pull/34024.
This version has a simpler, faster busy-count implementation based on a
sharded set of atomic counts: fast increment/decrement operations,
relatively slower summation of total counts (which need to happen much
less frequently).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Implement DNS using dns service for iOS.
Current limitation:
1. Using a custom name server is not supported.
2. Only supports `LookupHostname`. `LookupSRV` and `LookupTXT` are not
implemented.
3. Not tested with single stack (ipv4 or ipv6) environment
4. ~Not tested with multiple ip records per stack~ manually tested with
wsj.com
5. Not tested with multiple interface environment
Reverts grpc/grpc#33819
Verified that it passed these jobs:
`grpc/core/master/linux/grpc_basictests_c_cpp_dbg`
`grpc/core/master/linux/grpc_basictests_c_cpp_opt`
`grpc/core/master/linux/grpc_portability`
Why: Cleanup for chttp2_transport ahead of promise conversion - lots of
logic has become interleaved throughout chttp2, so some effort to
isolate logic out is warranted ahead of that conversion.
What: Split configuration and policy tracking for each of ping rate
throttling and abuse detection into their own modules. Add tests for
them.
Incidentally: Split channel args into their own header so that we can
split the policy stuff into separate build targets.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Those tests are failing on CIs which do not have twisted installed. Skip
them for now and will fix the docker images next.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This PR implements a c-ares based DNS resolver for EventEngine with the
reference from the original
[grpc_ares_wrapper.h](../blob/master/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h).
The PosixEventEngine DNSResolver is implemented on top of that. Tests
which use the client channel resolver API
([resolver.h](../blob/master/src/core/lib/resolver/resolver.h#L54)) are
ported, namely the
[resolver_component_test.cc](../blob/master/test/cpp/naming/resolver_component_test.cc)
and the
[cancel_ares_query_test.cc](../blob/master/test/cpp/naming/cancel_ares_query_test.cc).
The WindowsEventEngine DNSResolver will use the same EventEngine's
grpc_ares_wrapper and will be worked on next.
The
[resolve_address_test.cc](https://github.com/grpc/grpc/blob/master/test/core/iomgr/resolve_address_test.cc)
which uses the iomgr
[DNSResolver](../blob/master/src/core/lib/iomgr/resolve_address.h#L44)
API has been ported to EventEngine's dns_test.cc. That leaves only 2
tests which use iomgr's API, notably the
[dns_resolver_cooldown_test.cc](../blob/master/test/core/client_channel/resolvers/dns_resolver_cooldown_test.cc)
and the
[goaway_server_test.cc](../blob/master/test/core/end2end/goaway_server_test.cc)
which probably need to be restructured to use EventEngine DNSResolver
(for one thing they override the original grpc_ares_wrapper's free
functions). I will try to tackle these in the next step.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Based on the discussion at:
595a75cc5d..e3b402a8fa (r1244325752)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This change simplifies `EventEngine::DNSResolver`'s API based on the
proposal:
[go/event-engine-dns-resolver-api-changes](http://go/event-engine-dns-resolver-api-changes).
Note that this API change + the implementation described in
[go/event-engine-dns-resolver-implementation](http://go/event-engine-dns-resolver-implementation)
has already been tested against our main test suites and are passing
them.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We want writes to participate in event re-ordering, but it's unlikely
that we can sustain one byte per 500ms on all tests and keep them
passing (which is the degenerate case right now).
Tune write delays down to 50ms for the moment, though I expect we'll
want to talk about going lower.
Fix fuzzer found bug b/286716972
Follows up on https://github.com/grpc/grpc/pull/33266 but gets the edge
case right of when there's a read queued before the peer closes - in
that case we weren't waking up the read.
Revert "Revert "[core] Add support for vsock transport"
(https://github.com/grpc/grpc/pull/33276)"
This reverts commit
c5ade3011a.
And fix the issue which broke the python build.
@markdroth@drfloob please review this PR. Thank you very much.
---------
Co-authored-by: AJ Heller <hork@google.com>
Reverts grpc/grpc#33223. Breaks internal build.
```
work_stealing_thread_pool.cc:180:27: error: format specifies type 'double' but the argument has type 'gpr_cycle_counter' (aka 'long') [-Werror,-Wformat]
```
This is another attempt to add support for vsock in grpc since previous
PRs(#24551, #21745) all closed without merging.
The VSOCK address family facilitates communication between
virtual machines and the host they are running on.
This patch will introduce new scheme: [vsock:cid:port] to
support VSOCK address family.
Fixes#32738.
---------
Signed-off-by: Yadong Qi <yadong.qi@intel.com>
Co-authored-by: AJ Heller <hork@google.com>
Co-authored-by: YadongQi <YadongQi@users.noreply.github.com>
This test mode tries to create threads wherever it legally can to
maximize the chances of TSAN finding errors in our codebase.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Add a new binary that runs all core end2end tests in fuzzing mode.
In this mode FuzzingEventEngine is substituted for the default event
engine. This means that time is simulated, as is IO. The FEE gets
control of callback delays also.
In our tests the `Step()` function becomes, instead of a single call to
`completion_queue_next`, a series of calls to that function and
`FuzzingEventEngine::Tick`, driving forward the event loop until
progress can be made.
PR guide:
---
**New binaries**
`core_end2end_test_fuzzer` - the new fuzzer itself
`seed_end2end_corpus` - a tool that produces an interesting seed corpus
**Config changes for safe fuzzing**
The implementation tries to use the config fuzzing work we've previously
deployed in api_fuzzer to fuzz across experiments. Since some
experiments are far too experimental to be safe in such fuzzing (and
this will always be the case):
- a new flag is added to experiments to opt-out of this fuzzing
- a new hook is added to the config system to allow variables to
re-write their inputs before setting them during the fuzz
**Event manager/IO changes**
Changes are made to the event engine shims so that tcp_server_posix can
run with a non-FD carrying EventEngine. These are in my mind a bit
clunky, but they work and they're in code that we expect to delete in
the medium term, so I think overall the approach is good.
**Changes to time**
A small tweak is made to fix a bug initializing time for fuzzers in
time.cc - we were previously failing to initialize
`g_process_epoch_cycles`
**Changes to `Crash`**
A version that prints to stdio is added so that we can reliably print a
crash from the fuzzer.
**Changes to CqVerifier**
Hooks are added to allow the top level loop to hook the verification
functions with a function that steps time between CQ polls.
**Changes to end2end fixtures**
State machinery moves from the fixture to the test infra, to keep the
customizations for fuzzing or not in one place. This means that fixtures
are now just client/server factories, which is overall nice.
It did necessitate moving some bespoke machinery into
h2_ssl_cert_test.cc - this file is beginning to be problematic in
borrowing parts but not all of the e2e test machinery. Some future PR
needs to solve this.
A cq arg is added to the Make functions since the cq is now owned by the
test and not the fixture.
**Changes to test registration**
`TEST_P` is replaced by `CORE_END2END_TEST` and our own test registry is
used as a first depot for test information.
The gtest version of these tests: queries that registry to manually
register tests with gtest. This ultimately changes the name of our tests
again (I think for the last time) - the new names are shorter and more
readable, so I don't count this as a regression.
The fuzzer version of these tests: constructs a database of fuzzable
tests that it can consult to look up a particular suite/test/config
combination specified by the fuzzer to fuzz against. This gives us a
single fuzzer that can test all 3k-ish fuzzing ready tests and cross
polinate configuration between them.
**Changes to test config**
The zero size registry stuff was causing some problems with the event
engine feature macros, so instead I've removed those and used GTEST_SKIP
in the problematic tests. I think that's the approach we move towards in
the future.
**Which tests are included**
Configs that are compatible - those that do not do fd manipulation
directly (these are incompatible with FuzzingEventEngine), and those
that do not join threads on their shutdown path (as these are
incompatible with our cq wait methodology). Each we can talk about in
the future - fd manipulation would be a significant expansion of
FuzzingEventEngine, and is probably not worth it, however many uses of
background threads now should probably evolve to be EventEngine::Run
calls in the future, and then would be trivially enabled in the fuzzers.
Some tests currently fail in the fuzzing environment, a
`SKIP_IF_FUZZING` macro is used for these few to disable them if in the
fuzzing environment. We'll burn these down in the future.
**Changes to fuzzing_event_engine**
Changes are made to time: an exponential sweep forward is used now -
this catches small time precision things early, but makes decade long
timers (we have them) able to be used right now. In the future we'll
just skip time forward to the next scheduled timer, but that approach
doesn't yet work due to legacy timer system interactions.
Changes to port assignment: we ensure that ports are legal numbers
before assigning them via `grpc_pick_port_or_die`.
A race condition between time checking and io is fixed.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This PR implements a work-stealing thread pool for use inside
EventEngine implementations. Because of historical risks here, I've
guarded the new implementation behind an experiment flag:
`GRPC_EXPERIMENTS=work_stealing`. Current default behavior is the
original thread pool implementation.
Benchmarks look very promising:
```
bazel test \
--test_timeout=300 \
--config=opt -c opt \
--test_output=streamed \
--test_arg='--benchmark_format=csv' \
--test_arg='--benchmark_min_time=0.15' \
--test_arg='--benchmark_filter=_FanOut' \
--test_arg='--benchmark_repetitions=15' \
--test_arg='--benchmark_report_aggregates_only=true' \
test/cpp/microbenchmarks:bm_thread_pool
```
2023-05-04: `bm_thread_pool` benchmark results on my local machine (64
core ThreadRipper PRO 3995WX, 256GB memory), comparing this PR to
master:

2023-05-04: `bm_thread_pool` benchmark results in the Linux RBE
environment (unsure of machine configuration, likely small), comparing
this PR to master.

---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
One TXT lookup query can return multiple TXT records (see the following
example). `EventEngine::DNSResolver` should return all of them to let
the caller (e.g. `event_engine_client_channel_resolver`) decide which
one they would use.
```
$ dig TXT wikipedia.org
; <<>> DiG 9.18.12-1+build1-Debian <<>> TXT wikipedia.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49626
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;wikipedia.org. IN TXT
;; ANSWER SECTION:
wikipedia.org. 600 IN TXT "google-site-verification=AMHkgs-4ViEvIJf5znZle-BSE2EPNFqM1nDJGRyn2qk"
wikipedia.org. 600 IN TXT "yandex-verification: 35c08d23099dc863"
wikipedia.org. 600 IN TXT "v=spf1 include:wikimedia.org ~all"
```
Note that this change also deviates us from the iomgr's DNSResolver API
which uses std::string as the result type.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Reverts grpc/grpc#32924. This breaks the build again, unfortunately.
From `test/core/event_engine/cf:cf_engine_test`:
```
error: module .../grpc/test/core/event_engine/cf:cf_engine_test does not depend on a module exporting 'grpc/support/port_platform.h'
```
@sampajano I recommend looking into CI tests to catch iOS problems
before merging. We can enable EventEngine experiments in the CI
generally once this PR lands, but this broken test is not one of those
experiments. A normal build should have caught this.
cc @HannahShiSFB
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
@sampajano
If an engine is created, it should be fully functional regardless of
whether gRPC-core experiments are on or off. The trade-off for now is
that when the core experiments are not enabled, the engine will be
slowly polling with nothing to do.
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Built atop #31448
Offers a simple framework for testing filters.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Implement listeners, connection, endpoints for `FuzzingEventEngine`.
Allows the fuzzer to select write sizes and delays, connection delays,
and port assignments.
I made a few modifications to the test suite to admit this event engine
to pass the client & server tests:
1. the test factories return shared_ptr<> to admit us to return the same
event engine for both the oracle and the implementation - necessary
because FuzzingEventEngine forms a closed world of addresses & ports.
2. removed the WaitForSingleOwner calls - these seem unnecessary, and we
don't ask our users to do this - tested existing linux tests 1000x
across debug, asan, tsan with this change
Additionally, the event engine overrides the global port picker logic so
that port assignments are made by the fuzzer too.
This PR is a step along a longer journey, and has some outstanding
brethren PR's, and some follow-up work:
* #32603 will convert all the core e2e tests into a more malleable form
* we'll then use #32667 to turn all of these into fuzzers
* finally we'll integrate this into that work and turn all core e2e
tests into fuzzers over timer & callback reorderings and io
size/spacings
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This allows us to replace `absl::optional<TaskHandle>` with checks
against the invalid handle.
This PR also replaces the differently-named invalid handle instances
with a uniform way of accessing static invalid instances across all
handle types, which aids a bit in testing.