Allow for multiple `--grpc_experiments`, `--grpc_trace` command line
arguments to be added, accumulate them, and provide them to gRPC as one
thing.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Add a new binary that runs all core end2end tests in fuzzing mode.
In this mode FuzzingEventEngine is substituted for the default event
engine. This means that time is simulated, as is IO. The FEE gets
control of callback delays also.
In our tests the `Step()` function becomes, instead of a single call to
`completion_queue_next`, a series of calls to that function and
`FuzzingEventEngine::Tick`, driving forward the event loop until
progress can be made.
PR guide:
---
**New binaries**
`core_end2end_test_fuzzer` - the new fuzzer itself
`seed_end2end_corpus` - a tool that produces an interesting seed corpus
**Config changes for safe fuzzing**
The implementation tries to use the config fuzzing work we've previously
deployed in api_fuzzer to fuzz across experiments. Since some
experiments are far too experimental to be safe in such fuzzing (and
this will always be the case):
- a new flag is added to experiments to opt-out of this fuzzing
- a new hook is added to the config system to allow variables to
re-write their inputs before setting them during the fuzz
**Event manager/IO changes**
Changes are made to the event engine shims so that tcp_server_posix can
run with a non-FD carrying EventEngine. These are in my mind a bit
clunky, but they work and they're in code that we expect to delete in
the medium term, so I think overall the approach is good.
**Changes to time**
A small tweak is made to fix a bug initializing time for fuzzers in
time.cc - we were previously failing to initialize
`g_process_epoch_cycles`
**Changes to `Crash`**
A version that prints to stdio is added so that we can reliably print a
crash from the fuzzer.
**Changes to CqVerifier**
Hooks are added to allow the top level loop to hook the verification
functions with a function that steps time between CQ polls.
**Changes to end2end fixtures**
State machinery moves from the fixture to the test infra, to keep the
customizations for fuzzing or not in one place. This means that fixtures
are now just client/server factories, which is overall nice.
It did necessitate moving some bespoke machinery into
h2_ssl_cert_test.cc - this file is beginning to be problematic in
borrowing parts but not all of the e2e test machinery. Some future PR
needs to solve this.
A cq arg is added to the Make functions since the cq is now owned by the
test and not the fixture.
**Changes to test registration**
`TEST_P` is replaced by `CORE_END2END_TEST` and our own test registry is
used as a first depot for test information.
The gtest version of these tests: queries that registry to manually
register tests with gtest. This ultimately changes the name of our tests
again (I think for the last time) - the new names are shorter and more
readable, so I don't count this as a regression.
The fuzzer version of these tests: constructs a database of fuzzable
tests that it can consult to look up a particular suite/test/config
combination specified by the fuzzer to fuzz against. This gives us a
single fuzzer that can test all 3k-ish fuzzing ready tests and cross
polinate configuration between them.
**Changes to test config**
The zero size registry stuff was causing some problems with the event
engine feature macros, so instead I've removed those and used GTEST_SKIP
in the problematic tests. I think that's the approach we move towards in
the future.
**Which tests are included**
Configs that are compatible - those that do not do fd manipulation
directly (these are incompatible with FuzzingEventEngine), and those
that do not join threads on their shutdown path (as these are
incompatible with our cq wait methodology). Each we can talk about in
the future - fd manipulation would be a significant expansion of
FuzzingEventEngine, and is probably not worth it, however many uses of
background threads now should probably evolve to be EventEngine::Run
calls in the future, and then would be trivially enabled in the fuzzers.
Some tests currently fail in the fuzzing environment, a
`SKIP_IF_FUZZING` macro is used for these few to disable them if in the
fuzzing environment. We'll burn these down in the future.
**Changes to fuzzing_event_engine**
Changes are made to time: an exponential sweep forward is used now -
this catches small time precision things early, but makes decade long
timers (we have them) able to be used right now. In the future we'll
just skip time forward to the next scheduled timer, but that approach
doesn't yet work due to legacy timer system interactions.
Changes to port assignment: we ensure that ports are legal numbers
before assigning them via `grpc_pick_port_or_die`.
A race condition between time checking and io is fixed.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This PR implements a work-stealing thread pool for use inside
EventEngine implementations. Because of historical risks here, I've
guarded the new implementation behind an experiment flag:
`GRPC_EXPERIMENTS=work_stealing`. Current default behavior is the
original thread pool implementation.
Benchmarks look very promising:
```
bazel test \
--test_timeout=300 \
--config=opt -c opt \
--test_output=streamed \
--test_arg='--benchmark_format=csv' \
--test_arg='--benchmark_min_time=0.15' \
--test_arg='--benchmark_filter=_FanOut' \
--test_arg='--benchmark_repetitions=15' \
--test_arg='--benchmark_report_aggregates_only=true' \
test/cpp/microbenchmarks:bm_thread_pool
```
2023-05-04: `bm_thread_pool` benchmark results on my local machine (64
core ThreadRipper PRO 3995WX, 256GB memory), comparing this PR to
master:
![image](https://user-images.githubusercontent.com/295906/236315252-35ed237e-7626-486c-acfa-71a36f783d22.png)
2023-05-04: `bm_thread_pool` benchmark results in the Linux RBE
environment (unsure of machine configuration, likely small), comparing
this PR to master.
![image](https://user-images.githubusercontent.com/295906/236317164-2c5acbeb-fdac-4737-9b2d-4df9c41cb825.png)
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Reverts grpc/grpc#32924. This breaks the build again, unfortunately.
From `test/core/event_engine/cf:cf_engine_test`:
```
error: module .../grpc/test/core/event_engine/cf:cf_engine_test does not depend on a module exporting 'grpc/support/port_platform.h'
```
@sampajano I recommend looking into CI tests to catch iOS problems
before merging. We can enable EventEngine experiments in the CI
generally once this PR lands, but this broken test is not one of those
experiments. A normal build should have caught this.
cc @HannahShiSFB
Makes some awkward fixes to compression filter, call, connected channel
to hold the semantics we have upheld now in tests.
Once the fixes described here
https://github.com/grpc/grpc/blob/master/src/core/lib/channel/connected_channel.cc#L636
are in this gets a lot less ad-hoc, but that's likely going to be
post-landing promises client & server side.
We specifically need special handling for server side cancellation in
response to reads wrt the inproc transport - which doesn't track
cancellation thoroughly enough itself.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Audit logging APIs for both built-in loggers and third-party logger
implementations.
C++ uses using decls referring to C-Core APIs.
---------
Co-authored-by: rockspore <rockspore@users.noreply.github.com>
Third-party loggers will be added in subsequent PRs once the logger
factory APIs are available to validate the configs here.
This registry is used in `xds_http_rbac_filter.cc` to generate service
config json.
The PR also creates a separate BUILD target for:
- chttp2 context list
- iomgr buffer_list
- iomgr internal errqueue
This would allow the context list to be included as standalone
dependencies for EventEngine implementations.
As Protobuf is going to support Cord to reduce memory copy when
[de]serializing Cord fields, gRPC is going to leverage it. This
implementation is based on the internal one but it's slightly modified
to use the public APIs of Cord. only
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
@sampajano
Notes:
- `+trace` fixtures haven't run since 2016, so they're disabled for now
(7ad2d0b463 (diff-780fce7267c34170c1d0ea15cc9f65a7f4b79fefe955d185c44e8b3251cf9e38R76))
- all current fixtures define `FEATURE_MASK_SUPPORTS_AUTHORITY_HEADER`
and hence `authority_not_supported` has not been run in years - deleted
- bad_hostname similarly hasn't been triggered in a long while, so
deleted
- load_reporting_hook has never been enabled, so deleted
(f23fb4cf31/test/core/end2end/generate_tests.bzl (L145-L148))
- filter_latency & filter_status_code rely on global variables and so
don't convert particularly cleanly - and their value seems marginal, so
deleted
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Built atop #31448
Offers a simple framework for testing filters.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Earlier, we were simply using a 64 bit random number, but the spec
actually calls for UUIDv4.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This allows us to replace `absl::optional<TaskHandle>` with checks
against the invalid handle.
This PR also replaces the differently-named invalid handle instances
with a uniform way of accessing static invalid instances across all
handle types, which aids a bit in testing.
Add an event manager that spawns threads just as much as it possibly
can... to expose TSAN to the myriad thread ordering problems in our code
base.
Next steps for this will be to add a new test mode for tsan + thready
event engine + a few other doodads to increase threads in the system
(party.cc in particular has a good place for a hook).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This is a big rewrite of global config.
It does a few things, all somewhat intertwined:
1. centralize the list of configuration we have to a yaml file that can
be parsed, and code generated from it
2. add an initialization and a reset stage so that config vars can be
centrally accessed very quickly without the need for caching them
3. makes the syntax more C++ like (less macros!)
4. (optionally) adds absl flags to the OSS build
This first round of changes is intended to keep the system where it is
without major changes. We pick up absl flags to match internal code and
remove one point of deviation - but importantly continue to read from
the environment variables. In doing so we don't force absl flags on our
customers - it's possible to configure grpc without the flags - but
instead allow users that do use absl flags to configure grpc using that
mechanism. Importantly this lets internal customers configure grpc the
same everywhere.
Future changes along this path will be two-fold:
1. Move documentation generation into the code generation step, so that
within the source of truth yaml file we can find all documentation and
data about a configuration knob - eliminating the chance of forgetting
to document something in all the right places.
2. Provide fuzzing over configurations. Currently most config variables
get stashed in static constants across the codebase. To fuzz over these
we'd need a way to reset those cached values between fuzzing rounds,
something that is terrifically difficult right now, but with these
changes should simply be a reset on `ConfigVars`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
PR #32215 added the verified root cert subject to the lower level
`tsi_peer`. This PR is a companion to that and completes the feature by
bubbling the information up to the `TsiCustomVerificationCheckRequest`
which is part of the user facing API for implementing custom
verification callbacks.
There are potentially surprising deployment bugs that can cause `EMFILE`
to be hit. For example, file descriptor limits can be easily reached if
- the round robin LB policy is used
- the load balancer hands out an assignment with a lot of backends
- using debian's default 1024 file descriptor limit.
To make such problems more apparent, we can pay special attention to
this error and log ERROR when it happens.
Related: b/265199104
Looks like this was accidentally dropped from our build files in
https://github.com/grpc/grpc/pull/21929, which means that this test
hasn't actually been built or run in almost 3 years. Unsurprisingly
after all that time, I had to make some changes to the test to get it to
actually build.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* [http] Dont drop connections on metadata limit exceeded
* remove bad test
* Automated change: Fix sanity tests
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* WRR: port StaticStrideScheduler to OSS
* WIP
* Automated change: Fix sanity tests
* fix build
* remove unused aliases
* fix another type mismatch
* remove unnecessary include
* move benchmarks to their own file, and don't run it on windows
* Automated change: Fix sanity tests
* add OOB reporting
* generate_projects
* clang-format
* add config parser test
* clang-tidy and minimize lock contention
* add config defaults
* add oob_reporting_period config field and add basic test
* Automated change: Fix sanity tests
* fix test
* change test to use basic RR
* WIP: started exposing peer address to LB policy API
* first WRR test passing!
* small cleanup
* port RR fix to WRR
* test helper refactoring
* more test helper refactoring
* WIP: trying to fix test to have the right weights
* more WIP -- need to make pickers DualRefCounted
* fix timer ref handling and get tests working
* clang-format
* iwyu and generate_projects
* fix build
* add test for OOB reporting
* keep only READY subchannels in the picker
* add file missed in a previous commit
* fix sanity
* iwyu
* add weight expiration period
* add tests for weight update period and OOB reporting period
* Automated change: Fix sanity tests
* lower bound for timer interval
* consistently apply grpc_test_slowdown_factor()
* cache time in test
* add blackout_period tests
* avoid some unnecessary copies
* clang-format
* add field to config test
* simplify orca watcher tracking
* attempt to fix build
* iwyu
* generate_projects
* update xds proto dependency
* add xDS LB policy entry to registry
* add "_experimental" suffix to policy name
* update LB policy name and remove debug log
* add env var protection
* generate_projects
* gen_upb_api
* WRR: update tests to cover qps plumbing
* WIP
* Automated change: Fix sanity tests
* more WIP
* basic WRR e2e test working
* add OOB test
* add xDS WRR e2e test
* clang-format
* fix sanity
* ignore duplicate addresses
* Automated change: Fix sanity tests
* add new tracer to doc/environment_variables.md
* retain scheduler state across pickers
* Automated change: Fix sanity tests
* use separate mutexes for scheduler and timer
* sort addresses to avoid index churn
* remove fetch_sub for wrap around in RR case
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
* WRR: port StaticStrideScheduler to OSS
* WIP
* Automated change: Fix sanity tests
* fix build
* remove unused aliases
* fix another type mismatch
* remove unnecessary include
* move benchmarks to their own file, and don't run it on windows
* Automated change: Fix sanity tests
* add OOB reporting
* generate_projects
* clang-format
* add config parser test
* clang-tidy and minimize lock contention
* add config defaults
* add oob_reporting_period config field and add basic test
* Automated change: Fix sanity tests
* fix test
* change test to use basic RR
* WIP: started exposing peer address to LB policy API
* first WRR test passing!
* small cleanup
* port RR fix to WRR
* test helper refactoring
* more test helper refactoring
* WIP: trying to fix test to have the right weights
* more WIP -- need to make pickers DualRefCounted
* fix timer ref handling and get tests working
* clang-format
* iwyu and generate_projects
* fix build
* add test for OOB reporting
* keep only READY subchannels in the picker
* add file missed in a previous commit
* fix sanity
* iwyu
* add weight expiration period
* add tests for weight update period and OOB reporting period
* Automated change: Fix sanity tests
* lower bound for timer interval
* consistently apply grpc_test_slowdown_factor()
* cache time in test
* add blackout_period tests
* avoid some unnecessary copies
* clang-format
* add field to config test
* simplify orca watcher tracking
* attempt to fix build
* iwyu
* generate_projects
* add "_experimental" suffix to policy name
* WRR: update tests to cover qps plumbing
* WIP
* more WIP
* basic WRR e2e test working
* add OOB test
* fix sanity
* ignore duplicate addresses
* Automated change: Fix sanity tests
* add new tracer to doc/environment_variables.md
* retain scheduler state across pickers
* Automated change: Fix sanity tests
* use separate mutexes for scheduler and timer
* sort addresses to avoid index churn
* remove fetch_sub for wrap around in RR case
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
* WRR: port StaticStrideScheduler to OSS
* Automated change: Fix sanity tests
* fix build
* remove unused aliases
* fix another type mismatch
* remove unnecessary include
* move benchmarks to their own file, and don't run it on windows
* Automated change: Fix sanity tests
Co-authored-by: markdroth <markdroth@users.noreply.github.com>