WireWriter implementation schedules actions to be run by `ExecCtx`. We
should flush pending actions before destructing
`end2end_testing::g_transaction_processor`, which need to be alive to
handle the scheduled actions. Otherwise,
we get heap-use-after-free error because the testing fixture
(`end2end_testing::g_transaction_processor`) is destructed before all
the scheduled actions are run.
This lowers end2end binder transport test failure rate from 0.23% to
0.15%, according to internal tool that runs the test for 15000 times
under various configuration.
This PR implements a work-stealing thread pool for use inside
EventEngine implementations. Because of historical risks here, I've
guarded the new implementation behind an experiment flag:
`GRPC_EXPERIMENTS=work_stealing`. Current default behavior is the
original thread pool implementation.
Benchmarks look very promising:
```
bazel test \
--test_timeout=300 \
--config=opt -c opt \
--test_output=streamed \
--test_arg='--benchmark_format=csv' \
--test_arg='--benchmark_min_time=0.15' \
--test_arg='--benchmark_filter=_FanOut' \
--test_arg='--benchmark_repetitions=15' \
--test_arg='--benchmark_report_aggregates_only=true' \
test/cpp/microbenchmarks:bm_thread_pool
```
2023-05-04: `bm_thread_pool` benchmark results on my local machine (64
core ThreadRipper PRO 3995WX, 256GB memory), comparing this PR to
master:
![image](https://user-images.githubusercontent.com/295906/236315252-35ed237e-7626-486c-acfa-71a36f783d22.png)
2023-05-04: `bm_thread_pool` benchmark results in the Linux RBE
environment (unsure of machine configuration, likely small), comparing
this PR to master.
![image](https://user-images.githubusercontent.com/295906/236317164-2c5acbeb-fdac-4737-9b2d-4df9c41cb825.png)
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
One TXT lookup query can return multiple TXT records (see the following
example). `EventEngine::DNSResolver` should return all of them to let
the caller (e.g. `event_engine_client_channel_resolver`) decide which
one they would use.
```
$ dig TXT wikipedia.org
; <<>> DiG 9.18.12-1+build1-Debian <<>> TXT wikipedia.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49626
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;wikipedia.org. IN TXT
;; ANSWER SECTION:
wikipedia.org. 600 IN TXT "google-site-verification=AMHkgs-4ViEvIJf5znZle-BSE2EPNFqM1nDJGRyn2qk"
wikipedia.org. 600 IN TXT "yandex-verification: 35c08d23099dc863"
wikipedia.org. 600 IN TXT "v=spf1 include:wikimedia.org ~all"
```
Note that this change also deviates us from the iomgr's DNSResolver API
which uses std::string as the result type.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Reverts grpc/grpc#32924. This breaks the build again, unfortunately.
From `test/core/event_engine/cf:cf_engine_test`:
```
error: module .../grpc/test/core/event_engine/cf:cf_engine_test does not depend on a module exporting 'grpc/support/port_platform.h'
```
@sampajano I recommend looking into CI tests to catch iOS problems
before merging. We can enable EventEngine experiments in the CI
generally once this PR lands, but this broken test is not one of those
experiments. A normal build should have caught this.
cc @HannahShiSFB
Reverts grpc/grpc#33002. Breaks internal builds:
`.../privacy_context:filters does not depend on a module exporting
'.../src/core/lib/channel/context.h'`
Change call attributes to be stored in a `ChunkedVector` instead of
`std::map<>`, so that the storage can be allocated on the arena. This
means that we're now doing a linear search instead of a map lookup, but
the total number of attributes is expected to be low enough that that
should be okay.
Also, we now hide the actual data structure inside of the
`ServiceConfigCallData` object, which required some changes to the
`ConfigSelector` API. Previously, the `ConfigSelector` would return a
`CallConfig` struct, and the client channel would then use the data in
that struct to populate the `ServiceConfigCallData`. This PR changes
that such that the client channel creates the `ServiceConfigCallData`
before invoking the `ConfigSelector`, and it passes the
`ServiceConfigCallData` into the `ConfigSelector` so that the
`ConfigSelector` can populate it directly.
The protection is added at `xds_http_rbac_filter.cc` where we read the
new field. With this disabling the feature, nothing from things like
`xds_audit_logger_registry.cc` shall be invoked.
Makes some awkward fixes to compression filter, call, connected channel
to hold the semantics we have upheld now in tests.
Once the fixes described here
https://github.com/grpc/grpc/blob/master/src/core/lib/channel/connected_channel.cc#L636
are in this gets a lot less ad-hoc, but that's likely going to be
post-landing promises client & server side.
We specifically need special handling for server side cancellation in
response to reads wrt the inproc transport - which doesn't track
cancellation thoroughly enough itself.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
1. `GrpcAuthorizationEngine` creates the logger from the given config in
its ctor.
2. `Evaluate()` invokes audit logging when needed.
---------
Co-authored-by: rockspore <rockspore@users.noreply.github.com>
Whilst the per cpu counters probably help single channel contention, we
think it's likely that they're a pessimization when taken fleetwide.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Add audit condition and audit logger config into `grpc_core::Rbac`.
Support translation of audit logging options from authz policy to it.
Audit logging options in authz policy looks like:
```json
{
"audit_logging_options": {
"audit_condition": "ON_DENY",
"audit_loggers": [
{
"name": "logger",
"config": {},
"is_optional": false
}
]
}
}
```
which is consistent with what's in the xDS RBAC proto but a little
flattened.
---------
Co-authored-by: rockspore <rockspore@users.noreply.github.com>
Audit logging APIs for both built-in loggers and third-party logger
implementations.
C++ uses using decls referring to C-Core APIs.
---------
Co-authored-by: rockspore <rockspore@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Third-party loggers will be added in subsequent PRs once the logger
factory APIs are available to validate the configs here.
This registry is used in `xds_http_rbac_filter.cc` to generate service
config json.
The PR also creates a separate BUILD target for:
- chttp2 context list
- iomgr buffer_list
- iomgr internal errqueue
This would allow the context list to be included as standalone
dependencies for EventEngine implementations.
In order to help https://github.com/grpc/grpc/pull/32748, change the
test so that it tells us what the problem is in the logs.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This test proves that `global_stats.IncrementHttp2MetadataSize(0)` works
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
- Added `fuzzer_input.proto` and `NetworkInput` proto message
- Migrated client_fuzzer and server_fuzzer to proto fuzzer
- Migrated the existing corpus and verified that the code coverage (e.g.
chttp2) stays the same
Probably need to cherrypick due to amount of files changed.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
@sampajano
The very non-trivial upgrade of third_party/protobuf to 22.x
This PR strives to be as small as possible and many changes that were
compatible with protobuf 21.x and didn't have to be merged atomically
with the upgrade were already merged.
Due to the complexity of the upgrade, this PR wasn't created
automatically by a tool, but manually. Subsequent upgraded of
third_party/protobuf with our OSS release script should work again once
this change is merged.
This is best reviewed commit-by-commit, I tried to group changes in
logical areas.
Notable changes:
- the upgrade of third_party/protobuf submodule, the bazel protobuf
dependency itself
- upgrade of UPB dependency to 22.x (in the past, we used to always
upgrade upb to "main", but upb now has release branch as well). UPB
needs to be upgraded atomically with protobuf since there's a de-facto
circular dependency (new protobuf depends on new upb, which depends on
new protobuf for codegen).
- some protobuf and upb bazel rules are now aliases, so `
extract_metadata_from_bazel_xml.py` and `gen_upb_api_from_bazel_xml.py`
had to be modified to be able to follow aliases and reach the actual
aliased targets.
- some protobuf public headers were renamed, so especially
`src/compiler` needed to be updated to use the new headers.
- protobuf and upb now both depend on utf8_range project, so since we
bundle upb with grpc in some languages, we now have to bundle utf8_range
as well (hence changes in build for python, PHP, objC, cmake etc).
- protoc now depends on absl and utf8_range (previously protobuf had
absl dependency, but not for the codegen part), so python's
make_grpcio_tools.py required partial rewrite to be able to handle those
dependencies in the grpcio_tools build.
- many updates and fixes required for C++ distribtests (currently they
all pass, but we'll probably need to follow up, make protobuf's and
grpc's handling of dependencies more aligned and revisit the
distribtests)
- bunch of other changes mostly due to overhaul of protobuf's and upb's
internal build layout.
TODOs:
- [DONE] make sure IWYU and clang_tidy_code pass
- create a list of followups (e.g. work to reenable the few tests I had
to disable and to remove workaround I had to use)
- [DONE in cl/523706129] figure out problem(s) with internal import
---------
Co-authored-by: Craig Tiller <ctiller@google.com>
This enables the EventEngine experiments in end2end tests, excluding the
ResourceQuota tests which have known failures.
Some Windows tests are hanging, so they will be enabled later.
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
If an engine is created, it should be fully functional regardless of
whether gRPC-core experiments are on or off. The trade-off for now is
that when the core experiments are not enabled, the engine will be
slowly polling with nothing to do.
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Expand server promises to run with C++ end2end tests.
Across connected_channel/call/batch_builder/pipe/transport:
- fix a bug where read errors weren't propagated from transport to call
so that we can populate failed_before_recv_message for the c++ bindings
- ensure those errors are not, however, used to populate the returned
call status
Add a new latch call arg to lazily propagate the bound CQ for a server
call (and client call, but here it's used degenerately - it's always
populated). This allows server calls to be properly bound to
pollsets.(1)/(2)
In call.cc:
- move some profiling code from FilterStackCall to Call, and then use it
in PromiseBasedCall (this should be cleaned up with tracing work)
- implement GetServerAuthority
In server.cc:
- use an RAII pattern on `MatchResult` to avoid a bug whereby a tag
could be dropped if we cancel a request after it's been matched but
before it's published
- fix deadline export to ServerContext
In resource_quota_server.cc:
- fix some long standing flakes (that were finally obvious with the new
test code) - it's legal here to have client calls not arrive at the
server due to resource starvation, work through that (includes adding
expectations during a `Step` call, which required some small tweaks to
cq_verifier)
In the C++ end2end_test.cc:
- strengthen a flaky test so it passes consistently (it's likely we'll
revisit this with the fuzzing efforts to strengthen it into an actually
robust test)
(1) It's time to remove this concept
(2) Surprisingly the only test that *reliably* demonstrates this not
being done is time_change_test
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This error can trigger for either initial or trailing metadata (and
we've had outages where the latter was the cause).
I don't think we know at this layer if we're parsing initial or trailing
- though it'd be a good exercise to plumb that through.
For now remove the word initial because it's better to give less
information than wrong information.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This reverts commit 4b46dbc19e.
Reason: this seems to be breaking load reports in certain cases,
b/276944116
Let's revert so this doesn't accidentally get released.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Notes:
- `+trace` fixtures haven't run since 2016, so they're disabled for now
(7ad2d0b463 (diff-780fce7267c34170c1d0ea15cc9f65a7f4b79fefe955d185c44e8b3251cf9e38R76))
- all current fixtures define `FEATURE_MASK_SUPPORTS_AUTHORITY_HEADER`
and hence `authority_not_supported` has not been run in years - deleted
- bad_hostname similarly hasn't been triggered in a long while, so
deleted
- load_reporting_hook has never been enabled, so deleted
(f23fb4cf31/test/core/end2end/generate_tests.bzl (L145-L148))
- filter_latency & filter_status_code rely on global variables and so
don't convert particularly cleanly - and their value seems marginal, so
deleted
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>