More work on the dualstack backend design:
- Change round_robin to delegate to pick_first instead of creating
subchannels directly.
- Change pick_first such that when it is the child of a petiole policy,
it will unconditionally start a health watch.
- Change the client-side health checking code such that if client-side
health checking is not enabled, it will return the subchannel's raw
connectivity state.
- As part of this, we introduce a new endpoint_list library to be used
by petiole policies, which is intended to replace the existing
subchannel_list library. The only policy that will still directly
interact with subchannels is pick_first, so the relevant parts of the
subchannel_list functionality have been copied directly into that
policy. The subchannel_list library will be removed after all petiole
policies are updated to delegate to pick_first.
The address attribute interface was intended to provide a mechanism to
pass attributes separately from channel args, for values that do not
affect subchannel behavior and therefore do not need to be present in
the subchannel key, which does include channel args. However, the
mechanism as currently designed is fairly clunky and is probably not the
direction we will want to go in the long term.
Eventually, we will want some mechanism for registering channel args,
which would provide a cleaner way to indicate that a given channel arg
should not be used in the subchannel key, so that we don't need a
completely different mechanism. For now, this PR is just doing an
interim step, which is to establish a special channel arg key prefix to
indicate that an arg is not needed in the subchannel key.
With some delay, this is a PR for
https://github.com/grpc/grpc/issues/32564 (and previously
https://github.com/grpc/grpc/pull/31791).
I looked into adding a regular `py_test` for this change [as
suggested](https://github.com/grpc/grpc/pull/31791#issuecomment-1423245116)
but I am not aware of any effect that the presence of a .pyi stub file
would have at runtime and where some sort of type-checking in a .py
script would be affected. Stub files are only for use by type checkers &
IDE's. I mean, something like this would work:
```
import helloworld_pb2
py_file = helloworld_pb2.__file__
pyi_file = py_file + 'i’
self.assertTrue(os.path.exists(pyi_file))
```
But that seems really hacky to me. Instead I created a simple rule test
for `py_proto_library` with Bazel Skylib which tests the declared
outputs for an example `py_proto_library` target. Indirectly, this also
tests that the declared output files are actually generated. Please let
me know if this is sufficient.
- Switched from yapf to black
- Reconfigure isort for black
- Resolve black/pylint idiosyncrasies
Note: I used `--experimental-string-processing` because black was
producing "implicit string concatenation", similar to what described
here: https://github.com/psf/black/issues/1837. While currently this
feature is experimental, it will be enabled by default:
https://github.com/psf/black/issues/2188. After running black with the
new string processing so that the generated code merges these `"hello" "
world"` strings concatenations, then I removed
`--experimental-string-processing` for stability, and regenerated the
code again.
To the reviewer: don't even try to open "Files Changed" tab 😄 It's
better to review commit-by-commit, and ignore `run black and isort`.
Revert "Revert "[core] Add support for vsock transport"
(https://github.com/grpc/grpc/pull/33276)"
This reverts commit
c5ade3011a.
And fix the issue which broke the python build.
@markdroth@drfloob please review this PR. Thank you very much.
---------
Co-authored-by: AJ Heller <hork@google.com>
This is a hack to get around an issue on Apple devices caused by the
PosixEventEngine's `poll` poller not supporting fork. This PR disables
the EventEngine poller entirely in Python builds. It will therefore
prevent the release of the EventEngine generally, and prevent any
testing of EventEngine integration with gRPC Python, until the `poll`
poller is fixed.
This is another attempt to add support for vsock in grpc since previous
PRs(#24551, #21745) all closed without merging.
The VSOCK address family facilitates communication between
virtual machines and the host they are running on.
This patch will introduce new scheme: [vsock:cid:port] to
support VSOCK address family.
Fixes#32738.
---------
Signed-off-by: Yadong Qi <yadong.qi@intel.com>
Co-authored-by: AJ Heller <hork@google.com>
Co-authored-by: YadongQi <YadongQi@users.noreply.github.com>
Most of these data structures need to scale a bit like per-cpu, but not
entirely. We can have more than one cpu hit the same instance in most
cases, and probably want to cap out before the hundreds of shards some
platforms have.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This test mode tries to create threads wherever it legally can to
maximize the chances of TSAN finding errors in our codebase.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Reverts grpc/grpc#33088
The internal test is failing because after the change, transitive
dependencies will be included in response too, thus we need add
`empty2_pb2` to response since it's a dependency of
`empty2_extensions_pb2`.
We already did that in OSS test, we need do the same for internal test
case.
The logger uses `absl::FPrintF` to write to stdout. After reading a
number of sources online, I got the impression that `std::fwrite` which
is used by `absl::FPrintF` is atomic so there is no locking required
here.
---------
Co-authored-by: rockspore <rockspore@users.noreply.github.com>
With these, it is actually possible to have typed client stubs where the
return type is correctly inferred.
It's only for the non-streaming calls, because there is
`RequestIterableType` for the streaming ones (but it's just Any with
extra steps and would require much more work).
---------
Co-authored-by: Xuan Wang <xuanwn@google.com>
Reverts grpc/grpc#32909
It's breaking some internal test
(//net/grpc/python:internal_tests/unit/_default_reflection_test), revert
for now to investigate.
This PR implements a work-stealing thread pool for use inside
EventEngine implementations. Because of historical risks here, I've
guarded the new implementation behind an experiment flag:
`GRPC_EXPERIMENTS=work_stealing`. Current default behavior is the
original thread pool implementation.
Benchmarks look very promising:
```
bazel test \
--test_timeout=300 \
--config=opt -c opt \
--test_output=streamed \
--test_arg='--benchmark_format=csv' \
--test_arg='--benchmark_min_time=0.15' \
--test_arg='--benchmark_filter=_FanOut' \
--test_arg='--benchmark_repetitions=15' \
--test_arg='--benchmark_report_aggregates_only=true' \
test/cpp/microbenchmarks:bm_thread_pool
```
2023-05-04: `bm_thread_pool` benchmark results on my local machine (64
core ThreadRipper PRO 3995WX, 256GB memory), comparing this PR to
master:

2023-05-04: `bm_thread_pool` benchmark results in the Linux RBE
environment (unsure of machine configuration, likely small), comparing
this PR to master.

---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Reverts grpc/grpc#32924. This breaks the build again, unfortunately.
From `test/core/event_engine/cf:cf_engine_test`:
```
error: module .../grpc/test/core/event_engine/cf:cf_engine_test does not depend on a module exporting 'grpc/support/port_platform.h'
```
@sampajano I recommend looking into CI tests to catch iOS problems
before merging. We can enable EventEngine experiments in the CI
generally once this PR lands, but this broken test is not one of those
experiments. A normal build should have caught this.
cc @HannahShiSFB
1. `GrpcAuthorizationEngine` creates the logger from the given config in
its ctor.
2. `Evaluate()` invokes audit logging when needed.
---------
Co-authored-by: rockspore <rockspore@users.noreply.github.com>
Fix: https://github.com/grpc/grpc/issues/18075
From comments in https://github.com/grpc/grpc/issues/18075, `CPython`
reinitialize the `GIL` after `pthread_atfork` child handler, thus we
shouldn't use any `GIL` related functions in child handler which is what
we're currently doing, this PR uses `os.register_at_fork` to replace
`pthread_atfork` to prevent any undesired bevahior.
This also seems to fixes a thread hanging issue cased by changes in
core: https://github.com/grpc/grpc/pull/32869
### Testing:
* Passed existing fork tests. (Note that due to some issues in `Bazel`,
this change was not verified by `Bazel runs_per_test`).
* Tested by patch the core PR, was able to fix Python fork tests:
https://github.com/grpc/grpc/pull/32933
Third-party loggers will be added in subsequent PRs once the logger
factory APIs are available to validate the configs here.
This registry is used in `xds_http_rbac_filter.cc` to generate service
config json.
This paves the way for making pick_first the universal leaf policy (see
#32692), which will be needed for the dualstack design. That change will
require changing pick_first to see both the raw connectivity state and
the health-checking connectivity state of a subchannel, so that we can
enable health checking when pick_first is used underneath round_robin
without actually changing the pick_first connectivity logic (currently,
pick_first always disables health checking). To make it possible to do
that, this PR moves the health checking code out of the subchannel and
into a separate API using the same data-watcher mechanism that was added
for ORCA OOB calls.
The PR also creates a separate BUILD target for:
- chttp2 context list
- iomgr buffer_list
- iomgr internal errqueue
This would allow the context list to be included as standalone
dependencies for EventEngine implementations.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
@sampajano
The very non-trivial upgrade of third_party/protobuf to 22.x
This PR strives to be as small as possible and many changes that were
compatible with protobuf 21.x and didn't have to be merged atomically
with the upgrade were already merged.
Due to the complexity of the upgrade, this PR wasn't created
automatically by a tool, but manually. Subsequent upgraded of
third_party/protobuf with our OSS release script should work again once
this change is merged.
This is best reviewed commit-by-commit, I tried to group changes in
logical areas.
Notable changes:
- the upgrade of third_party/protobuf submodule, the bazel protobuf
dependency itself
- upgrade of UPB dependency to 22.x (in the past, we used to always
upgrade upb to "main", but upb now has release branch as well). UPB
needs to be upgraded atomically with protobuf since there's a de-facto
circular dependency (new protobuf depends on new upb, which depends on
new protobuf for codegen).
- some protobuf and upb bazel rules are now aliases, so `
extract_metadata_from_bazel_xml.py` and `gen_upb_api_from_bazel_xml.py`
had to be modified to be able to follow aliases and reach the actual
aliased targets.
- some protobuf public headers were renamed, so especially
`src/compiler` needed to be updated to use the new headers.
- protobuf and upb now both depend on utf8_range project, so since we
bundle upb with grpc in some languages, we now have to bundle utf8_range
as well (hence changes in build for python, PHP, objC, cmake etc).
- protoc now depends on absl and utf8_range (previously protobuf had
absl dependency, but not for the codegen part), so python's
make_grpcio_tools.py required partial rewrite to be able to handle those
dependencies in the grpcio_tools build.
- many updates and fixes required for C++ distribtests (currently they
all pass, but we'll probably need to follow up, make protobuf's and
grpc's handling of dependencies more aligned and revisit the
distribtests)
- bunch of other changes mostly due to overhaul of protobuf's and upb's
internal build layout.
TODOs:
- [DONE] make sure IWYU and clang_tidy_code pass
- create a list of followups (e.g. work to reenable the few tests I had
to disable and to remove workaround I had to use)
- [DONE in cl/523706129] figure out problem(s) with internal import
---------
Co-authored-by: Craig Tiller <ctiller@google.com>
Various fixes that can be merged in advance, to simplify the 22.x
upgrade.
- Use NOMINMAX for grpcio_tools build to avoid build failure when absl
is added as a dependency. Fixes
https://github.com/grpc/grpc/issues/32779
- in spawn_patch command file, put arguments on multiple lines to avoid
lines going over the limit for large link commands: Fixes
https://github.com/grpc/grpc/issues/32795
- Use `python/generator.h` instead of the deprecated
`python/python_generator.h` in grpcio_tools.
### Description
When invoking a RPC, sometimes operations in core will fail which result
in an `ExecuteBatchError`, currently we [simply raise this
error](https://github.com/grpc/grpc/blob/master/src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi#L705),
and left this Exception unhandled in event loop with error message `Task
exception was never retrieved`.
### Changes included
This PR changes the behavior to log the `ExecuteBatchError` instead of
raising it.
This shouldn't change existing behavior because even without this
change, the exception will simply left on event loop.
### Testing
Tested by manually `set_expection` in [this
future](https://github.com/grpc/grpc/blob/master/src/python/grpcio/grpc/_cython/_cygrpc/aio/callback_common.pyx.pxi#L84),
verified that error was logged correctly and no exception was left in
event loop, sample log:
```
ERROR:grpc._cython.cygrpc:ExecuteBatchError raised in core by servicer method [/grpc.testing.TestService/UnaryCall]
Traceback (most recent call last):
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 682, in _cython.cygrpc._handle_exceptions
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 802, in _handle_rpc
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 547, in _handle_unary_unary_rpc
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 452, in _finish_handler_with_unary_response
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/callback_common.pyx.pxi", line 106, in execute_wrong_batch
_cython.cygrpc.ExecuteBatchError: Failed "execute_batch_including_SendInitialMetadataOperation": (<_cython.cygrpc.SendInitialMetadataOperation object at 0x7fa86d936ca0>, <_cython.cygrpc.SendMessageOperation object at 0x7fa86cf48cb0>, <_cython.cygrpc.SendStatusFromServerOperation object at 0x7fa86cf4c110>)
```
Fix python type hints for async iteration on `UnaryStreamCall` and
`StreamStreamCall`. Those classes _are_ `AsyncIterable`s and `__aiter__`
returns an `AsyncIterator`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This PR also centralizes the client channel resolver selection. Resolver
selection is still done using the plugin system, but when the Ares and
native client channel resolvers go away, we can consider bootstrapping
this differently.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
(hopefully last try)
Add new channel arg GRPC_ARG_ABSOLUTE_MAX_METADATA_SIZE as hard limit
for metadata. Change GRPC_ARG_MAX_METADATA_SIZE to be a soft limit.
Behavior is as follows:
Hard limit
(1) if hard limit is explicitly set, this will be used.
(2) if hard limit is not explicitly set, maximum of default and soft
limit * 1.25 (if soft limit is set) will be used.
Soft limit
(1) if soft limit is explicitly set, this will be used.
(2) if soft limit is not explicitly set, maximum of default and hard
limit * 0.8 (if hard limit is set) will be used.
Requests between soft and hard limit will be rejected randomly, requests
above hard limit will be rejected.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This is a big rewrite of global config.
It does a few things, all somewhat intertwined:
1. centralize the list of configuration we have to a yaml file that can
be parsed, and code generated from it
2. add an initialization and a reset stage so that config vars can be
centrally accessed very quickly without the need for caching them
3. makes the syntax more C++ like (less macros!)
4. (optionally) adds absl flags to the OSS build
This first round of changes is intended to keep the system where it is
without major changes. We pick up absl flags to match internal code and
remove one point of deviation - but importantly continue to read from
the environment variables. In doing so we don't force absl flags on our
customers - it's possible to configure grpc without the flags - but
instead allow users that do use absl flags to configure grpc using that
mechanism. Importantly this lets internal customers configure grpc the
same everywhere.
Future changes along this path will be two-fold:
1. Move documentation generation into the code generation step, so that
within the source of truth yaml file we can find all documentation and
data about a configuration knob - eliminating the chance of forgetting
to document something in all the right places.
2. Provide fuzzing over configurations. Currently most config variables
get stashed in static constants across the codebase. To fuzz over these
we'd need a way to reset those cached values between fuzzing rounds,
something that is terrifically difficult right now, but with these
changes should simply be a reset on `ConfigVars`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>