More changes as part of the dualstack design:
- Change resolver and LB policy APIs to support multiple addresses per
endpoint. Specifically, replace `ServerAddress` with
`EndpointAddresses`, which encodes more than one address. Per-address
channel args are retained at the same level, so they are now
per-endpoint. For now, `EndpointAddress` provides a single-address ctor
and a single-address accessor for backward compatibility, so
`ServerAdress` is an alias for `EndpointAddresses`; eventually, this
alias and the single-address methods will be removed.
- Add an `EndpointAddressSet` class, which represents an unordered set
of addresses to be used as a map key. This will be used in a number of
LB policies that need to store per-endpoint state.
- Change the LB policy API's `ChannelControlHelper::CreateSubchannel()`
method to take the address and per-endpoint channel args as separate
parameters, so that we don't need to construct a legacy `ServerAddress`
object as we create a new subchannel for each address in the endpoint.
- Change pick_first to flatten the address list.
- Change ring_hash to use `EndpointAddressSet` as the key for its
endpoint map, and to use the first address of the endpoint as the hash
key.
- Change WRR to use `EndpointAddressSet` as the key for its endpoint
weight map.
Note that support for multiple addresses per endpoint is guarded in RR
by the existing `round_robin_delegate_to_pick_fist` experiment and in
WRR by the existing `wrr_delegate_to_pick_first` experiment.
This PR does *not* include support for multiple addresses per endpoint
for the outlier_detection or xds_override_host LB policies; those will
come in subsequent PRs.
If we get a readable event on an fd and both the following happens:
- c-ares does *not* read all bytes off the fd
- c-ares removes the fd from the set ARES_GETSOCK_READABLE
... then we have a busy loop here, where we'd keep asking c-ares to
process an fd that it no longer cares about.
This is indirectly related to a change in this code one month ago:
https://github.com/grpc/grpc/pull/33942 - before that change, c-ares
would close the socket when it called
[handle_error](7f3262312f/src/lib/ares_process.c (L707))
and so `IsFdStillReadableLocked` would start returning `false`, causing
us to get away with [this
loop](f6a994229e/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc (L371)).
Now, because `IsFdStillReadableLocked` will keep returning true (because
of our overridden `close` API), we'll loop forever.
This makes the resolver component tests suite run on Window RBE by
adding a flag in the test driver to further differentiate between Bazel
local run and Bazel RBE run on Windows since they have different
RUNFILES behavior.
Local Bazel run succeeds:
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel2 test --dynamic_mode=off --verbose_failures --test_arg=--running_locally=true //test/cpp/naming:resolver_component_tests_runner_invoker
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:resolver_component_tests_runner_invoker up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker.exe
INFO: Elapsed time: 196.080s, Critical Path: 193.21s
INFO: 2 processes: 1 internal, 1 local.
INFO: Build completed successfully, 2 total actions
//test/cpp/naming:resolver_component_tests_runner_invoker PASSED in 193.1s
Executed 1 out of 1 test: 1 test passes.
```
RBE run succeeds:
```
C:\Users\yijiem\projects\grpc>bazel --bazelrc=tools/remote_build/windows.bazelrc test --config=windows_opt --dynamic_mode=off --verbose_failures --host_linkopt=/NODEFAULTLIB:libcmt.lib --host_linkopt=/DEFAULTLIB:msvcrt.lib --nocache_test_results //test/cpp/naming:resolver_component_tests_runner_invoker
INFO: Invocation ID: d467f2e3-7da6-4bb5-8b9b-84f1181ebc60
WARNING: --remote_upload_local_results is set, but the remote cache does not support uploading action results or the current account is not authorized to write local results to the remote cache.
INFO: Streaming build results to: https://source.cloud.google.com/results/invocations/d467f2e3-7da6-4bb5-8b9b-84f1181ebc60
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker (0 packages loaded, 133 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:resolver_component_tests_runner_invoker up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker.exe
INFO: Elapsed time: 41.627s, Critical Path: 39.42s
INFO: 2 processes: 1 internal, 1 remote.
//test/cpp/naming:resolver_component_tests_runner_invoker PASSED in 33.0s
Executed 1 out of 1 test: 1 test passes.
INFO: Streaming build results to: https://source.cloud.google.com/results/invocations/d467f2e3-7da6-4bb5-8b9b-84f1181ebc60
INFO: Build completed successfully, 2 total actions
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Local Bazel invocation succeeds:
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel2 test --dynamic_mode=off --verbose_failures //test/cpp/naming:resolver_component_tests_runner_invoker@poller=epoll1
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker@poller=epoll1 (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:resolver_component_tests_runner_invoker@poller=epoll1 up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker@poller=epoll1.exe
INFO: Elapsed time: 199.262s, Critical Path: 193.48s
INFO: 2 processes: 1 internal, 1 local.
INFO: Build completed successfully, 2 total actions
//test/cpp/naming:resolver_component_tests_runner_invoker@poller=epoll1 PASSED in 193.4s
Executed 1 out of 1 test: 1 test passes.
```
The local invocation of RBE failed with linker error `LINK : error
LNK2001: unresolved external symbol mainCRTStartup`, but that does not
limited to this target:
https://gist.github.com/yijiem/2c6cbd9a31209a6de8fd711afbf2b479.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Port https://github.com/grpc/grpc/pull/33871 to EE's
GrpcPolledFdFactoryPosix.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The previous version (`3.12`) is 7 years old and does not support the
newest Python 3 versions. This causes issues to move certain test
targets (which depends on `pyyaml`) to Python 3 when some CI environment
(e.g. `arm64v8/debian:11`) does not have Python 2 installed. And in
general, we should move away from Python 2. Thus, updated `pyyaml` to
the latest version.
This hopefully should also fix the
`prod:grpc/core/master/linux/arm64/grpc_bazel_test_c_cpp` job breakage.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Rolls forward https://github.com/grpc/grpc/pull/33871
Second and third commits here fix internal build issues
In particular, add a `// IWYU pragma: no_include <ares_build.h>` since
`ares.h` [includes that
anyways](bad62225b7/include/ares.h (L23))
(and seems unlikely for that to change since it would be breaking)
Normally, c-ares related fds are destroyed after all DNS resolution is
finished in [this code
path](c82d31677a/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc (L210)).
Also there are some fds that c-ares may fail to open or write to
initially, and c-ares will close them internally before grpc ever knows
about them.
But if:
1) c-ares opens a socket and successfully writes a request on it
2) then a subsequent read fails
Then c-ares will close the fd in [this code
path](bad62225b7/src/lib/ares_process.c (L740)),
but gRPC will have a reference on the fd and will still use it
afterwards.
Fix here is to leverage the c-ares socket-override API to properly track
fd ownership between c-ares and grpc.
Related: internal issue b/292203138
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This PR implements a c-ares based DNS resolver for EventEngine with the
reference from the original
[grpc_ares_wrapper.h](../blob/master/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h).
The PosixEventEngine DNSResolver is implemented on top of that. Tests
which use the client channel resolver API
([resolver.h](../blob/master/src/core/lib/resolver/resolver.h#L54)) are
ported, namely the
[resolver_component_test.cc](../blob/master/test/cpp/naming/resolver_component_test.cc)
and the
[cancel_ares_query_test.cc](../blob/master/test/cpp/naming/cancel_ares_query_test.cc).
The WindowsEventEngine DNSResolver will use the same EventEngine's
grpc_ares_wrapper and will be worked on next.
The
[resolve_address_test.cc](https://github.com/grpc/grpc/blob/master/test/core/iomgr/resolve_address_test.cc)
which uses the iomgr
[DNSResolver](../blob/master/src/core/lib/iomgr/resolve_address.h#L44)
API has been ported to EventEngine's dns_test.cc. That leaves only 2
tests which use iomgr's API, notably the
[dns_resolver_cooldown_test.cc](../blob/master/test/core/client_channel/resolvers/dns_resolver_cooldown_test.cc)
and the
[goaway_server_test.cc](../blob/master/test/core/end2end/goaway_server_test.cc)
which probably need to be restructured to use EventEngine DNSResolver
(for one thing they override the original grpc_ares_wrapper's free
functions). I will try to tackle these in the next step.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
- Switched from yapf to black
- Reconfigure isort for black
- Resolve black/pylint idiosyncrasies
Note: I used `--experimental-string-processing` because black was
producing "implicit string concatenation", similar to what described
here: https://github.com/psf/black/issues/1837. While currently this
feature is experimental, it will be enabled by default:
https://github.com/psf/black/issues/2188. After running black with the
new string processing so that the generated code merges these `"hello" "
world"` strings concatenations, then I removed
`--experimental-string-processing` for stability, and regenerated the
code again.
To the reviewer: don't even try to open "Files Changed" tab 😄 It's
better to review commit-by-commit, and ignore `run black and isort`.
This PR also centralizes the client channel resolver selection. Resolver
selection is still done using the plugin system, but when the Ares and
native client channel resolvers go away, we can consider bootstrapping
this differently.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Implement listeners, connection, endpoints for `FuzzingEventEngine`.
Allows the fuzzer to select write sizes and delays, connection delays,
and port assignments.
I made a few modifications to the test suite to admit this event engine
to pass the client & server tests:
1. the test factories return shared_ptr<> to admit us to return the same
event engine for both the oracle and the implementation - necessary
because FuzzingEventEngine forms a closed world of addresses & ports.
2. removed the WaitForSingleOwner calls - these seem unnecessary, and we
don't ask our users to do this - tested existing linux tests 1000x
across debug, asan, tsan with this change
Additionally, the event engine overrides the global port picker logic so
that port assignments are made by the fuzzer too.
This PR is a step along a longer journey, and has some outstanding
brethren PR's, and some follow-up work:
* #32603 will convert all the core e2e tests into a more malleable form
* we'll then use #32667 to turn all of these into fuzzers
* finally we'll integrate this into that work and turn all core e2e
tests into fuzzers over timer & callback reorderings and io
size/spacings
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This is a big rewrite of global config.
It does a few things, all somewhat intertwined:
1. centralize the list of configuration we have to a yaml file that can
be parsed, and code generated from it
2. add an initialization and a reset stage so that config vars can be
centrally accessed very quickly without the need for caching them
3. makes the syntax more C++ like (less macros!)
4. (optionally) adds absl flags to the OSS build
This first round of changes is intended to keep the system where it is
without major changes. We pick up absl flags to match internal code and
remove one point of deviation - but importantly continue to read from
the environment variables. In doing so we don't force absl flags on our
customers - it's possible to configure grpc without the flags - but
instead allow users that do use absl flags to configure grpc using that
mechanism. Importantly this lets internal customers configure grpc the
same everywhere.
Future changes along this path will be two-fold:
1. Move documentation generation into the code generation step, so that
within the source of truth yaml file we can find all documentation and
data about a configuration knob - eliminating the chance of forgetting
to document something in all the right places.
2. Provide fuzzing over configurations. Currently most config variables
get stashed in static constants across the codebase. To fuzz over these
we'd need a way to reset those cached values between fuzzing rounds,
something that is terrifically difficult right now, but with these
changes should simply be a reset on `ConfigVars`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* Declarative JSON parser
* Automated change: Fix sanity tests
* fix
* shrinking stuff a little
* static vtables
* separate fns
* simpler?
* make maps work
* windows fixes
* Automated change: Fix sanity tests
* simplify code
* Automated change: Fix sanity tests
* vtable-test
* dont always create vec/map impls for every type
* comments
* make error consistent
* move method private
* progress
* durations!
* Automated change: Fix sanity tests
* fix
* fix
* fix
* Automated change: Fix sanity tests
* post-load
* Automated change: Fix sanity tests
* document JsonPostLoad() and add static_assert
* don't copy field names, to avoid length limitations
* use absl::Status
* accept either string or number for numeric values
* add test for direct data member of another struct type
* remove unused method
* add support for retaining part of the JSON wirthout processing
* update test for changes in Json::Parse() API
* add absl::optional support
* Automated change: Fix sanity tests
* fix tests, improve error messages, and add overload to parse to existing object
* remove overload of LoadFromJson()
* change special case for Json to instead use Json::Object
* rename resolver_result_parsing to client_channel_service_config
* split up service_config_test into a separate test for each component
* convert client channel service config parser to new API
* fix build
* converted retry global params
* convert retry method-level parsing, but still need to find a way to control parsing via a channel arg
* improve error structure, add missing types, and improve tests
* clang-format
* Automated change: Fix sanity tests
* fix build
* add LoadJsonObjectField(), add LoadFromJson() overload that takes an ErrorList parameter, and add tests for parsing bare top-level types
* fix msan
* Automated change: Fix sanity tests
* fix error message
* Automated change: Fix sanity tests
* fix test
* add ability to disable fields
* plumb in channel args to disable parsing
* Automated change: Fix sanity tests
* use const char* instead of absl::string_view for enable_key
* fix resolver_component_test
* Automated change: Fix sanity tests
* work around mac build problem
* Automated change: Fix sanity tests
* work around gcc6 problem
* Automated change: Fix sanity tests
* fix build
* fix build
* don't use alternative builder in tests
* convert message size service config parser
* convert fault injection service config parser
* rename files
* add specialization for unique_ptr
* avoid moves in client channel service config parser
* avoid moves in retry service config parser, and do some cleanup
* avoid moves in message_size service config parser
* avoid moves for fault injection service config parser, and use internal channel arg
* convert rbac service config parser
* clang-format
* WIP
* convert top-level service config parser
* clang-format
* fix build
* fix rbac service config parser test
* fix signed-ness problem and reversed-conditional bug
* fix unused param
* fix json_string method
* fix max message length defaults
* add copy ctors to appease windows compiler
* fix RLS LB config parser test
* fix name resolution test
* fix build
* work around gmock portability bug
* fix sanity
* add missing build dep
* make RBAC principal and permissions movable, not copyable
* Revert "make RBAC principal and permissions movable, not copyable"
This reverts commit 53315bccc9.
* attempt to simplify HeaderMatcher and StringMatcher parsing
* more bloat reduction in RBAC service config parser
* fix sanity
* add missing build dep
* attempt work-around for MSVC bug
* Revert "attempt work-around for MSVC bug"
This reverts commit e54c89e1e4.
* attempt work-around for Windows build problem
* try another work-around
* fix sanity
* appease clang-tidy
* generate_projects
* attempt to fix Windows build
* more windows fixes
* more windows fix
* yet more windows fixes
* try without noexcept
* remove unnecessary boilerplate
* code review changes
* fix breakage
Co-authored-by: Craig Tiller <craig.tiller@gmail.com>
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Co-authored-by: Craig Tiller <ctiller@google.com>
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
* begin c++
* Automated change: Fix sanity tests
* progress
* progress
* missing-files
* Automated change: Fix sanity tests
* moved-from-stats
* remove old benchmark cruft, get tests compiling
* iwyu
* Automated change: Fix sanity tests
* fix
* fix
* fixes
* fixes
* add needed constructor
* Automated change: Fix sanity tests
* iwyu
* fix
* fix?
* fix
* fix
* Remove ResetDefaultEventEngine
Now that it is a weak_ptr, there's no need to explicitly reset it. When
the tracked shared_ptr is deleted, the weak_ptr will fail to lock, and a
new default EventEngine will be created.
* forget existing engine with FactoryReset
* add visibility
* fix
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Co-authored-by: AJ Heller <hork@google.com>