<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: Mark D. Roth <roth@google.com>
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Old message:
new_cluster_1: UNAVAILABLE: errors validating xds_cluster_resolver LB
policy config: [field:discoveryMechanisms error:must be non-empty]
New message:
new_cluster_1: FAILED_PRECONDITION: aggregate cluster graph has no leaf
clusters
If we get a readable event on an fd and both the following happens:
- c-ares does *not* read all bytes off the fd
- c-ares removes the fd from the set ARES_GETSOCK_READABLE
... then we have a busy loop here, where we'd keep asking c-ares to
process an fd that it no longer cares about.
This is indirectly related to a change in this code one month ago:
https://github.com/grpc/grpc/pull/33942 - before that change, c-ares
would close the socket when it called
[handle_error](7f3262312f/src/lib/ares_process.c (L707))
and so `IsFdStillReadableLocked` would start returning `false`, causing
us to get away with [this
loop](f6a994229e/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc (L371)).
Now, because `IsFdStillReadableLocked` will keep returning true (because
of our overridden `close` API), we'll loop forever.
Based on updates at https://github.com/grpc/proposal/pull/380
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We added this as an exploratory measure for a customer that thought they
were using open census (this turned out to be emphatically false).
Remove it since it's probably not how we ultimately want to do this, and
wait for something better to come along.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This makes the resolver component tests suite run on Window RBE by
adding a flag in the test driver to further differentiate between Bazel
local run and Bazel RBE run on Windows since they have different
RUNFILES behavior.
Local Bazel run succeeds:
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel2 test --dynamic_mode=off --verbose_failures --test_arg=--running_locally=true //test/cpp/naming:resolver_component_tests_runner_invoker
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:resolver_component_tests_runner_invoker up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker.exe
INFO: Elapsed time: 196.080s, Critical Path: 193.21s
INFO: 2 processes: 1 internal, 1 local.
INFO: Build completed successfully, 2 total actions
//test/cpp/naming:resolver_component_tests_runner_invoker PASSED in 193.1s
Executed 1 out of 1 test: 1 test passes.
```
RBE run succeeds:
```
C:\Users\yijiem\projects\grpc>bazel --bazelrc=tools/remote_build/windows.bazelrc test --config=windows_opt --dynamic_mode=off --verbose_failures --host_linkopt=/NODEFAULTLIB:libcmt.lib --host_linkopt=/DEFAULTLIB:msvcrt.lib --nocache_test_results //test/cpp/naming:resolver_component_tests_runner_invoker
INFO: Invocation ID: d467f2e3-7da6-4bb5-8b9b-84f1181ebc60
WARNING: --remote_upload_local_results is set, but the remote cache does not support uploading action results or the current account is not authorized to write local results to the remote cache.
INFO: Streaming build results to: https://source.cloud.google.com/results/invocations/d467f2e3-7da6-4bb5-8b9b-84f1181ebc60
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker (0 packages loaded, 133 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:resolver_component_tests_runner_invoker up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker.exe
INFO: Elapsed time: 41.627s, Critical Path: 39.42s
INFO: 2 processes: 1 internal, 1 remote.
//test/cpp/naming:resolver_component_tests_runner_invoker PASSED in 33.0s
Executed 1 out of 1 test: 1 test passes.
INFO: Streaming build results to: https://source.cloud.google.com/results/invocations/d467f2e3-7da6-4bb5-8b9b-84f1181ebc60
INFO: Build completed successfully, 2 total actions
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This reverts commit fe1ba18dfc.
Reason: break import
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This helps developers run benchmark loadtests locally. See comments in
scenario_runner.py for usage.
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Local Bazel invocation succeeds:
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel2 test --dynamic_mode=off --verbose_failures //test/cpp/naming:resolver_component_tests_runner_invoker@poller=epoll1
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker@poller=epoll1 (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:resolver_component_tests_runner_invoker@poller=epoll1 up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker@poller=epoll1.exe
INFO: Elapsed time: 199.262s, Critical Path: 193.48s
INFO: 2 processes: 1 internal, 1 local.
INFO: Build completed successfully, 2 total actions
//test/cpp/naming:resolver_component_tests_runner_invoker@poller=epoll1 PASSED in 193.4s
Executed 1 out of 1 test: 1 test passes.
```
The local invocation of RBE failed with linker error `LINK : error
LNK2001: unresolved external symbol mainCRTStartup`, but that does not
limited to this target:
https://gist.github.com/yijiem/2c6cbd9a31209a6de8fd711afbf2b479.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Update from gtcooke94:
This PR adds support to build gRPC and it's tests with OpenSSL3. There were some
hiccups with tests as the tests with openssl haven't been built or exercised in a
few months, so they needed some work to fix.
Right now I expect all test files to pass except the following:
- h2_ssl_cert_test
- ssl_transport_security_utils_test
I confirmed locally that these tests fail with OpenSSL 1.1.1 as well,
thus we are at least not introducing regressions. Thus, I've added compiler directives around these tests so they only build when using BoringSSL.
---------
Co-authored-by: Gregory Cooke <gregorycooke@google.com>
Co-authored-by: Esun Kim <veblush@google.com>
CNR the flake, but I've changed the test (which is very old) to use some
of our more modern helper functions that have saner timeouts.
Also re-add a `return` statement that was accidentally removed in
#33753, which I noticed while working on this. Its absence doesn't cause
a real problem, but it does cause us to needlessly trigger a duplicate
connection attempt or report a duplicate CONNECTING update in some
cases.
De-experiment pick first since we have both affinity and randomness E2E
test running successfully.
---------
Co-authored-by: Yash Tibrewal <yashkt@google.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This PR is mainly a set of improvements that allow the C++ Alarm to be
migrated away from legacy iomgr. It cannot be landed without significant
speedup, due to third-parties relying on a fast path for immediate timer
execution with deadlines <= now.
Previous EventEngine performance of bm_alarm, compared to baseline iomgr
timers: *0.014%*
This PR: *2.5%*
Regarding previous failures to land this change: The cloud libraries
team agreed to reduce the amount of stress in their alarm stress test
https://github.com/googleapis/google-cloud-cpp/pull/12378
Fix sticky-TF behavior such that once we enter TRANSIENT_FAILURE, we do
not leave that state if we get a new address list.
Also, fix handling of subchannels in state TRANSIENT_FAILURE.
Previously, if a subchannel was already in state TRANSIENT_FAILURE when
we wanted to start a connection attempt on it (e.g., because the
subchannel already existed from a different channel, or because it
already existed in the previous subchannel list), we would wait for it
to report IDLE before attempting to connect. This PR changes pick_first
to instead immediately skip the subchannel and move on to the next one.
Now, the only time we wait for a subchannel in TRANSIENT_FAILURE is when
we wrap back around to the first subchannel in the list.
Port https://github.com/grpc/grpc/pull/33871 to EE's
GrpcPolledFdFactoryPosix.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Not adding CMake support yet
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We decided to not populate `policy_name` with the HTTP filter name in
xDS case. So removing it from `GenerateServiceConfig`. This will be
consistent across languages. The gRFC
[PR](https://github.com/grpc/proposal/pull/346) has been updated.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
I'm fairly certain that this path should be non-blocking (and making it
so makes the promise based code far more tractable).
This moves the blocking behavior into the blocking server_cc.cc function
that calls `grpc_server_shutdown_and_notify` instead of in that
non-blocking function.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
The previous version (`3.12`) is 7 years old and does not support the
newest Python 3 versions. This causes issues to move certain test
targets (which depends on `pyyaml`) to Python 3 when some CI environment
(e.g. `arm64v8/debian:11`) does not have Python 2 installed. And in
general, we should move away from Python 2. Thus, updated `pyyaml` to
the latest version.
This hopefully should also fix the
`prod:grpc/core/master/linux/arm64/grpc_bazel_test_c_cpp` job breakage.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Rolls forward https://github.com/grpc/grpc/pull/33871
Second and third commits here fix internal build issues
In particular, add a `// IWYU pragma: no_include <ares_build.h>` since
`ares.h` [includes that
anyways](bad62225b7/include/ares.h (L23))
(and seems unlikely for that to change since it would be breaking)
Normally, c-ares related fds are destroyed after all DNS resolution is
finished in [this code
path](c82d31677a/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc (L210)).
Also there are some fds that c-ares may fail to open or write to
initially, and c-ares will close them internally before grpc ever knows
about them.
But if:
1) c-ares opens a socket and successfully writes a request on it
2) then a subsequent read fails
Then c-ares will close the fd in [this code
path](bad62225b7/src/lib/ares_process.c (L740)),
but gRPC will have a reference on the fd and will still use it
afterwards.
Fix here is to leverage the c-ares socket-override API to properly track
fd ownership between c-ares and grpc.
Related: internal issue b/292203138
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->