We have many tests that create 100 threads or more, and mounting evidence that
this is harmful to our CI environment.
When the original code for many of these tests was written we ran our tests
under run_tests, which had explicit handling for tracking the number of threads
each test needed and making sure that we weren't over subscribing the test
runner. Bazel has no such facility (and the facility in run_tests has since
been removed) and so we need to adjust.
This PR adjusts down a single test and is part of a series so that we can
review and roll back easily if required.
We have many tests that create 100 threads or more, and mounting evidence that
this is harmful to our CI environment.
When the original code for many of these tests was written we ran our tests
under run_tests, which had explicit handling for tracking the number of threads
each test needed and making sure that we weren't over subscribing the test
runner. Bazel has no such facility (and the facility in run_tests has since
been removed) and so we need to adjust.
This PR adjusts down a single test and is part of a series so that we can
review and roll back easily if required.
* [fixit] Scale down large tests
We have many tests that create 100 threads or more, and mounting evidence that
this is harmful to our CI environment.
When the original code for many of these tests was written we ran our tests
under run_tests, which had explicit handling for tracking the number of threads
each test needed and making sure that we weren't over subscribing the test
runner. Bazel has no such facility (and the facility in run_tests has since
been removed) and so we need to adjust.
This PR adjusts down a single test and is part of a series so that we can
review and roll back easily if required.
* mark up cpu usage
We have many tests that create 100 threads or more, and mounting evidence that
this is harmful to our CI environment.
When the original code for many of these tests was written we ran our tests
under run_tests, which had explicit handling for tracking the number of threads
each test needed and making sure that we weren't over subscribing the test
runner. Bazel has no such facility (and the facility in run_tests has since
been removed) and so we need to adjust.
This PR adjusts down a single test and is part of a series so that we can
review and roll back easily if required.
We have many tests that create 100 threads or more, and mounting evidence that
this is harmful to our CI environment.
When the original code for many of these tests was written we ran our tests
under run_tests, which had explicit handling for tracking the number of threads
each test needed and making sure that we weren't over subscribing the test
runner. Bazel has no such facility (and the facility in run_tests has since
been removed) and so we need to adjust.
This PR adjusts down a single test and is part of a series so that we can
review and roll back easily if required.
pod_name shouldn't be a part of the test app, it's purely k8s' idiom.
Originally server_id was intended for this purpose, but it was missed
when support for multiple server replicas added.
This replaces pod_name and server_id with hostname and improves
replica-specific log messages, so it's clear to what server
RPCs are issued.
In addition, now all RPC logs are annotated with the hostname:port,
so the destination is clear.
Before:
```
server_app.py:76] Setting health status to serving
grpc.py:60] RPC XdsUpdateHealthService.SetServing(request=Empty({}), timeout=90, wait_for_ready=True)
grpc.py:60] RPC Health.Check(request=HealthCheckRequest({}), timeout=90, wait_for_ready=True)
server_app.py:78] Server reports status: SERVING
```
After:
```
server_app.py:89] [psm-grpc-server-69bcf749c5-bg4x5] Setting health status to NOT_SERVING
grpc.py:72] [psm-grpc-server-69bcf749c5-bg4x5:52902] RPC XdsUpdateHealthService.SetNotServing(request=Empty({}), timeout=90, wait_for_ready=True)
grpc.py:72] [psm-grpc-server-69bcf749c5-bg4x5:52902] RPC Health.Check(request=HealthCheckRequest({}), timeout=90, wait_for_ready=True)
server_app.py:92] [psm-grpc-server-69bcf749c5-bg4x5] Health status status: NOT_SERVING
```
Similarly, this adds hostname to the client app, mainly for logging.
* [fixit] More max_connection_idle fixes
- handle the case that the idle timeout occurs between the connection becoming ready and the request being made (by making the request WAIT_FOR_READY to reconnect if needed)
- fix up the math for cq verification: the MAX_CONNECTION_IDLE_MS is an unscaled timeout, whereas the 3000 ms is scaled, so we cannot directly add them and scale
* Automated change: Fix sanity tests
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Undoes https://github.com/grpc/grpc/pull/27096.
While we lost context why py tests were used pinned cpp server,
we think this is due to lack of support of the set_not_serving RPC
in the python server, see https://github.com/grpc/grpc/issues/30635.
This RPC is only used in two tests, and for them we added a
temporary override of the test server to the reference Java server,
see https://github.com/grpc/grpc/pull/30636.
All other LB tests should work with the python server just fine.
In python tests that require set_not_serving server RPC, override
the python server with the reference server (Java) because
the python server doesn't yet support set_not_serving RPC.
Ref https://github.com/grpc/grpc/issues/30635.
* FaultInjection: Fix random number generation
* Put random generation under a mutex
* Fix IWYU
* Regenerate projects
* Modify timeouts
* Dbg build knobs
* Remove unnecessary slowdown factor
* Tune error tolerance and add note on broken computation of ComputeIdealNumRpcs
absl::Now() is not guaranteed to be monotonic, and we are seeing some
potential time travel at different rates on different machines. I've
replaced the most sensitive pieces with std::chrono::steady_clock to see
if that fixes the flakes we are seeing.
Some minor changes for the platform:
* `sys/types.h` needs to be included before `netinet/tcp.h` to provide
definitions for `u_int8_t`, `u_int16_t`, and `u_int32_t`.
* cares has an OpenBSD-specific config header but is not actually hooked
up in cares.BUILD. Remedy that.
* Disable end2end_binder_transport_test on some platforms
The following test case is flaky on windows
End2EndBinderTransportTestWithDifferentDelayTimes/End2EndBinderTransportTest.UnaryCallServerTimeout/1,
where GetParam() = 10ns
Binder transport won't be run on platform other than Android so it
should be OK to disable the test on some platform.
* Regenerate projects.
This fixes an issue with KubernetesNamespace.list_deployment_pods()
as well as the deployment itself would select incorrect pods
when multiple deployments share the same namespace.
* Reduce transferred data size in invoke_large_test to reduce flakiness
* Reduce number of test cases instead of reducing exchanged data size
* update
* Automated change: Fix sanity tests
Co-authored-by: Vignesh2208 <Vignesh2208@users.noreply.github.com>