pod_name shouldn't be a part of the test app, it's purely k8s' idiom.
Originally server_id was intended for this purpose, but it was missed
when support for multiple server replicas added.
This replaces pod_name and server_id with hostname and improves
replica-specific log messages, so it's clear to what server
RPCs are issued.
In addition, now all RPC logs are annotated with the hostname:port,
so the destination is clear.
Before:
```
server_app.py:76] Setting health status to serving
grpc.py:60] RPC XdsUpdateHealthService.SetServing(request=Empty({}), timeout=90, wait_for_ready=True)
grpc.py:60] RPC Health.Check(request=HealthCheckRequest({}), timeout=90, wait_for_ready=True)
server_app.py:78] Server reports status: SERVING
```
After:
```
server_app.py:89] [psm-grpc-server-69bcf749c5-bg4x5] Setting health status to NOT_SERVING
grpc.py:72] [psm-grpc-server-69bcf749c5-bg4x5:52902] RPC XdsUpdateHealthService.SetNotServing(request=Empty({}), timeout=90, wait_for_ready=True)
grpc.py:72] [psm-grpc-server-69bcf749c5-bg4x5:52902] RPC Health.Check(request=HealthCheckRequest({}), timeout=90, wait_for_ready=True)
server_app.py:92] [psm-grpc-server-69bcf749c5-bg4x5] Health status status: NOT_SERVING
```
Similarly, this adds hostname to the client app, mainly for logging.
* [fixit] More max_connection_idle fixes
- handle the case that the idle timeout occurs between the connection becoming ready and the request being made (by making the request WAIT_FOR_READY to reconnect if needed)
- fix up the math for cq verification: the MAX_CONNECTION_IDLE_MS is an unscaled timeout, whereas the 3000 ms is scaled, so we cannot directly add them and scale
* Automated change: Fix sanity tests
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Undoes https://github.com/grpc/grpc/pull/27096.
While we lost context why py tests were used pinned cpp server,
we think this is due to lack of support of the set_not_serving RPC
in the python server, see https://github.com/grpc/grpc/issues/30635.
This RPC is only used in two tests, and for them we added a
temporary override of the test server to the reference Java server,
see https://github.com/grpc/grpc/pull/30636.
All other LB tests should work with the python server just fine.
In python tests that require set_not_serving server RPC, override
the python server with the reference server (Java) because
the python server doesn't yet support set_not_serving RPC.
Ref https://github.com/grpc/grpc/issues/30635.
* FaultInjection: Fix random number generation
* Put random generation under a mutex
* Fix IWYU
* Regenerate projects
* Modify timeouts
* Dbg build knobs
* Remove unnecessary slowdown factor
* Tune error tolerance and add note on broken computation of ComputeIdealNumRpcs
absl::Now() is not guaranteed to be monotonic, and we are seeing some
potential time travel at different rates on different machines. I've
replaced the most sensitive pieces with std::chrono::steady_clock to see
if that fixes the flakes we are seeing.
Some minor changes for the platform:
* `sys/types.h` needs to be included before `netinet/tcp.h` to provide
definitions for `u_int8_t`, `u_int16_t`, and `u_int32_t`.
* cares has an OpenBSD-specific config header but is not actually hooked
up in cares.BUILD. Remedy that.
* Disable end2end_binder_transport_test on some platforms
The following test case is flaky on windows
End2EndBinderTransportTestWithDifferentDelayTimes/End2EndBinderTransportTest.UnaryCallServerTimeout/1,
where GetParam() = 10ns
Binder transport won't be run on platform other than Android so it
should be OK to disable the test on some platform.
* Regenerate projects.
This fixes an issue with KubernetesNamespace.list_deployment_pods()
as well as the deployment itself would select incorrect pods
when multiple deployments share the same namespace.
* Reduce transferred data size in invoke_large_test to reduce flakiness
* Reduce number of test cases instead of reducing exchanged data size
* update
* Automated change: Fix sanity tests
Co-authored-by: Vignesh2208 <Vignesh2208@users.noreply.github.com>
* [fixit] Connectivity: Increase the connect timeout
* Remove old arg
* Fix max_connection_idle and simple_delayed_request as well
* Fix goaway_server_test too
* Use new API
* Fix IWYU
Separates xDS Test Client/Server (represent an interface to corresponding workload running remotely) from their runners (kubernetes-specific logic to provision the workloads with prerequisites).
This is a refactoring, should not change the behavior.
* Bump version to 1.48.0-pre1 (on v1.48.x branch) (#30194)
* bump version to 1.48.0-pre1
* regenerate projects
* Bump version to 1.48.0 (on v1.48.x branch) (#30326)
* bump version to 1.48.0
* regenerate projects
* xds interop: choose correct cluster in grpc_xds_k8s_lb_python.sh (1.48.x backport) (#30329)
* subchannel list: fix ubsan error (#30393) (#30412)
* don't expose vector type
* don't down-cast from inside base class ctor
* xDS interop: add missing image tagging to the buildscripts (#30520) (#30529)
Co-authored-by: Yash Tibrewal <yashkt@google.com>
Co-authored-by: Sergii Tkachenko <sergiitk@google.com>
Co-authored-by: Mark D. Roth <roth@google.com>