Fixes an issue where the automatically selected active context was picked up
as the context for `secondary_k8s_api_manager`.
This was introducing an error in the GAMMA Baseline PoC:
```
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('100.71.2.143', 56723), raddr=('35.199.174.232', 443)>
```
Here's how the secondary context incorrectly falls back to the
default context when `--secondary_kube_context` is not set:
```
k8s.py:142] Using kubernetes context "gke_grpc-testing_us-central1-a_psm-interop-security", active host: https://35.202.85.90
k8s.py:142] Using kubernetes context "None", active host: https://35.202.85.90
```
- Add GitHub Action to conditionally run PSM Interop unit tests:
  - Only run when changes are detected in
    `tools/run_tests/xds_k8s_test_driver` or any of the proto files used by
    the driver
  - Only run against PRs and pushes to `master`, `v1.*.*` branches
  - Runs using `python3.9` and `python3.10`
  - Ready to be added to the list of required GitHub checks
- Add `tools/run_tests/xds_k8s_test_driver/tests/unit/__main__.py` test
loader that recursively discovers all unit tests in
`tools/run_tests/xds_k8s_test_driver/tests/unit`
- Add basic coverage for `XdsTestClient` and `XdsTestServer` to verify
the test loader picks up all folders
Related:
- First unit tests without automated CI added in #34097
The tests are skipped incorrectly because `config.server_lang` is
compared with the string value "java" instead of
`skips.Lang.JAVA`.
This has been broken since #26998.
```
xds_url_map_testcase.py:372] ----- Testing TestTimeoutInRouteRule -----
xds_url_map_testcase.py:373] Logs timezone: UTC
skips.py:121] Skipping TestConfig(client_lang='java', server_lang='java', version='v1.57.x')
[ SKIPPED ] setUpClass (timeout_test.TestTimeoutInRouteRule)
xds_url_map_testcase.py:372] ----- Testing TestTimeoutInApplication -----
xds_url_map_testcase.py:373] Logs timezone: UTC
skips.py:121] Skipping TestConfig(client_lang='java', server_lang='java', version='v1.57.x')
[ SKIPPED ] setUpClass (timeout_test.TestTimeoutInApplication)
```
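To illustrate why this kind of comparison silently misbehaves, here is a small sketch. The `Lang` enum below is a hypothetical stand-in for `skips.Lang`; the real members and the exact check in the test may differ.

```python
import enum


class Lang(enum.Enum):
    # Hypothetical stand-in for skips.Lang; real members may differ.
    JAVA = "java"
    PYTHON = "python"


server_lang = Lang.JAVA

# An enum member never compares equal to a plain string, even when the
# string matches the member's value, so this check is always False.
broken_check = server_lang == "java"

# Comparing against the enum member behaves as intended.
fixed_check = server_lang == Lang.JAVA

print(f"broken: {broken_check}, fixed: {fixed_check}")
```

No exception is raised for the string comparison, which is why a bug like this can go unnoticed for a long time.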
This is to make sure upgrading the `packaging` module won't break our
version-based skipping logic.
This also fixes a small issue with the `dev-` prefix: it should only be
allowed on the left side of the comparison.
Context: the `packaging` module needs to be upgraded to be compatible with
`blackd`.
This PR fixes the bootstrap generator interop test by making the node
metadata flag dependent on the version. Previously this caused a breakage
because not all bootstrap generator versions support the
de-experimentalized flag.
PanCakes to the rescue!
We noticed that our 'sanity' test was going to fail, but we think we can
fix that automatically, so we put together this PR to do just that!
If you'd like to opt-out of these PR's, add yourself to NO_AUTOFIX_USERS
in .github/workflows/pr-auto-fix.yaml
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
As part of the dualstack backend designs, subchannels will be created
lazily. Therefore, instead of asserting that there is 1 READY subchannel
and `n - 1` IDLE subchannels, we just assert that there is 1 READY
subchannel.
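A minimal sketch of the relaxed assertion, assuming subchannel states are available as plain strings. The helper name and the string representation are hypothetical; the real test reads the states from the client's channelz data.

```python
from typing import Iterable


def assert_single_ready_subchannel(states: Iterable[str]) -> None:
    """Check that exactly one subchannel is READY.

    Deliberately makes no claim about how many IDLE subchannels exist,
    since lazily created subchannels may not be present at all.
    """
    ready_count = sum(1 for state in states if state == "READY")
    if ready_count != 1:
        raise AssertionError(
            f"Expected exactly 1 READY subchannel, found {ready_count}"
        )


# Both shapes pass: with and without the n - 1 IDLE subchannels.
assert_single_ready_subchannel(["READY", "IDLE", "IDLE"])
assert_single_ready_subchannel(["READY"])
```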
- Switched from yapf to black
- Reconfigure isort for black
- Resolve black/pylint idiosyncrasies
Note: I used `--experimental-string-processing` because black was
producing "implicit string concatenation", similar to what is described
here: https://github.com/psf/black/issues/1837. While this feature is
currently experimental, it will be enabled by default:
https://github.com/psf/black/issues/2188. I first ran black with the
new string processing so that the generated code merges
`"hello" " world"` string concatenations, then removed
`--experimental-string-processing` for stability and regenerated the
code.
To the reviewer: don't even try to open "Files Changed" tab 😄 It's
better to review commit-by-commit, and ignore `run black and isort`.
Fixes the issue introduced in https://github.com/grpc/grpc/pull/33104,
where stopping the current run didn't reset `self.time_start_requested`,
`self.time_start_completed`, `self.time_start_stopped`. Because of this,
the subsetting test (the only one [redeploying the client
app](10001d16a9/tools/run_tests/xds_k8s_test_driver/tests/subsetting_test.py (L73C1-L74)))
started failing with:
```py
Traceback (most recent call last):
File "xds_k8s_test_driver/tests/subsetting_test.py", line 76, in test_subsetting_basic
test_client: _XdsTestClient = self.startTestClient(
File "xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 615, in startTestClient
test_client = self.client_runner.run(server_target=test_server.xds_uri,
File "xds_k8s_test_driver/framework/test_app/runners/k8s/k8s_xds_client_runner.py", line 110, in run
super().run()
File "xds_k8s_test_driver/framework/test_app/runners/k8s/k8s_base_runner.py", line 112, in run
raise RuntimeError(
RuntimeError: Deployment psm-grpc-client: has already been started at 2023-05-27T13:47:15.262461
```
This PR:
1. Instead of relying on `time_start_requested` and
`time_start_stopped` to produce GCP links, tracks the run history of
each deployment. This fixes the issue described above, and adds support
for listing all past runs executed by a k8s runner.
2. Minor: removes the unnecessary call to `test_client.cleanup()` when
there are no past deployment runs (e.g. at the first iteration of `for i
in range(_NUM_CLIENTS):`).
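The run-history idea can be sketched roughly as follows. The class and attribute names (`RunRecord`, `HistoryTrackingRunner`, `run_history`) are hypothetical and only illustrate the approach; they are not the runner's actual API.

```python
import dataclasses
import datetime
from typing import List, Optional


@dataclasses.dataclass
class RunRecord:
    """One start/stop cycle of a deployment (hypothetical structure)."""
    deployment_id: str
    time_start_requested: datetime.datetime
    time_start_completed: Optional[datetime.datetime] = None
    time_stopped: Optional[datetime.datetime] = None


class HistoryTrackingRunner:
    """Keeps per-run timestamps in a history list instead of on self."""

    def __init__(self) -> None:
        self.run_history: List[RunRecord] = []
        self.current_run: Optional[RunRecord] = None

    def run(self, deployment_id: str) -> None:
        if self.current_run is not None:
            started = self.current_run.time_start_requested.isoformat()
            raise RuntimeError(
                f"Deployment {deployment_id}: has already been started"
                f" at {started}"
            )
        self.current_run = RunRecord(deployment_id, datetime.datetime.now())
        # ... provisioning would happen here ...
        self.current_run.time_start_completed = datetime.datetime.now()

    def stop(self) -> None:
        if self.current_run is None:
            return  # Nothing to stop, e.g. before the first run.
        self.current_run.time_stopped = datetime.datetime.now()
        # Archive the finished run; the next run() starts from a clean state.
        self.run_history.append(self.current_run)
        self.current_run = None
```

Because stopping moves the record into `run_history` and clears the current run, redeploying after a stop no longer trips the "has already been started" check, and each past run keeps its own timestamps for building links.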
Better logging for `assertRpcStatusCodes`.
(got tired of looking up the status names)
#### Unexpected status found
Before:
```
AssertionError: AssertionError: Expected only status 15 but found status 0 for method UNARY_CALL:
stats_per_method {
key: "UNARY_CALL"
value {
result {
key: 0
value: 251
}
}
}
```
After:
```
AssertionError: Expected only status (15, DATA_LOSS), but found status (0, OK) for method UNARY_CALL:
stats_per_method {
key: "UNARY_CALL"
value {
result {
key: 0
value: 251
}
}
}
```
#### No traffic with expected status
Before:
```
AssertionError: 0 not greater than 0
```
After:
```
AssertionError: 0 not greater than 0 : Expected non-zero RPCs with status (15, DATA_LOSS) for method UNARY_CALL, got:
stats_per_method {
key: "UNARY_CALL"
value {
result {
key: 0
value: 251
}
result {
key: 15
value: 0
}
}
}
```
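A rough sketch of how the `(code, NAME)` formatting shown above could be produced. The `StatusCode` enum here is a small local stand-in that lists only the two codes appearing in the examples, and `status_pretty` is a hypothetical helper name; the driver itself may resolve names differently.

```python
import enum


class StatusCode(enum.IntEnum):
    # Local stand-in with only the two codes used in the examples above.
    OK = 0
    DATA_LOSS = 15


def status_pretty(code: int) -> str:
    """Format a numeric status as "(code, NAME)" for assertion messages."""
    try:
        name = StatusCode(code).name
    except ValueError:
        # Code not listed in the stand-in enum.
        name = "?"
    return f"({code}, {name})"


message = (
    f"Expected only status {status_pretty(15)},"
    f" but found status {status_pretty(0)} for method UNARY_CALL"
)
print(message)
```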
Previously the error message didn't provide much context. Example:
```py
Traceback (most recent call last):
File "/tmpfs/tmp/tmp.BqlenMyXyk/grpc/tools/run_tests/xds_k8s_test_driver/tests/affinity_test.py", line 127, in test_affinity
self.assertLen(
AssertionError: [] has length of 0, expected 1.
```
ref b/279990584.
b/228743575 started happening more frequently, and there are more important
fires to worry about. Silence this off-by-one flake so we can come
back to it when we have a bit more time.
All alternative server runners except the one in the failover test reuse the primary server runners' namespace. The failover test uses the secondary cluster and manages its own namespace there. `reuse_namespace` disables namespace cleanup, and in this case it was incorrectly set to `True`.
* Enable outlier detection k8s interop test for Java. (#30641)
* xDS interop: enable outlier detection Java tests in >= 1.49.x
Co-authored-by: Terry Wilson <terrymwilson@gmail.com>
pod_name shouldn't be a part of the test app; it's purely a k8s idiom.
Originally server_id was intended for this purpose, but it was missed
when support for multiple server replicas was added.
This replaces pod_name and server_id with hostname and improves
replica-specific log messages, so it's clear which server
the RPCs are issued to.
In addition, all RPC logs are now annotated with the hostname:port,
so the destination is clear.
Before:
```
server_app.py:76] Setting health status to serving
grpc.py:60] RPC XdsUpdateHealthService.SetServing(request=Empty({}), timeout=90, wait_for_ready=True)
grpc.py:60] RPC Health.Check(request=HealthCheckRequest({}), timeout=90, wait_for_ready=True)
server_app.py:78] Server reports status: SERVING
```
After:
```
server_app.py:89] [psm-grpc-server-69bcf749c5-bg4x5] Setting health status to NOT_SERVING
grpc.py:72] [psm-grpc-server-69bcf749c5-bg4x5:52902] RPC XdsUpdateHealthService.SetNotServing(request=Empty({}), timeout=90, wait_for_ready=True)
grpc.py:72] [psm-grpc-server-69bcf749c5-bg4x5:52902] RPC Health.Check(request=HealthCheckRequest({}), timeout=90, wait_for_ready=True)
server_app.py:92] [psm-grpc-server-69bcf749c5-bg4x5] Health status status: NOT_SERVING
```
Similarly, this adds hostname to the client app, mainly for logging.
In Python tests that require the set_not_serving server RPC, override
the Python server with the reference server (Java) because
the Python server doesn't yet support the set_not_serving RPC.
Ref https://github.com/grpc/grpc/issues/30635.
Separates the xDS Test Client/Server (which represent an interface to the corresponding workload running remotely) from their runners (Kubernetes-specific logic to provision the workloads with prerequisites).
This is a refactoring and should not change the behavior.
Some tests override unittest's `tearDown()`, which is not wrong, but is less resilient than overriding the custom `cleanup()`, which is retried in the framework's `tearDown()`.
- xDS interop: add support for the reference xds test server
- Set default xDS test server reference to Java `v1.48.1`
- Override xDS test server with the reference in Outlier Detection
* Add xDS interop test for outlier detection
This implements the test described in #29623, and plumbing for setting the
outlierDetection field in the backend service config. The changes in this PR
are very similar to #29688.
* Fix use of configure method
* Correct copy/paste error
* Fix metadata configuration syntax
* Increase QPS, use just one method
* Format code
* Apply suggestions from code review
Co-authored-by: Sergii Tkachenko <hi@sergii.org>
* Address review comments
* Only Java implements the required server features
* Automated change: Fix sanity tests
* Address review comments
* Use double quotes for docstring
Co-authored-by: Sergii Tkachenko <hi@sergii.org>
Co-authored-by: murgatroid99 <murgatroid99@users.noreply.github.com>
Resume the failover test. For now, just on master; it will be resumed on other branches when the fix is backported.
At the moment, master is fixed in Java and Go.
ref b/238226704
Added a couple of tests which run the baseline_test with all released
bootstrap generator versions on client and server. These tests will be
run on a continuous integration environment with gRPC servers and
clients built using the latest released version of gRPC in one selected
language.
* Add supported Node version ranges in xDS k8s url_map tests
This adds is_supported implementations for most of the url_map tests that didn't
already have them. The exception is metadata_filter_test because it doesn't use
any specific client features.
* Fix formatting
* Improve timeout test check order
1. Fixes the issue with Java PSM security tests being accidentally skipped because Java was missing from the list of languages, ref https://github.com/grpc/grpc/pull/28978
2. Invert the logic of `is_supported` methods, making them normally open
3. Make languages an `enum.Flag` to avoid accidental typos when listing the languages
4. Rename `XdsKubernetesTestCase.isSupported` to `XdsKubernetesTestCase.is_supported` to be consistent with `XdsUrlMapTestCase.is_supported`
5. Add extra logging
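Items 2 and 3 can be illustrated with a short sketch. The member list and the free-standing `is_supported` function are assumptions made for this example; "normally open" is read here as "supported unless explicitly excluded".

```python
import enum


class Lang(enum.Flag):
    # Hypothetical member list; the real enum may differ.
    UNKNOWN = enum.auto()
    CPP = enum.auto()
    GO = enum.auto()
    JAVA = enum.auto()
    PYTHON = enum.auto()
    NODE = enum.auto()


def is_supported(lang: Lang) -> bool:
    """Normally open: everything is supported unless excluded below."""
    # Flags combine with |, so a set of languages is a single value.
    unsupported = Lang.PYTHON | Lang.NODE
    if lang in unsupported:
        return False
    return True


print(is_supported(Lang.JAVA))
print(is_supported(Lang.NODE))
```

With an enum, a typo such as `Lang.JAVAA` raises `AttributeError` as soon as the line runs, whereas a misspelled string in a list of language names would just never match.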