* Support Python 3.11
* Update build images for 3.11
* Whoopsie
* The architecture of this thing is garbage
* Silence ownership warning
* Account for change in git behavior
* Fix directory
* I am in great pain
* Update Windows and arm linux
* Agh
* Clean up
This adds a unit test for XdsClient and fixes several watcher-notification bugs found in the process. Specifically:
- When an ADS stream fails or an xDS channel reports a connectivity failure, report an error only to the watchers for resources being subscribed to on that particular channel, not to watchers on other channels.
- Cache the error status for the channel, so that if a new watcher is started after the channel reports the error, we can immediately report that error to the new watcher.
- If a resource is NACKed and has not been previously cached, or does not exist, report that fact to any new watcher that may be started later.
- If a resource in an ADS response is unparseable but is wrapped in a `Resource` wrapper, we do know its name, so record the validation failure in the cache and report it to the watchers.
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
* Upgrade google/benchmark to v1.7.0
Reason: Setup and Teardown methods are nice to have for multithreaded
tests.
* use version tag
* Revert "use version tag"
This reverts commit 6c7f5e0d35.
Fun edge case: when `rand_string()` happen to generate numbers only,
yaml interprets `deployment_id` label value as an integer,
but k8s expects label values to be strings.
K8s responds with a barely readable 400 Bad Request error:
`ReadString: expects \" or n, but found 9, error found in #10 byte of ...|ent_id`.
Prepending deployment name forces deployment_id into a string,
as well as it's just a better description.
* client_channel: rewrite illegal status codes from control plane
* rewrite illegal status codes for call creds
* move fail_lb policy out of retry_lb_fail test so it can be reused
* test resolver and LB policy status rewrites
* add test for ConfigSelector status rewriting
* attempt to add client_auth filter unit test
* fix client_auth_filter test
* cleanup test
* fix build
* fix some memory leaks
* Automated change: Fix sanity tests
* Update client_auth_filter_test.cc
* fix build
* code review comments
* clang-tidy
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
Co-authored-by: Craig Tiller <ctiller@google.com>
* [cleanup] Remove profiling timers
- nobody has used this system in years
- if we needed it, we'd probably rewrite it at this point to be something more modern
- let's remove it until that need arises
* fix
* fixes
To capture the return status of the test in run_test the last command must be the call to the test itself.
This removes `set +x`, which makes the run_test always return success, and not propagate the test status.
I can't find it, but this exact error bit us before. Looks like it leaked to other scripts.
The good thing is if the test was executed, it's failure would still be picked up from the result xml.
However, if the test framework didn't start in the first place, the result will be false positive.
Example: https://source.cloud.google.com/results/invocations/98d3e679-ec8a-40bd-9f36-88179747b0d6/targets
```
/home/kbuilder/.pyenv/versions/k8s_xds_test_runner/bin/python3: Error while finding module specification for 'tests.authz_test' (ModuleNotFoundError: No module named 'tests')
+ set +x
Failed test suites: 0
[ID: 3548168] Command finished after 625 secs, exit value: 0
```
When we use retryers with `log_level=logging.INFO`, tenacity logs the result value (or an exception) after each unsuccessful retry attempt.
We often retry methods that return objects, resulting in unreadable log messages:
```
I0820 03:16:29.027635 140613877811008 before_sleep.py:45] Retrying framework.xds_k8s_testcase.IsolatedXdsKubernetesTestCase.cleanup in 10.0 seconds as it raised RetryError: RetryError[Attempts: 21, Value: {'api_version': 'v1',
'kind': 'Namespace',
'metadata': {'annotations': None,
'cluster_name': None,
'creation_timestamp': datetime.datetime(2022, 8, 20, 2, 55, 32, tzinfo=tzlocal()),
'deletion_grace_period_seconds': None,
'deletion_timestamp': datetime.datetime(2022, 8, 20, 3, 6, 27, tzinfo=tzlocal()),
'finalizers': None,
'generate_name': None,
'generation': None,
'labels': {'kubernetes.io/metadata.name': 'psm-interop-server-20220820-0253-yrmam',
'name': 'psm-interop-server-20220820-0253-yrmam',
'owner': 'xds-k8s-interop-test'},
'managed_fields': [{'api_version': 'v1',
'fields_type': 'FieldsV1',
'fields_v1': {'f:metadata': {'f:labels': {'.': {},
'f:kubernetes.io/metadata.name': {},
... (82 more lines)
```
This PR introduces custom `before_sleep` logger, that only logs the value if it's a primitive: `int, str, bool`.
Otherwise, it logs the type, example:
```
k8s_base_runner.py:311] Waiting for pod psm-grpc-client-5d5648478f-7vsf7 to start
retryers.py:192] Retrying framework.infrastructure.k8s.KubernetesNamespace.get_pod in 1.0 seconds as it returned type <class 'kubernetes.client.models.v1_pod.V1Pod'>.
retryers.py:192] Retrying framework.infrastructure.k8s.KubernetesNamespace.get_pod in 1.0 seconds as it returned type <class 'kubernetes.client.models.v1_pod.V1Pod'>.
```
Note that this only changes the behavior of the unsuccessful retries, and doesn't affect the new feature that prints formatted k8s status field on if the *final* retry attempt failed.
- Enables pod log collection in all PSM interop jobs implemented in https://github.com/grpc/grpc/pull/30594.
- Associate test suite runs with their own log file, so it's displayed on "Target Log" tab
- Added support for pod log collection. To enable, set `--collect_app_logs` flag, and specify `--log_dir`.
- Added support and helpers for operating on the `--log_dir` (natively provided by absl)
- Added support for `--follow` to `bin/run_test_server.py` and `bin/run_test_client.py` to follow pod logs printed to stdout
- Moved `PortForwarder` from k8s.py to its own file
The collection itself will be enabled per-suite in https://github.com/grpc/grpc/pull/30735.