Previously, we didn't configure the failureThreshold, so it used its
default value. The final `startupProbe` looked like this:
```json
{
  "startupProbe": {
    "failureThreshold": 3,
    "periodSeconds": 3,
    "successThreshold": 1,
    "tcpSocket": {
      "port": 8081
    },
    "timeoutSeconds": 1
  }
}
```
As a result, the total time before k8s killed the container was
3 failed probes (`failureThreshold`) * 3 seconds between probes
(`periodSeconds`) = 9 seconds total (±3 seconds waiting for the probe
response).
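The arithmetic above can be sketched in Python. The function name
`startup_budget_seconds` is made up for illustration, and the formula
ignores `initialDelaySeconds` and the per-probe timeout, so treat it as an
approximation rather than the kubelet's exact behavior.

```python
def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time a container gets to pass its startupProbe.

    Simplification (assumption): ignores initialDelaySeconds and
    timeoutSeconds, which shift the real deadline slightly.
    """
    return failure_threshold * period_seconds


# Values taken from the startupProbe shown above.
print(startup_budget_seconds(failure_threshold=3, period_seconds=3))  # 9
```

With `periodSeconds` unchanged, the budget grows linearly with
`failureThreshold`.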
This greatly affected the PSM Security test server, some implementations
of which waited for the ADS stream to be configured before they started
listening on the maintenance port. This led to the server container
being killed ~7 times before a successful startup:
```
15:55:08.875586 "Killing container with a grace period"
15:53:38.875812 "Killing container with a grace period"
15:52:47.875752 "Killing container with a grace period"
15:52:38.874696 "Killing container with a grace period"
15:52:14.874491 "Killing container with a grace period"
15:52:05.875400 "Killing container with a grace period"
15:51:56.876138 "Killing container with a grace period"
```
These extra delays led to PSM security tests timing out.
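As a rough cross-check of the 9-second figure, the gaps between the kills
logged above can be computed. This is only a sketch and the helper name is
illustrative. The longer gaps probably include restart back-off and
container startup time, but that is an assumption: the log itself does not
say so.

```python
from datetime import datetime

# Timestamps copied from the log above (newest first).
KILL_TIMES = [
    "15:55:08.875586",
    "15:53:38.875812",
    "15:52:47.875752",
    "15:52:38.874696",
    "15:52:14.874491",
    "15:52:05.875400",
    "15:51:56.876138",
]


def kill_gaps_seconds(timestamps):
    """Return rounded gaps in seconds between consecutive kills, oldest first."""
    parsed = sorted(datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps)
    return [round((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


print(kill_gaps_seconds(KILL_TIMES))  # [9, 9, 24, 9, 51, 90]
```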
ref b/277336725
PSM Interop: Local dev various improvements
- Cleanup resources on ctrl+c
- Add startup probes to address the issue with port forwarding starting
before the workload listens on a port
- Remove misleading restartPolicy: it's silently ignored by k8s
- Extra debug message with port-forwarding command
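For context, the startup probe idea boils down to a TCP check: the workload
counts as started once its port accepts connections. Below is a minimal
client-side sketch of the same check. `wait_for_port` and its parameters are
hypothetical and not part of the test framework.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 3,
                  period_sec: float = 3.0, timeout_sec: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Mirrors the shape of a tcpSocket probe: up to `attempts` tries,
    `period_sec` apart, each limited to `timeout_sec`.
    """
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout_sec):
                return True
        except OSError:
            # Port not open yet (refused or timed out); wait and retry.
            if attempt < attempts - 1:
                time.sleep(period_sec)
    return False
```

A caller could run `wait_for_port("localhost", 8081)` before relying on a
forwarded port; the port number here is only an example.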
This fixes an issue where KubernetesNamespace.list_deployment_pods(),
as well as the deployment itself, would select incorrect pods
when multiple deployments share the same namespace.
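The message does not say how the selection was fixed. One common cause of
this kind of bug is a label selector that is shared by several deployments.
The sketch below illustrates that failure mode in plain Python; the label
keys, pod names, and `select_pods` helper are all hypothetical.

```python
def select_pods(pods, selector):
    """Return pods whose labels contain every key/value pair in selector."""
    return [
        pod for pod in pods
        if all(pod["labels"].get(k) == v for k, v in selector.items())
    ]


# Hypothetical pods from two deployments in the same namespace.
pods = [
    {"name": "client-abc", "labels": {"app": "psm-interop", "deployment_id": "client"}},
    {"name": "server-xyz", "labels": {"app": "psm-interop", "deployment_id": "server"}},
]

# A selector on the shared label alone matches pods of both deployments.
shared = select_pods(pods, {"app": "psm-interop"})
# Adding a per-deployment label narrows the match to one deployment.
scoped = select_pods(pods, {"app": "psm-interop", "deployment_id": "server"})

print([p["name"] for p in shared])  # ['client-abc', 'server-xyz']
print([p["name"] for p in scoped])  # ['server-xyz']
```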
* Add xDS interop test for outlier detection
This implements the test described in #29623 and adds plumbing for setting
the outlierDetection field in the backend service config. The changes in
this PR are very similar to #29688.
* Fix use of configure method
* Correct copy/paste error
* Fix metadata configuration syntax
* Increase QPS, use just one method
* Format code
* Apply suggestions from code review
Co-authored-by: Sergii Tkachenko <hi@sergii.org>
* Address review comments
* Only Java implements the required server features
* Automated change: Fix sanity tests
* Address review comments
* Use double quotes for docstring
Co-authored-by: Sergii Tkachenko <hi@sergii.org>
Co-authored-by: murgatroid99 <murgatroid99@users.noreply.github.com>
* Add back references and scope field
* Set scope in router
* Reverse order of cleanup
* Add router_scope flag
* Use router_scope flag to create Router
* I apparently don't know how to brain
* Yapf
* Yeah, that can't be the default
* Remove debug print
* Remove impossible todos
* And another
* Switch from router-scope to config-scope
* Implement schema changes
* Use backend service URL
* Use CLH reference format to backend service
* I am an idiot
* *internal screaming*
* Try project number
* Why is this all awful
* Go back to trying project name
* Try cleaning things up
* Agh
* Address review comments
* Remove superfluous Optional type
* Add xds retry interop test to GKE test framework
* s/Affinity/Retry/
* more informative test name
* enable retry
* update java test server
* add missing import
* affinity test
- most basic affinity test
- verify that the received RDS and CDS are correctly configured for affinity
- verify that all RPCs are only sent to the one backend
- verify that only one sub-channel is connected, the other 2 are IDLE
And infra changes:
- add argument to set affinity config when creating backend service
- add a new backend service "affinity" to be shared by all affinity tests
  - this backend service is configured to do header affinity
  - it has 3 endpoints
- replica support copied from PR https://github.com/grpc/grpc/pull/26360
- update backend services from GRPC to HTTP2, to disable validate-for-proxyless
  - this will be reverted later
- add channelz function to query subchannels
- add method to configure the initial RPC config (RPC types and RPC metadata) when creating the client
- set env var to enable RING_HASH support
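The two runtime checks listed above (all RPCs reach one backend; one
subchannel connected, the rest IDLE) can be sketched as a small helper. This
is an illustration only: `check_affinity` and its input shapes are
hypothetical, not the framework's API, and "connected" is taken to mean the
READY connectivity state.

```python
def check_affinity(rpcs_by_backend, subchannel_states):
    """Return the single backend that received RPCs.

    Raises AssertionError unless exactly one backend got traffic and
    exactly one subchannel is READY while all others are IDLE.
    """
    backends_with_rpcs = [b for b, n in rpcs_by_backend.items() if n > 0]
    assert len(backends_with_rpcs) == 1, (
        f"expected RPCs on one backend, got {backends_with_rpcs}")

    ready = subchannel_states.count("READY")
    idle = subchannel_states.count("IDLE")
    assert ready == 1, f"expected 1 READY subchannel, got {ready}"
    assert idle == len(subchannel_states) - 1, (
        f"expected the remaining subchannels to be IDLE, got {subchannel_states}")
    return backends_with_rpcs[0]


# Example with 3 endpoints, as in the affinity backend service described above.
print(check_affinity({"backend-0": 100, "backend-1": 0, "backend-2": 0},
                     ["READY", "IDLE", "IDLE"]))  # backend-0
```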
* c1
* REVERT THIS: update strategy to trigger a manual build
* config: suffix to prefix
* Revert "REVERT THIS: update strategy to trigger a manual build"
This reverts commit 830776fef9.
In the gRPC-Go repo, as part of the PSM security interop tests, we changed
the xDS interop server to register admin services (in both secure and
non-secure modes). Attempting to register CSDS without an xds bootstrap
file causes the server binary to exit.
While we work to find a graceful solution to the problem of registering
CSDS without an xds bootstrap file, adding the bootstrap generator to
the non-secure server deployment fixes the issue.
Also, it looks like we would need an xds bootstrap file for non-secure
servers in the near future to test other server features.