[PSM Interop] Increase k8s startup probe total time (#32875)

Previously, we didn't configure `failureThreshold`, so it used its
default value of 3. The final `startupProbe` looked like this:

```json
{
  "startupProbe": {
    "failureThreshold": 3,
    "periodSeconds": 3,
    "successThreshold": 1,
    "tcpSocket": {
      "port": 8081
    },
    "timeoutSeconds": 1
  }
}
```

Because of this, the total time before k8s killed the container was
3 failed probes (`failureThreshold`) * 3 seconds between probes
(`periodSeconds`) = 9 seconds (±3 seconds waiting for the probe response).
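A rough sketch of that arithmetic, comparing the old default with the `failureThreshold: 1000` set in this change. The helper name `startup_budget_seconds` is hypothetical, and the estimate deliberately ignores `timeoutSeconds` and any initial delay:

```python
def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time a container has to pass its startupProbe.

    Assumption: the budget is simply failureThreshold * periodSeconds;
    probe timeouts and initial delays are not modeled here.
    """
    return failure_threshold * period_seconds


# Old behavior: failureThreshold left at its default of 3.
before = startup_budget_seconds(failure_threshold=3, period_seconds=3)
# New behavior: failureThreshold raised to 1000 in the manifests.
after = startup_budget_seconds(failure_threshold=1000, period_seconds=3)

print(f"before: {before}s, after: {after}s (~{after // 60} min)")
# prints "before: 9s, after: 3000s (~50 min)"
```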

This greatly affected the PSM Security test server, some implementations
of which waited for the ADS stream to be configured before they started
listening on the maintenance port. This led to the server container
being killed ~7 times before a successful startup:

```
15:55:08.875586 "Killing container with a grace period"
15:53:38.875812 "Killing container with a grace period"
15:52:47.875752 "Killing container with a grace period"
15:52:38.874696 "Killing container with a grace period"
15:52:14.874491 "Killing container with a grace period"
15:52:05.875400 "Killing container with a grace period"
15:51:56.876138 "Killing container with a grace period"
```

These extra delays led to PSM security tests timing out.
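To put a number on the delay, here is a minimal sketch that parses the kill timestamps from the log above and measures the gaps between restarts. The helper name `kill_times` is hypothetical, and the sketch assumes all entries fall on the same day:

```python
from datetime import datetime
from typing import List

KILL_LOG = """\
15:55:08.875586 "Killing container with a grace period"
15:53:38.875812 "Killing container with a grace period"
15:52:47.875752 "Killing container with a grace period"
15:52:38.874696 "Killing container with a grace period"
15:52:14.874491 "Killing container with a grace period"
15:52:05.875400 "Killing container with a grace period"
15:51:56.876138 "Killing container with a grace period"
"""


def kill_times(log: str) -> List[datetime]:
    """Return the timestamps of all kill events, oldest first."""
    times = []
    for line in log.splitlines():
        if "Killing container" in line:
            # The first whitespace-separated token is the HH:MM:SS.ffffff stamp.
            times.append(datetime.strptime(line.split()[0], "%H:%M:%S.%f"))
    return sorted(times)


times = kill_times(KILL_LOG)
# Seconds between consecutive kills, rounded to whole seconds.
gaps = [round((b - a).total_seconds()) for a, b in zip(times, times[1:])]
span = round((times[-1] - times[0]).total_seconds())

print(f"{len(times)} kills over {span}s, gaps: {gaps}")
# prints "7 kills over 192s, gaps: [9, 9, 24, 9, 51, 90]"
```

The shortest gaps are about 9 seconds, which matches the 9-second startup budget described above. In total, a little over three minutes passed between the first and the last kill.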

ref b/277336725
Sergii Tkachenko 2 years ago committed by GitHub
parent a2c89d0b24
commit f2a7f6d51b
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
  1. tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/client-secure.deployment.yaml (3 changed lines)
  2. tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/client.deployment.yaml (3 changed lines)
  3. tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/server-secure.deployment.yaml (5 changed lines)
  4. tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/server.deployment.yaml (3 changed lines)

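--- a/tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/client-secure.deployment.yaml
+++ b/tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/client-secure.deployment.yaml
@@ -32,6 +32,9 @@ spec:
     tcpSocket:
       port: ${stats_port}
     periodSeconds: 3
+    ## Extend the number of probes well beyond the duration of the test
+    ## driver waiting for the container to start.
+    failureThreshold: 1000
   args:
     - "--server=${server_target}"
     - "--stats_port=${stats_port}"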

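--- a/tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/client.deployment.yaml
+++ b/tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/client.deployment.yaml
@@ -32,6 +32,9 @@ spec:
     tcpSocket:
       port: ${stats_port}
     periodSeconds: 3
+    ## Extend the number of probes well beyond the duration of the test
+    ## driver waiting for the container to start.
+    failureThreshold: 1000
   args:
     - "--server=${server_target}"
     - "--stats_port=${stats_port}"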

--- a/tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/server-secure.deployment.yaml
+++ b/tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/server-secure.deployment.yaml
@@ -32,6 +32,9 @@ spec:
     tcpSocket:
       port: ${maintenance_port}
     periodSeconds: 3
+    ## Extend the number of probes well beyond the duration of the test
+    ## driver waiting for the container to start.
+    failureThreshold: 1000
   args:
     - "--port=${test_port}"
     - "--maintenance_port=${maintenance_port}"
@@ -46,7 +49,7 @@ spec:
     value: "true"
   - name: GRPC_XDS_EXPERIMENTAL_V3_SUPPORT
     value: "true"
-  # TODO(sergiitk): this should be conditional for if version < v1.37.x
+  ## TODO(sergiitk): this should be conditional for if version < v1.37.x
   - name: GRPC_XDS_EXPERIMENTAL_NEW_SERVER_API
     value: "true"
   - name: GRPC_XDS_EXPERIMENTAL_RBAC

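--- a/tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/server.deployment.yaml
+++ b/tools/run_tests/xds_k8s_test_driver/kubernetes-manifests/server.deployment.yaml
@@ -32,6 +32,9 @@ spec:
     tcpSocket:
       port: ${test_port}
     periodSeconds: 3
+    ## Extend the number of probes well beyond the duration of the test
+    ## driver waiting for the container to start.
+    failureThreshold: 1000
   args:
     - "--port=${test_port}"
   ports: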
