[PSM Interop] Fix the 10-minute-teardown issue in GAMMA tests (#34560)

We shouldn't just set `termination_grace_period_seconds=600` by default
for all gamma tests extending `GammaXdsKubernetesTestCase`.

This is what's causing the deployment deletion issue:

> `framework.helpers.retryers.RetryError: Retry error calling
framework.xds_k8s_testcase.IsolatedXdsKubernetesTestCase.cleanup: 1
attempts exhausted. Last exception: RetryError: Retry error calling
framework.infrastructure.k8s.KubernetesNamespace.get_deployment: timeout
0:05:00 (h:mm:ss) exceeded. Check result callback returned False.`

We wait for 5 minutes, while the deployment is happily hanging for 10.
Then the second cleanup retry kills it, but not before waiting for
another 5 minutes.
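
The arithmetic above can be sketched as a simplified model. This is only an
illustration: `teardown_wait` is a hypothetical helper, not part of the test
framework, and it assumes each cleanup attempt blocks for up to the retry
timeout while the deployment disappears only once the grace period has elapsed.

```python
from datetime import timedelta
from typing import Tuple


def teardown_wait(
    grace_period: timedelta, retry_timeout: timedelta
) -> Tuple[int, timedelta]:
    """Return (cleanup attempts, total time waited) under the simplified model.

    Hypothetical helper for illustration only; not the framework's logic.
    """
    if retry_timeout <= timedelta(0):
        raise ValueError("retry_timeout must be positive")
    attempts = 0
    elapsed = timedelta(0)
    while True:
        attempts += 1
        remaining = grace_period - elapsed
        if remaining <= retry_timeout:
            # The deployment goes away within this attempt's wait window.
            return attempts, elapsed + max(remaining, timedelta(0))
        # This attempt times out while the pod is still terminating.
        elapsed += retry_timeout


# Old default: 600s grace period against a 5-minute wait per attempt.
print(teardown_wait(timedelta(seconds=600), timedelta(minutes=5)))
# New default: no grace period, so the first attempt returns immediately.
print(teardown_wait(timedelta(seconds=0), timedelta(minutes=5)))
```

Under these assumptions the old default needs two attempts and 10 minutes in
total, while a grace period of 0 finishes on the first attempt.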

I think `self.force = False` may be solving another issue triggered by
the `get_deployment` retry timeout: because the retry starts over deleting
the resources by name, and some of them were already deleted on the first
attempt, we get a 404. And I'm pretty sure we don't do error handling
correctly when deleting CRD-based resources, which cascades into even more
unnecessary retries.
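
One common way to make a repeated delete safe is to treat "not found" as
success. The sketch below only illustrates that idea: `NotFoundError`,
`FakeNamespace`, and `delete_if_exists` are hypothetical stand-ins, not the
framework's or the Kubernetes client's API.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for an API client's HTTP 404 error."""


class FakeNamespace:
    """Hypothetical in-memory namespace, used only for this illustration."""

    def __init__(self, names):
        self._resources = set(names)

    def delete(self, name: str) -> None:
        if name not in self._resources:
            raise NotFoundError(name)
        self._resources.remove(name)


def delete_if_exists(namespace: FakeNamespace, name: str) -> bool:
    """Delete a resource; return False if it was already gone."""
    try:
        namespace.delete(name)
    except NotFoundError:
        # Already deleted by an earlier attempt: nothing left to do.
        return False
    return True


ns = FakeNamespace(["psm-grpc-server"])
first = delete_if_exists(ns, "psm-grpc-server")   # first cleanup attempt
second = delete_if_exists(ns, "psm-grpc-server")  # retry does not raise
```

With this pattern a second cleanup pass over the same names does not fail on
resources that the first pass already removed.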
Sergii Tkachenko 1 year ago committed by GitHub
parent 96f36b6991
commit 724e12a1c7
  1. tools/run_tests/xds_k8s_test_driver/framework/test_app/runners/k8s/gamma_server_runner.py (2 lines changed)
  2. tools/run_tests/xds_k8s_test_driver/framework/xds_gamma_testcase.py (7 lines changed)
  3. tools/run_tests/xds_k8s_test_driver/tests/gamma/affinity_test.py (5 lines changed)

@@ -70,7 +70,7 @@ class GammaServerRunner(KubernetesServerRunner):
         safilter_name: str = "ssa-filter",
         sapolicy_name: str = "ssa-policy",
         bepolicy_name: str = "backend-policy",
-        termination_grace_period_seconds: Optional[int] = None,
+        termination_grace_period_seconds: int = 0,
         pre_stop_hook: bool = False,
     ):
         # pylint: disable=too-many-locals

@@ -30,16 +30,13 @@ XdsTestServer = server_app.XdsTestServer
 logger = logging.getLogger(__name__)
-# We never actually hit this timeout under normal circumstances, so this large
-# value is acceptable.
-_TERMINATION_GRACE_PERIOD_SECONDS = 600
 # TODO(sergiitk): [GAMMA] Move into framework/test_cases
 class GammaXdsKubernetesTestCase(xds_k8s_testcase.RegularXdsKubernetesTestCase):
     server_runner: GammaServerRunner
     frontend_service_name: str
     pre_stop_hook: Optional[bool] = None
+    termination_grace_period_seconds: int = 0
     def setUp(self):
         """Hook method for setting up the test fixture before exercising it."""
@@ -113,7 +110,7 @@ class GammaXdsKubernetesTestCase(xds_k8s_testcase.RegularXdsKubernetesTestCase):
             network=self.network,
             debug_use_port_forwarding=self.debug_use_port_forwarding,
             enable_workload_identity=self.enable_workload_identity,
-            termination_grace_period_seconds=_TERMINATION_GRACE_PERIOD_SECONDS,
+            termination_grace_period_seconds=self.termination_grace_period_seconds,
             pre_stop_hook=self.pre_stop_hook,
         )

@@ -34,6 +34,11 @@ RpcTypeUnaryCall = xds_url_map_testcase.RpcTypeUnaryCall
 _REPLICA_COUNT = 3
+# TODO(rbellevi): set this property on the prestop hook test class
+# We never actually hit this timeout under normal circumstances, so this large
+# value is acceptable.
+# termination_grace_period_seconds: int = 600
 class AffinityTest(xds_gamma_testcase.GammaXdsKubernetesTestCase):
     def getClientRpcStats(
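
With the grace period now a class attribute defaulting to 0, a test that
really needs a long grace period could opt in by overriding it, as the TODO
suggests. A minimal sketch of that pattern follows; `BaseGammaTestCase`,
`PreStopHookTest`, and `runner_kwargs` are hypothetical stand-ins, not the
real framework classes.

```python
class BaseGammaTestCase:
    """Hypothetical stand-in for GammaXdsKubernetesTestCase."""

    # Default: no grace period, so teardown is not held up.
    termination_grace_period_seconds: int = 0

    def runner_kwargs(self) -> dict:
        # Mirrors the diff: the runner receives the class attribute
        # instead of a module-level constant.
        return {
            "termination_grace_period_seconds": (
                self.termination_grace_period_seconds
            ),
        }


class PreStopHookTest(BaseGammaTestCase):
    """Hypothetical test class that opts in to a long grace period."""

    termination_grace_period_seconds = 600


default_kwargs = BaseGammaTestCase().runner_kwargs()
prestop_kwargs = PreStopHookTest().runner_kwargs()
```

Only the class that overrides the attribute pays the longer teardown cost;
every other test keeps the default of 0.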
