xds/interop: Delay to drain queued RPCs in authz test (#27991)

The authz test flaked because no RPCs of the expected type had completed
within the sampling window. Server authz logs showed a batch of 276 RPCs
completing back-to-back, without the expected 40 ms separation (qps=25).
It took a bit over 1 second to process the backlog. With a sample
duration of 500 ms and a polling delay between when the channel becomes
READY and when the test driver polls channelz, it makes sense that we
get lucky much of the time.

Obviously, adding a sleep isn't great either, but measuring the queue
length indirectly is more complex than is really appropriate here. The
real solution is to stop using this continuous-qps test client.

```
Traceback (most recent call last):
  File "/tmp/work/grpc/tools/run_tests/xds_k8s_test_driver/tests/authz_test.py", line 252, in test_tls_allow
    grpc.StatusCode.OK)
  File "/tmp/work/grpc/tools/run_tests/xds_k8s_test_driver/tests/authz_test.py", line 183, in configure_and_assert
    method=rpc_type)
  File "/tmp/work/grpc/tools/run_tests/xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 284, in assertRpcStatusCodes
    self.assertGreater(stats.result[status_code.value[0]], 0)
AssertionError: 0 not greater than 0
```
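
The backlog arithmetic behind the numbers in this commit can be sketched as follows. This is only an illustrative model, not part of the change: it assumes the server drains at a steady rate while the client keeps sending at its configured qps, so the backlog shrinks at the difference of the two rates. The helper names `buffered_seconds` and `drain_seconds` are hypothetical; the constants come from the commit message and the code comment in the diff.

```python
import datetime

CLIENT_QPS = 25    # client rate from the commit message (40 ms between RPCs)
SERVER_RPS = 225   # server rate observed on one run, per the code comment
SETTLE_DURATION = datetime.timedelta(seconds=5)
SAMPLE_DURATION = datetime.timedelta(seconds=0.5)


def buffered_seconds(backlog_rpcs, client_qps=CLIENT_QPS):
    """How long the client must have been buffering to queue this many RPCs."""
    return backlog_rpcs / client_qps


def drain_seconds(backlog_rpcs, client_qps=CLIENT_QPS, server_rps=SERVER_RPS):
    """Time to clear the backlog while new RPCs keep arriving (assumed model)."""
    net_rate = server_rps - client_qps
    if net_rate <= 0:
        raise ValueError("server cannot keep up; the backlog never drains")
    return backlog_rpcs / net_rate


# Largest backlog the settle duration can absorb under this model.
max_backlog = SETTLE_DURATION.total_seconds() * (SERVER_RPS - CLIENT_QPS)

print("settle covers RPCs:", max_backlog)
print("settle covers buffering seconds:", buffered_seconds(max_backlog))
print("drain time for 276 RPCs:", drain_seconds(276))
print("longer than sample window:",
      drain_seconds(276) > SAMPLE_DURATION.total_seconds())
```

Under these assumptions a 276-RPC backlog takes longer to drain than the 500 ms sample window, which is consistent with the flake described above, while the 5 second settle absorbs roughly 40 seconds of client-side buffering.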
Eric Anderson 3 years ago committed by GitHub
parent b7d4569d34
commit 9be868488f
      tools/run_tests/xds_k8s_test_driver/tests/authz_test.py

@ -13,6 +13,7 @@
# limitations under the License.
import datetime
import time
from typing import Optional
from absl import flags
@ -28,6 +29,13 @@ _XdsTestServer = xds_k8s_testcase.XdsTestServer
_XdsTestClient = xds_k8s_testcase.XdsTestClient
_SecurityMode = xds_k8s_testcase.SecurityXdsKubernetesTestCase.SecurityMode
# The client generates QPS even when it is still loading information from xDS.
# Once it finally connects there will be an outpouring of the buffered RPCs and
# the server needs time to chew through the backlog, especially since it is
# still a new process and so probably interpreted. The server on one run
# processed 225 RPCs a second, so with the client configured for 25 qps this
# settle duration is enough to drain about 40 seconds worth of buffering.
_SETTLE_DURATION = datetime.timedelta(seconds=5)
_SAMPLE_DURATION = datetime.timedelta(seconds=0.5)
@ -193,6 +201,7 @@ class AuthzTest(xds_k8s_testcase.SecurityXdsKubernetesTestCase):
        test_server: _XdsTestServer = self.startSecureTestServer()
        self.setupServerBackends()
        test_client: _XdsTestClient = self.startSecureTestClient(test_server)
        time.sleep(_SETTLE_DURATION.total_seconds())
        with self.subTest('01_host_wildcard'):
            self.configure_and_assert(test_client, 'host-wildcard',
@ -246,6 +255,7 @@ class AuthzTest(xds_k8s_testcase.SecurityXdsKubernetesTestCase):
        test_server: _XdsTestServer = self.startSecureTestServer()
        self.setupServerBackends()
        test_client: _XdsTestClient = self.startSecureTestClient(test_server)
        time.sleep(_SETTLE_DURATION.total_seconds())
        with self.subTest('01_host_wildcard'):
            self.configure_and_assert(test_client, 'host-wildcard',
@ -271,6 +281,7 @@ class AuthzTest(xds_k8s_testcase.SecurityXdsKubernetesTestCase):
        test_server: _XdsTestServer = self.startSecureTestServer()
        self.setupServerBackends()
        test_client: _XdsTestClient = self.startSecureTestClient(test_server)
        time.sleep(_SETTLE_DURATION.total_seconds())
        with self.subTest('01_host_wildcard'):
            self.configure_and_assert(test_client, 'host-wildcard',
@ -304,6 +315,7 @@ class AuthzTest(xds_k8s_testcase.SecurityXdsKubernetesTestCase):
        test_server: _XdsTestServer = self.startSecureTestServer()
        self.setupServerBackends()
        test_client: _XdsTestClient = self.startSecureTestClient(test_server)
        time.sleep(_SETTLE_DURATION.total_seconds())
        with self.subTest('01_host_wildcard'):
            self.configure_and_assert(test_client, 'host-wildcard',