[PSM Interop] Temporary remedy for the issue with pod log dups (#32922)

While a proper fix is on the way, this reduces the number of
duplicated container logs in the xDS test server/client pod logs.

The issue is that we only wait between stream restarts when an exception
is caught, but an exception isn't the only way the stream gets broken.
Another is the main container being shut down by k8s. In that case nothing
is raised, so the loop restarts the stream immediately and reads the logs
from pod start again. Essentially, we do:

```py
while True:
  try:
    restart_stream()
    read_all_logs_from_pod_start()
  except Exception:
    logger.warning('error')
    wait_seconds(1)
```

This PR moves the wait into a `finally` block, so it runs after every
attempt, and raises it from 1 to 5 seconds:

```py
while True:
  try:
    restart_stream()
    read_all_logs_from_pod_start()
  except Exception:
    logger.warning('error')
  finally:
    wait_seconds(5)
```
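
To make the difference concrete, here is a minimal sketch that counts how often each loop variant backs off. The helpers `clean_close`, `broken_stream`, and `count_backoffs` are hypothetical stand-ins, not part of the test driver; the real collector waits instead of incrementing a counter.

```python
from typing import Callable, List


def clean_close() -> None:
    """Hypothetical stream read that ends without raising."""


def broken_stream() -> None:
    """Hypothetical stream read that fails with an exception."""
    raise ConnectionError("stream broken")


def count_backoffs(reads: List[Callable[[], None]], always_wait: bool) -> int:
    """Return how many times the loop backs off across the given reads."""
    backoffs = 0
    for read in reads:
        if always_wait:
            # Behavior after this change: back off after every attempt.
            try:
                read()
            except Exception:
                pass  # The real code logs a warning here.
            finally:
                backoffs += 1
        else:
            # Behavior before this change: back off only on an exception.
            try:
                read()
            except Exception:
                backoffs += 1
    return backoffs


reads = [clean_close, clean_close, broken_stream]
before = count_backoffs(reads, always_wait=False)
after = count_backoffs(reads, always_wait=True)
print(before, after)  # prints "1 3"
```

With two clean closes and one failure, the old loop backs off once while the new one backs off after all three attempts, which is what limits how quickly the logs can be re-read.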
pull/32621/head^2
Sergii Tkachenko 2 years ago committed by GitHub
parent 7955bfaa4b
commit 2fe7b5b881
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/k8s_internal/k8s_log_collector.py (3 changed lines)

@@ -46,7 +46,7 @@ class PodLogCollector(threading.Thread):
                  log_path: pathlib.Path,
                  log_to_stdout: bool = False,
                  log_timestamps: bool = False,
-                 error_backoff_sec: int = 1):
+                 error_backoff_sec: int = 5):
         self.pod_name = pod_name
         self.namespace_name = namespace_name
         self.stop_event = stop_event
@@ -103,6 +103,7 @@ class PodLogCollector(threading.Thread):
                 f'Will attempt to read from the beginning, but log '
                 f'truncation may occur.',
                 force_flush=True)
+        finally:
             # Instead of time.sleep(), we're waiting on the stop event
             # in case it gets set earlier.
             self.stop_event.wait(timeout=self.error_backoff_sec)
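
The comment in the diff notes that the backoff waits on the stop event rather than calling `time.sleep()`. A standalone sketch of that pattern, using only the standard library: the 0.1 second timer is an assumption made for the demo and stands in for whatever sets the stop event in the test driver.

```python
import threading
import time

stop_event = threading.Event()

# Set the stop event from another thread shortly after the wait begins.
timer = threading.Timer(0.1, stop_event.set)
timer.start()

started = time.monotonic()
# wait() returns True if the event was set, False if the timeout elapsed.
was_set = stop_event.wait(timeout=5)
elapsed = time.monotonic() - started
timer.join()

# The wait ends as soon as the event is set, well before the 5 s timeout.
print(was_set, elapsed < 5)
```

A plain `time.sleep(5)` would hold the thread for the full backoff even after a stop was requested, so waiting on the event lets the collector thread exit promptly.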
