Our test clusters are in a broken/dirty state (except the URL map's
"basic" cluster, which isn't affected).
This PR switches to the newly created clusters to:
1. Get a data point on whether newly created clusters are affected
by the same issues.
2. Allow for descriptive work on the old clusters.
3. Hopefully, bring our tests back to green.
Bonus: more sensible cluster names.
* WIP. A seemingly properly failing test
* WIP. Pre-fork handlers now work (see the sketch after this list)
* Roughly working.
* Clean up
* Clean up more
* Add to CI
* Format
* Ugh. Remove swap file
* And another
* clean up
* Add copyright
* Format
* Remove another debug line
* Add stub forkable methods
* Remove use of 3.9+ function
* Remove unintentional double copyright
* drfloob review comments
* Only hold lock during Close once
* Create separate job for fork test
* Bump up gdb timeout
* Format
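The pre-fork handler work in the commit list above is easier to picture with a small example. Below is a minimal, illustrative sketch of pre/post-fork callbacks using Python's `os.register_at_fork` (3.7+); it is not gRPC's actual implementation, just the general shape of the technique.

```python
import os

# Illustrative sketch only (not gRPC's implementation): os.register_at_fork
# installs callbacks that run around fork(), analogous in spirit to the
# pre-fork handlers referenced in the commit list above.
os.register_at_fork(
    before=lambda: print("pre-fork: quiesce threads, release locks"),
    after_in_parent=lambda: print("post-fork (parent): resume work"),
    after_in_child=lambda: print("post-fork (child): recreate pollers/threads"),
)

pid = os.fork()
if pid == 0:
    # Child: a real implementation would rebuild its event loops here.
    os._exit(0)
else:
    os.waitpid(pid, 0)
```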
There was a ~1% flake in the grpclb end2end tests that was reproducible in opt builds, manifesting as a hang, usually in the SingleBalancerTest.Fallback test. Through experimentation, I found that skipping the death test in the grpclb end2end test suite made the hang no longer reproducible in 10,000 runs. Similarly, moving this test to the end of the suite, or making it run first (as is the case in this PR), resulted in 0 failures in 3,000 runs.
It's not yet clear to me why the death test destabilizes things in this way. It is clear from the logs that one test does affect the rest: grpc_init is done once for all tests, so all tests use the same EventEngine ... until the death test completes, and a new EventEngine is created for the next test.
I think this death test is sufficiently artificial that it's fine to change the test ordering itself, and ignore the wonky intermediate state that results from it.
Reproducing the flake:
```
tools/bazel --bazelrc=tools/remote_build/linux.bazelrc test \
  -c opt \
  --test_env=GRPC_TRACE=event_engine \
  --runs_per_test=5000 \
  --test_output=summary \
  test/cpp/end2end/grpclb_end2end_test@poller=epoll1
```
* see if experiments can lose weight
* test
* test
* test
* test
* contra test
* contra test
* add explainer
* Automated change: Fix sanity tests
* fixes
* fix
* strict-bs
* comments
* fixes
* iwyu
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This corrects an issue where a developer is unable to decrypt live TLS captures
in Wireshark. Instead, the user must take multiple steps to view decrypted TLS
application data, including saving a capture to disk and reloading it.
By appending rather than overwriting the TLS key log file, it is possible
to configure Wireshark's TLS (Pre)-Master-Secret log filename before the
capture is started (pointing to an empty or existing file). Then, when a
packet capture is started, decrypted data is available immediately. Further,
if the capture is restarted, no keys are lost and no Wireshark reconfiguration
is required.
Appending rather than overwriting TLS key files follows the technique used by
the `openssl` binary, Chrome, and others. All executables are able to append to
the same keylog file simultaneously, and Wireshark is able to decrypt traffic
from all of them simultaneously.
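As a minimal sketch of the append-mode behavior described above (illustrative Python, not gRPC's C-core implementation), the key change amounts to opening the keylog file for appending instead of truncating it:

```python
import threading

_keylog_lock = threading.Lock()

def export_keylog_line(path: str, line: str) -> None:
    # Open with "a" (append) rather than "w" (truncate) so existing keys
    # survive restarts and multiple processes can share one keylog file,
    # matching the behavior of the openssl binary and Chrome.
    with _keylog_lock, open(path, "a") as f:
        f.write(line + "\n")

# Hypothetical usage: Wireshark's "(Pre)-Master-Secret log filename" can
# point at this file before any capture starts. The line below uses the
# NSS key log format that Wireshark understands.
export_keylog_line("/tmp/sslkeys.log",
                   "CLIENT_RANDOM <client_random_hex> <master_secret_hex>")
```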
This PR adds retries on create/get requests from the test driver to the K8s API when 401 Unauthorized error is encountered.
The K8s Python library expects the ApiClient to be cycled (shut down and recreated) when the auth token is refreshed.
The problem is described in kubernetes-client/python#741. We currently have no hypothesis for why we weren't affected by this problem before.
To force the ApiClient to pick up the new credentials, I shut down the current client, create a new one, and replace the api_client property on all K8s APIs we manage.
This should also work with the Watch-based log collector recovering from an error. To support that, I replace the default Configuration so that the next time Watch creates an ApiClient implicitly, it uses the Configuration with the updated token.
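A minimal sketch of the retry-and-recycle approach, assuming the `kubernetes` Python client; the helper names (`_rebuild_api_client`, `read_namespace_with_retry`) are illustrative, not the test driver's actual code:

```python
import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException


def _rebuild_api_client():
    # Re-read credentials and build a fresh ApiClient; the old client
    # caches the auth token, so it must be discarded after a refresh.
    configuration = client.Configuration()
    config.load_kube_config(client_configuration=configuration)
    # Also install as the default so implicitly created clients
    # (e.g. by Watch) pick up the refreshed token.
    client.Configuration.set_default(configuration)
    return client.ApiClient(configuration)


def read_namespace_with_retry(name, attempts=3, delay_sec=1):
    api_client = _rebuild_api_client()
    core = client.CoreV1Api(api_client)
    for attempt in range(attempts):
        try:
            return core.read_namespace(name)
        except ApiException as e:
            if e.status != 401 or attempt == attempts - 1:
                raise
            # 401 Unauthorized: token likely expired; cycle the client.
            api_client.close()
            api_client = _rebuild_api_client()
            core = client.CoreV1Api(api_client)
            time.sleep(delay_sec)
```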
* [EventEngine] Add invalid handle types to the public API
* Automated change: Fix sanity tests
* add definition for static constexpr members
* sanitize
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
* Add wait_for_ready with client timeout example (sketched after this list)
* Use with to create channel
* Fix sanity tests
* Changes based on comments
* Change thread to daemon thread
* Changes based on comments
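For reference, here is a minimal sketch of what a wait_for_ready-with-client-timeout example looks like, assuming the generated helloworld stubs from gRPC's Hello World example; details may differ from the example actually added in this PR:

```python
import threading
import time
from concurrent import futures

import grpc
import helloworld_pb2          # assumes gRPC's generated helloworld stubs
import helloworld_pb2_grpc


class Greeter(helloworld_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        return helloworld_pb2.HelloReply(message="Hello, " + request.name)


def serve_after_delay(delay_sec):
    # Delay server startup so the client's first RPC finds no server;
    # wait_for_ready keeps the RPC pending instead of failing fast.
    time.sleep(delay_sec)
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port("localhost:50051")
    server.start()
    server.wait_for_termination()


def main():
    # Daemon thread: the process can exit without joining the server.
    threading.Thread(target=serve_after_delay, args=(2,), daemon=True).start()
    # `with` ensures the channel is closed on exit.
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        # wait_for_ready=True: queue the RPC while the channel connects;
        # timeout=10 bounds the total client-side wait.
        reply = stub.SayHello(
            helloworld_pb2.HelloRequest(name="you"),
            wait_for_ready=True,
            timeout=10,
        )
        print(reply.message)


if __name__ == "__main__":
    main()
```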