There was a ~1% flake in grpclb end2end tests that was reproducible in opt builds, manifesting as a hang, usually in a the SingleBalancerTest.Fallback test. Through experimentation, I found that by skipping the death test in the grpclb end2end test suite, the hang was no longer reproducible in 10,000 runs. Similarly, moving this test to the end of the suite, or making it run first (as is the case in this PR) resulted in 0 failures in 3000 runs.
It's unclear to me yet why the death test causes things to be unstable in this way. It's clear from the logs that one test does affect the rest, grpc_init is done once for all tests, so all tests utilize the same EventEngine ... until the death test completes, and a new EventEngine is created for the next test.
I think this death test is sufficiently artificial that it's fine to change the test ordering itself, and ignore the wonky intermediate state that results from it.
Reproducing the flake:
```
tools/bazel --bazelrc=tools/remote_build/linux.bazelrc test \
-c opt \
--test_env=GRPC_TRACE=event_engine \
--runs_per_test=5000 \
--test_output=summary \
test/cpp/end2end/grpclb_end2end_test@poller=epoll1
```
* WRR: port StaticStrideScheduler to OSS
* WIP
* Automated change: Fix sanity tests
* fix build
* remove unused aliases
* fix another type mismatch
* remove unnecessary include
* move benchmarks to their own file, and don't run it on windows
* Automated change: Fix sanity tests
* add OOB reporting
* generate_projects
* clang-format
* add config parser test
* clang-tidy and minimize lock contention
* add config defaults
* add oob_reporting_period config field and add basic test
* Automated change: Fix sanity tests
* fix test
* change test to use basic RR
* WIP: started exposing peer address to LB policy API
* first WRR test passing!
* small cleanup
* port RR fix to WRR
* test helper refactoring
* more test helper refactoring
* WIP: trying to fix test to have the right weights
* more WIP -- need to make pickers DualRefCounted
* fix timer ref handling and get tests working
* clang-format
* iwyu and generate_projects
* fix build
* add test for OOB reporting
* keep only READY subchannels in the picker
* add file missed in a previous commit
* fix sanity
* iwyu
* add weight expiration period
* add tests for weight update period and OOB reporting period
* Automated change: Fix sanity tests
* lower bound for timer interval
* consistently apply grpc_test_slowdown_factor()
* cache time in test
* add blackout_period tests
* avoid some unnecessary copies
* clang-format
* add field to config test
* simplify orca watcher tracking
* attempt to fix build
* iwyu
* generate_projects
* update xds proto dependency
* add xDS LB policy entry to registry
* add "_experimental" suffix to policy name
* update LB policy name and remove debug log
* add env var protection
* generate_projects
* gen_upb_api
* WRR: update tests to cover qps plumbing
* WIP
* Automated change: Fix sanity tests
* more WIP
* basic WRR e2e test working
* add OOB test
* add xDS WRR e2e test
* clang-format
* fix sanity
* ignore duplicate addresses
* Automated change: Fix sanity tests
* add new tracer to doc/environment_variables.md
* retain scheduler state across pickers
* Automated change: Fix sanity tests
* use separate mutexes for scheduler and timer
* sort addresses to avoid index churn
* remove fetch_sub for wrap around in RR case
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
* WRR: port StaticStrideScheduler to OSS
* WIP
* Automated change: Fix sanity tests
* fix build
* remove unused aliases
* fix another type mismatch
* remove unnecessary include
* move benchmarks to their own file, and don't run it on windows
* Automated change: Fix sanity tests
* add OOB reporting
* generate_projects
* clang-format
* add config parser test
* clang-tidy and minimize lock contention
* add config defaults
* add oob_reporting_period config field and add basic test
* Automated change: Fix sanity tests
* fix test
* change test to use basic RR
* WIP: started exposing peer address to LB policy API
* first WRR test passing!
* small cleanup
* port RR fix to WRR
* test helper refactoring
* more test helper refactoring
* WIP: trying to fix test to have the right weights
* more WIP -- need to make pickers DualRefCounted
* fix timer ref handling and get tests working
* clang-format
* iwyu and generate_projects
* fix build
* add test for OOB reporting
* keep only READY subchannels in the picker
* add file missed in a previous commit
* fix sanity
* iwyu
* add weight expiration period
* add tests for weight update period and OOB reporting period
* Automated change: Fix sanity tests
* lower bound for timer interval
* consistently apply grpc_test_slowdown_factor()
* cache time in test
* add blackout_period tests
* avoid some unnecessary copies
* clang-format
* add field to config test
* simplify orca watcher tracking
* attempt to fix build
* iwyu
* generate_projects
* add "_experimental" suffix to policy name
* WRR: update tests to cover qps plumbing
* WIP
* more WIP
* basic WRR e2e test working
* add OOB test
* fix sanity
* ignore duplicate addresses
* Automated change: Fix sanity tests
* add new tracer to doc/environment_variables.md
* retain scheduler state across pickers
* Automated change: Fix sanity tests
* use separate mutexes for scheduler and timer
* sort addresses to avoid index churn
* remove fetch_sub for wrap around in RR case
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
* Gcp Observability: Lazily initialize channels post-init
* IWYU and fix build deps
* Run RegistryPostInit for client census filters too
* Remove unused function
Updates the ProtoReflectionDescriptorDatabase ctor to take a reference
to std::shared_ptr<grpc::ChannelInterface> rather than
std::shared_ptr<grpc::Channel>. This helps code that is making use of
grpc::ChannelInterface from having to perform a cast from the Base to
the Derived when creating the refelction db.
* initial
* Intermediate
* Another try
* Try multiple necesary pulls
* Filter works other than client half close
* Fixes
* Add a cancelled RPC test
* Handle trailer only responses
* Tests for disabled logging and truncated payloads
* Fix authority and peer
* Add TODOs and asserts for half-close
* Fix tests for half-close and cancel
* 2d748fcb1cf45cac62729b8346ad15e6abc79e97
* Fix sanity checks
* Strict bazel build
* Fix package
* IWYU
* Fix cmake
* Explicit cast to string
* Size casts
* Fix Arena leak and disable macos build for now
* Reviewer comments
* Tracing: Add annotations for when call is removed from resolver result queue and lb pick queue
* Add test for pending resolver result queue as well
* Update annotation messages
* Revert "Revert "xDS stateful session affinity: add config plumbing (#31827)" (#31873)"
This reverts commit 4f15d3dcf9.
* fix build for compilers too dumb to recognize the full set of enum values