* Initial skeleton for outlier detection
* fixing code review comments (modifying child policy)
* Skeleton and all tests passing except for 1
* small code review comments fix
* Adding the parsing of the policy in cds and put it in discovery
mechansim json format
* Parsing outlier detection json policy from parent
* Adding parsing of the updates
* Added Subchannel wrapper and watcher wrapper: and all states pass
through and all tests still pass
* added framework to do eject and uneject
* fixing code review comments
* restore a test
* fixing code review comments
* taking care of code review comments
* removing debug code and rebuild build files
* fixing according to code review comments
* fixing code review comments
* Adding address to subchannel map
* addressing code review comments
* adding call counter
* Refcount SubchannelState (in the map) and store them in Subcahnnel Wrapper
* fixing counterss
* Call counter and tracker skleton added
* Call counter
* addressing code review comments
* addressing code review comments
* Added CallCounter and timer
* fixing sanity; but more importantly: taking out timer temporarly as it
was causing test failures.
* sanity
* fixing according to code review comments
* addressing code review comments
* all algorithms implemented
* addressing code review comment about starting the timer
* protect private vars
* small fix
* Added one more corner case
* fixing EjectionTimer
* Fixing according to code review suggestions.
* fixing according to code reveiw comments
* taking care of code review comments
* fixing sanity issues
* Adding proto to tests
* First test
* Fixing according to code review comments
* Tests all working now
* fixing a crash
* fixing build files
* fixing sanity
* sanity
* Simplifying tests
* merge and update
* format
* sanity and format
* Fixing asan error
* fixing parsing logic and error handling
* 6 more tests done
* Added verifying unejection to tests
* Added all the tests
* fixing according to code review comments
* fixing asan and ubsan
* Fixing tests according to code review comments
* Added both algorithm tests
* added percentage enforcement tests
* fixing tsan error
* keeping debug, but fix warning
* remove debugs
* fixing IWYU and build errors after
* test comments change only but very important
* fixing code review comments
* one more refactorying of util function
* Removed debugs and added one more helper method
* one more logic fix
* Fixing last bit of code review comments and added disable tests
* fixing code review comments
* fixing IWYU
* sanity format
* protecting the feature with environment var:
registering policy and generating policy
* added a todo according to code review comments
* fixing a clang finding at import time
* build fix after synching to latest
* subchannel: report IDLE upon existing connection failure and after backoff interval
* rename AttemptToConnect() to RequestConnection()
* clang-format
* fix unused parameter warning
* fix subchannel to handle either TF or SHUTDOWN from transport
* fix handling of ConnectedSubchannel failure
* pass status up in IDLE state to communicate keepalive info
* update comment
* split pick_first and round_robin tests into their own test suites
* improve log message
* add test
* clang-format
* appease clang-tidy
* fix test to do a poor man's graceful shutdown to avoid spurious RPC failures
* simplify round_robin logic and fix test flakes
* fix grpclb bug
* WIP: add OOB backend metric API for LB policies
* fix some includes
* minor fixes
* picking this up again...
* more WIP
* health checking: cancel stream if response message fails to parse
* basic structure in place, but still have synchronization issues to address
* ORCA: implement ORCA RPC service for OOB backend metric reporting
* fix unused parameter error
* gen_upb_api
* add missing build deps
* increase test timing fudge factor
* add missing copyright header
* fix build and locking problems
* clang-format
* document API
* buildifier
* add test, but doesn't build yet
* new test working, but broke existing test, and need to fix server API
* don't register as a generic service
* update test for new orca service registration API
* fix build
* sanitize
* report interval defaults to min interval
* add channel trace event on UNIMPLEMENTED
* don't regenerate the response proto unless something changed
* add missing build dep
* fix comment
* move some code around
* remove num_backends parameter from XdsEnd2endTest
* remove use_xds_enabled_server param from XdsEnd2endTest
* remove xds_resource_does_not_exist_timeout_ms param from XdsEnd2endTest
* remove client_load_reporting_interval_seconds param from XdsEnd2endTest
* start moving CreateAndStartBackends() into individual tests
* finish moving CreateAndStartBackends() into individual tests
* remove unused variable
* remove SetEdsResourceWithDelay
* fix test flake
* clang-tidy
* clang-format
* move test framework to its own library
* fix build
* clang-format
* fix windows build
* rename TestType to XdsTestType
* move BackendServiceImpl inside of BackendServerThread
* clang-format
* move AdminServerThread to CSDS test suite
* remove unnecessary deps
* move aggregate and logical_dns cluster tests to their own file
* split aggregate and logical_dns tests into separate suites
* clang-format
* re-add flaky tag
* clang-tidy and remove unnecessary dep
* move some code around
* remove num_backends parameter from XdsEnd2endTest
* remove use_xds_enabled_server param from XdsEnd2endTest
* remove xds_resource_does_not_exist_timeout_ms param from XdsEnd2endTest
* remove client_load_reporting_interval_seconds param from XdsEnd2endTest
* start moving CreateAndStartBackends() into individual tests
* finish moving CreateAndStartBackends() into individual tests
* remove unused variable
* remove SetEdsResourceWithDelay
* fix test flake
* clang-tidy
* clang-format
* move test framework to its own library
* fix build
* clang-format
* fix windows build
* rename TestType to XdsTestType
* move BackendServiceImpl inside of BackendServerThread
* clang-format
* move AdminServerThread to CSDS test suite
* move ring_hash tests to their own file
* generate_projects
* remove unnecessary deps
* re-add flaky tag
* clang-format
* refactor connection delay injection from client_lb_end2end_test
* fix build
* fix build on older compilers
* clang-format
* buildifier
* a bit of code cleanup
* start failover time whenever the child reports CONNECTING, and don't cancel when deactivating
* clang-format
* rewrite test
* simplify logic in priority policy
* clang-format
* switch to using a bit to indicate child healthiness
* fix reversed comment
* more changes in priority and ring_hash.
priority:
- go back to starting failover timer upon CONNECTING, but only if seen
READY or IDLE more recently than TRANSIENT_FAILURE
ring_hash:
- don't flap back and forth between IDLE and CONNECTING; once we go
CONNECTING, we stay there until either TF or READY
- after the first subchannel goes TF, we proactively start another
subchannel connecting, just like we do after a second subchannel
reports TF, to ensure that we don't stay in CONNECTING indefinitely if
we aren't getting any new picks
- always return ring hash's picker, regardless of connectivity state
- update the subchannel connectivity state seen by the picker upon
subchannel list creation
- start proactive subchannel connection attempt upon subchannel list
creation if needed
* ring_hash: fix connectivity state seen by aggregation and picker
* fix obiwan error
* swap the order of ring_hash aggregation rules 3 and 4
* restore original test
* refactor connection injector QueuedAttempt code
* add test showing that ring_hash will continue connecting without picks
* clang-format
* don't actually need seen_failure_since_ready_ anymore
* fix TSAN problem
* address code review comments
* move some code around
* remove num_backends parameter from XdsEnd2endTest
* remove use_xds_enabled_server param from XdsEnd2endTest
* remove xds_resource_does_not_exist_timeout_ms param from XdsEnd2endTest
* remove client_load_reporting_interval_seconds param from XdsEnd2endTest
* start moving CreateAndStartBackends() into individual tests
* finish moving CreateAndStartBackends() into individual tests
* remove unused variable
* remove SetEdsResourceWithDelay
* fix test flake
* clang-tidy
* clang-format
* move test framework to its own library
* fix build
* clang-format
* fix windows build
* rename TestType to XdsTestType
* move BackendServiceImpl inside of BackendServerThread
* clang-format
* move AdminServerThread to CSDS test suite
* move RLS tests to their own file
* remove unnecessary deps
* generate_projects
* Fixes a flake with the LoadReporter end2end test.
I *believe* the test is wrong, based on the .proto description of the
LoadReporter.
The protocol described in src/proto/grpc/lb/v1/load_reporter.proto has
the ReportLoad rpc returns a stream of LoadReportResponse, which itself
has a repeated field of Load messages. The comment before it states:
"It is not strictly necessary to aggregate all entries into one entry
per <tag, user_id> tuple, although it is preferred to do so."
Debugging the issue shows we are in fact properly getting all 3 expected
load report types, just in two separate messages instead of a single
one.
This new test codepath will coalesce the load report responses, and also
addresses the fact the original test wasn't verifying that we were
getting the 3 expected types.
* Automated change: Fix sanity tests
* Renaming variables.
* ASSERT_ -> EXPECT_
* Automated change: Fix sanity tests
* move some code around
* remove num_backends parameter from XdsEnd2endTest
* remove use_xds_enabled_server param from XdsEnd2endTest
* remove xds_resource_does_not_exist_timeout_ms param from XdsEnd2endTest
* remove client_load_reporting_interval_seconds param from XdsEnd2endTest
* start moving CreateAndStartBackends() into individual tests
* finish moving CreateAndStartBackends() into individual tests
* remove unused variable
* remove SetEdsResourceWithDelay
* fix test flake
* clang-tidy
* clang-format
* move test framework to its own library
* fix build
* clang-format
* fix windows build
* move fault injection tests to their own file
* rename TestType to XdsTestType
* move BackendServiceImpl inside of BackendServerThread
* clang-format
* generate_projects
* appease clang-tidy
* move AdminServerThread to CSDS test suite
* remove unnecessary deps
* generate_projects
* don't mark test as flaky
* Maybe fix for PUT deprecation
* Guard PUT request accepting with a flag and add tests
* Reviewer comments
* Add fallthrough notation
* Reviewer comments
Co-authored-by: Craig Tiller <ctiller@google.com>
* Revert "Revert "ORCA: implement ORCA RPC service for OOB backend metric reporting (#29215)" (#29351)"
This reverts commit 71b355624f.
* move ORCA service to its own BUILD rule
* ORCA: implement ORCA RPC service for OOB backend metric reporting
* fix unused parameter error
* gen_upb_api
* add missing build deps
* increase test timing fudge factor
* add missing copyright header
* buildifier
* don't register as a generic service
* report interval defaults to min interval
* don't regenerate the response proto unless something changed
* use INTERNAL for proto parsing failure
* use absl::Duration in public API
* Adding is_optional case to RLS
* integrated with the updated envoy data-plane
* Fixing an old bug and adding test
* Use the same plugin map for ignore
* Remove ignore set
* Fixed another test.
* addressing code review comments.
* clean up!