* Initial skeleton for outlier detection
* fixing code review comments (modifying child policy)
* Skeleton and all tests passing except for 1
* small code review comments fix
* Adding the parsing of the policy in cds and put it in discovery
mechanism json format
* Parsing outlier detection json policy from parent
* Adding parsing of the updates
* Added Subchannel wrapper and watcher wrapper; all states pass
through and all tests still pass
* added framework to do eject and uneject
* fixing code review comments
* restore a test
* fixing code review comments
* taking care of code review comments
* removing debug code and rebuild build files
* fixing according to code review comments
* fixing code review comments
* Adding address to subchannel map
* addressing code review comments
* adding call counter
* Refcount SubchannelState (in the map) and store them in Subchannel Wrapper
* fixing counters
* Call counter and tracker skeleton added
* Call counter
* addressing code review comments
* addressing code review comments
* Added CallCounter and timer
* fixing sanity; but more importantly: taking out timer temporarily as it
was causing test failures.
* sanity
* fixing according to code review comments
* addressing code review comments
* all algorithms implemented
* addressing code review comment about starting the timer
* protect private vars
* small fix
* Added one more corner case
* fixing EjectionTimer
* Fixing according to code review suggestions.
* fixing according to code review comments
* taking care of code review comments
* fixing sanity issues
* Adding proto to tests
* First test
* Fixing according to code review comments
* Tests all working now
* fixing a crash
* fixing build files
* fixing sanity
* sanity
* Simplifying tests
* merge and update
* format
* sanity and format
* Fixing asan error
* fixing parsing logic and error handling
* 6 more tests done
* Added verifying unejection to tests
* Added all the tests
* fixing according to code review comments
* fixing asan and ubsan
* Fixing tests according to code review comments
* Added both algorithm tests
* added percentage enforcement tests
* fixing tsan error
* keeping debug, but fix warning
* remove debugs
* fixing IWYU and build errors after
* test comments change only but very important
* fixing code review comments
* one more refactoring of util function
* Removed debugs and added one more helper method
* one more logic fix
* Fixing last bit of code review comments and added disable tests
* fixing code review comments
* fixing IWYU
* sanity format
* protecting the feature with environment var:
registering policy and generating policy
* added a todo according to code review comments
* fixing a clang finding at import time
* build fix after synching to latest
* Eliminate post-init in channel stack builder
We've had a post init function on channel stack builder for a very long
time, and it serves to run some code after initialization completes.
We need the functionality for a few things, but the function passed in
is intimately tied to the filter in use - we never vary it between
multiple functions for the same filter... which means it makes more
sense to locate this functionality as part of the filter interface.
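A rough sketch of the idea, not the real channel stack API: the post-init hook lives on the filter itself, and the builder runs it for every filter once all filters are initialized. `Filter`, `StackBuilder`, `post_init`, and the log strings are hypothetical names used only for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified filter: the optional post-init hook is part of
// the filter interface instead of being registered on the builder.
struct Filter {
  std::string name;
  // Runs once after the whole stack has been initialized; may be null.
  void (*post_init)(Filter* self, std::vector<std::string>* log) = nullptr;
};

// Hypothetical builder: it no longer stores a separate post-init callback.
struct StackBuilder {
  std::vector<Filter> filters;

  void Append(Filter f) { filters.push_back(std::move(f)); }

  // Initializes every filter, then invokes each filter's own hook.
  std::vector<std::string> Build() {
    std::vector<std::string> log;
    for (Filter& f : filters) log.push_back("init:" + f.name);
    for (Filter& f : filters) {
      if (f.post_init != nullptr) f.post_init(&f, &log);
    }
    return log;
  }
};
```

Because the hook travels with the filter, a caller that appends the filter cannot forget to register the matching post-init function.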
* fix
* Automated change: Fix sanity tests
* fix
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* adding a min progress size argument to grpc_endpoint_read
* fix missing argument error
* adding a static_cast
* reverting changes in tcp_posix.cc
* add missing changes to CFStreamEndpointTests.mm
* subchannel: report IDLE upon existing connection failure and after backoff interval
* rename AttemptToConnect() to RequestConnection()
* clang-format
* fix unused parameter warning
* fix subchannel to handle either TF or SHUTDOWN from transport
* fix handling of ConnectedSubchannel failure
* pass status up in IDLE state to communicate keepalive info
* update comment
* split pick_first and round_robin tests into their own test suites
* improve log message
* add test
* clang-format
* appease clang-tidy
* fix test to do a poor man's graceful shutdown to avoid spurious RPC failures
* simplify round_robin logic and fix test flakes
* fix grpclb bug
* WIP: add OOB backend metric API for LB policies
* fix some includes
* minor fixes
* picking this up again...
* more WIP
* health checking: cancel stream if response message fails to parse
* basic structure in place, but still have synchronization issues to address
* ORCA: implement ORCA RPC service for OOB backend metric reporting
* fix unused parameter error
* gen_upb_api
* add missing build deps
* increase test timing fudge factor
* add missing copyright header
* fix build and locking problems
* clang-format
* document API
* buildifier
* add test, but doesn't build yet
* new test working, but broke existing test, and need to fix server API
* don't register as a generic service
* update test for new orca service registration API
* fix build
* sanitize
* report interval defaults to min interval
* add channel trace event on UNIMPLEMENTED
* don't regenerate the response proto unless something changed
* add missing build dep
* fix comment
* move some code around
* remove num_backends parameter from XdsEnd2endTest
* remove use_xds_enabled_server param from XdsEnd2endTest
* remove xds_resource_does_not_exist_timeout_ms param from XdsEnd2endTest
* remove client_load_reporting_interval_seconds param from XdsEnd2endTest
* start moving CreateAndStartBackends() into individual tests
* finish moving CreateAndStartBackends() into individual tests
* remove unused variable
* remove SetEdsResourceWithDelay
* fix test flake
* clang-tidy
* clang-format
* move test framework to its own library
* fix build
* clang-format
* fix windows build
* rename TestType to XdsTestType
* move BackendServiceImpl inside of BackendServerThread
* clang-format
* move AdminServerThread to CSDS test suite
* remove unnecessary deps
* move aggregate and logical_dns cluster tests to their own file
* split aggregate and logical_dns tests into separate suites
* clang-format
* re-add flaky tag
* clang-tidy and remove unnecessary dep
* move some code around
* remove num_backends parameter from XdsEnd2endTest
* remove use_xds_enabled_server param from XdsEnd2endTest
* remove xds_resource_does_not_exist_timeout_ms param from XdsEnd2endTest
* remove client_load_reporting_interval_seconds param from XdsEnd2endTest
* start moving CreateAndStartBackends() into individual tests
* finish moving CreateAndStartBackends() into individual tests
* remove unused variable
* remove SetEdsResourceWithDelay
* fix test flake
* clang-tidy
* clang-format
* move test framework to its own library
* fix build
* clang-format
* fix windows build
* rename TestType to XdsTestType
* move BackendServiceImpl inside of BackendServerThread
* clang-format
* move AdminServerThread to CSDS test suite
* move ring_hash tests to their own file
* generate_projects
* remove unnecessary deps
* re-add flaky tag
* clang-format
* Support unix socket in grpc_sockaddr_to_string
* make it return statusor
* clang fix
* made grpc_sockaddr_to_string() return statusor
* Let Chttp2ServerListener::Start crash
* test failure fixed
* api_fuzzer fixed
* comments addressed.
* more comments addressed
* comments addressed
* fix other broken builds
* refactor connection delay injection from client_lb_end2end_test
* fix build
* fix build on older compilers
* clang-format
* buildifier
* a bit of code cleanup
* start failover time whenever the child reports CONNECTING, and don't cancel when deactivating
* clang-format
* rewrite test
* simplify logic in priority policy
* clang-format
* switch to using a bit to indicate child healthiness
* fix reversed comment
* more changes in priority and ring_hash.
priority:
- go back to starting failover timer upon CONNECTING, but only if seen
READY or IDLE more recently than TRANSIENT_FAILURE
ring_hash:
- don't flap back and forth between IDLE and CONNECTING; once we go
CONNECTING, we stay there until either TF or READY
- after the first subchannel goes TF, we proactively start another
subchannel connecting, just like we do after a second subchannel
reports TF, to ensure that we don't stay in CONNECTING indefinitely if
we aren't getting any new picks
- always return ring hash's picker, regardless of connectivity state
- update the subchannel connectivity state seen by the picker upon
subchannel list creation
- start proactive subchannel connection attempt upon subchannel list
creation if needed
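The two state rules above can be sketched roughly as follows. This is a simplified illustration, not the actual priority or ring_hash code: `ChildTracker`, `NextReportedState`, and the assumption that a new child starts out as if it had seen READY or IDLE are all hypothetical.

```cpp
enum class State { kIdle, kConnecting, kReady, kTransientFailure };

// priority: decide whether a child's state update should start the
// failover timer. The timer starts on CONNECTING only if READY or IDLE
// was seen more recently than TRANSIENT_FAILURE.
class ChildTracker {
 public:
  // Returns true if the failover timer should be started for this update.
  bool OnStateChange(State s) {
    switch (s) {
      case State::kReady:
      case State::kIdle:
        seen_ready_or_idle_since_tf_ = true;
        return false;
      case State::kTransientFailure:
        seen_ready_or_idle_since_tf_ = false;
        return false;
      case State::kConnecting:
        return seen_ready_or_idle_since_tf_;
    }
    return false;
  }

 private:
  // Assumption: a freshly created child counts as not having failed yet.
  bool seen_ready_or_idle_since_tf_ = true;
};

// ring_hash: once CONNECTING is reported, an IDLE update does not move the
// reported state back; only TRANSIENT_FAILURE or READY leaves CONNECTING.
inline State NextReportedState(State current, State incoming) {
  if (current == State::kConnecting && incoming == State::kIdle) {
    return State::kConnecting;
  }
  return incoming;
}
```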
* ring_hash: fix connectivity state seen by aggregation and picker
* fix obiwan error
* swap the order of ring_hash aggregation rules 3 and 4
* restore original test
* refactor connection injector QueuedAttempt code
* add test showing that ring_hash will continue connecting without picks
* clang-format
* don't actually need seen_failure_since_ready_ anymore
* fix TSAN problem
* address code review comments
* move some code around
* remove num_backends parameter from XdsEnd2endTest
* remove use_xds_enabled_server param from XdsEnd2endTest
* remove xds_resource_does_not_exist_timeout_ms param from XdsEnd2endTest
* remove client_load_reporting_interval_seconds param from XdsEnd2endTest
* start moving CreateAndStartBackends() into individual tests
* finish moving CreateAndStartBackends() into individual tests
* remove unused variable
* remove SetEdsResourceWithDelay
* fix test flake
* clang-tidy
* clang-format
* move test framework to its own library
* fix build
* clang-format
* fix windows build
* rename TestType to XdsTestType
* move BackendServiceImpl inside of BackendServerThread
* clang-format
* move AdminServerThread to CSDS test suite
* move RLS tests to their own file
* remove unnecessary deps
* generate_projects
* Fixes a flake with the LoadReporter end2end test.
I *believe* the test is wrong, based on the .proto description of the
LoadReporter.
The protocol described in src/proto/grpc/lb/v1/load_reporter.proto has
the ReportLoad RPC return a stream of LoadReportResponse, which itself
has a repeated field of Load messages. The comment before it states:
"It is not strictly necessary to aggregate all entries into one entry
per <tag, user_id> tuple, although it is preferred to do so."
Debugging the issue shows we are in fact properly getting all 3 expected
load report types, just in two separate messages instead of a single
one.
This new test codepath will coalesce the load report responses, and also
addresses the fact that the original test wasn't verifying that we were
getting the 3 expected types.
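A minimal sketch of the coalescing check, assuming stand-in types rather than the generated proto classes: `Load`, `LoadReportResponse`, the `kind` field, `Coalesce`, and `HasAllKinds` are hypothetical names for illustration only.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the proto messages; `kind` labels which type
// of load report an entry carries.
struct Load {
  std::string kind;
};
struct LoadReportResponse {
  std::vector<Load> load;
};

// Flattens the Load entries of every streamed response into one list, so
// the check does not depend on how the server split them across messages.
std::vector<Load> Coalesce(const std::vector<LoadReportResponse>& responses) {
  std::vector<Load> all;
  for (const LoadReportResponse& r : responses) {
    all.insert(all.end(), r.load.begin(), r.load.end());
  }
  return all;
}

// Returns true only if every expected kind appears at least once.
bool HasAllKinds(const std::vector<Load>& loads,
                 const std::set<std::string>& expected) {
  std::set<std::string> seen;
  for (const Load& l : loads) seen.insert(l.kind);
  for (const std::string& k : expected) {
    if (seen.count(k) == 0) return false;
  }
  return true;
}
```

With this shape, three expected kinds arriving as one message or as two separate messages both pass, while a missing kind fails.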
* Automated change: Fix sanity tests
* Renaming variables.
* ASSERT_ -> EXPECT_
* Automated change: Fix sanity tests
* move some code around
* remove num_backends parameter from XdsEnd2endTest
* remove use_xds_enabled_server param from XdsEnd2endTest
* remove xds_resource_does_not_exist_timeout_ms param from XdsEnd2endTest
* remove client_load_reporting_interval_seconds param from XdsEnd2endTest
* start moving CreateAndStartBackends() into individual tests
* finish moving CreateAndStartBackends() into individual tests
* remove unused variable
* remove SetEdsResourceWithDelay
* fix test flake
* clang-tidy
* clang-format
* move test framework to its own library
* fix build
* clang-format
* fix windows build
* move fault injection tests to their own file
* rename TestType to XdsTestType
* move BackendServiceImpl inside of BackendServerThread
* clang-format
* generate_projects
* appease clang-tidy
* move AdminServerThread to CSDS test suite
* remove unnecessary deps
* generate_projects
* don't mark test as flaky
* Maybe fix for PUT deprecation
* Guard PUT request accepting with a flag and add tests
* Reviewer comments
* Add fallthrough notation
* Reviewer comments
Co-authored-by: Craig Tiller <ctiller@google.com>