* Reland x2: Make GetDefaultEventEngine return a shared_ptr
* remove thread leak from NativeDNSResolver
This is not going to work for resolvers that support cancellation.
* give resolvers bounded lifetimes
Some resolver own EventEngines. EventEngines cannot run off the end of
the process since they have unjoined threads (problematic in a small set
of environments). This gives resolvers bounded lifetimes, and allows
replacement of resolvers without ASAN issues of deleting resolvers in
active use (occurs in tests).
* fix
* fix windows
* fix surface init test
* fix
* sanitize
* use after move
* the test must wait for the callback to be destroyed
* windows fix: delete the resolver on iomgr shutdown, not before
* Make TimerManager threads non-joinable
On gRPC shutdown, any unjoined TimerManager threads will cause TSAN to
detect thread leaks. This fix resolves issues I saw in end2end test
shutdown in another PR, where a single timer manager thread was always
alive after the test ended.
The long-term solution is to integrate the new ThreadPool here, but this
unblocks me for now.
* backport fix
* fix
* shared_ptr<EventEngine> in EventEngine benchmarks
* WorkQueue
* weaken the large obj stress test for Windows; documentation
* update comment
* Add WorkQueue microbenchmark. Results below ...
------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
BM_WorkQueueIntptrPopFront/1 297 ns 297 ns 2343500 items_per_second=3.3679M/s
BM_WorkQueueIntptrPopFront/8 7022 ns 7020 ns 99356 items_per_second=1.13956M/s
BM_WorkQueueIntptrPopFront/64 59606 ns 59590 ns 11770 items_per_second=1074k/s
BM_WorkQueueIntptrPopFront/512 477867 ns 477748 ns 1469 items_per_second=1071.7k/s
BM_WorkQueueIntptrPopFront/4096 3815786 ns 3814925 ns 184 items_per_second=1073.68k/s
I0902 19:05:22.138022069 12 test_config.cc:194] TestEnvironment ends
================================================================================
* use int64_t for times. 0 performance change
------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
BM_WorkQueueIntptrPopFront/1 277 ns 277 ns 2450292 items_per_second=3.60967M/s
BM_WorkQueueIntptrPopFront/8 6718 ns 6716 ns 105497 items_per_second=1.19126M/s
BM_WorkQueueIntptrPopFront/64 56428 ns 56401 ns 12268 items_per_second=1.13474M/s
BM_WorkQueueIntptrPopFront/512 458953 ns 458817 ns 1550 items_per_second=1.11591M/s
BM_WorkQueueIntptrPopFront/4096 3686357 ns 3685120 ns 191 items_per_second=1.1115M/s
I0902 19:25:31.549382949 12 test_config.cc:194] TestEnvironment ends
================================================================================
* add PopBack tests: same performance profile exactly
* use Mutex instead of Spinlock
It's safer, and so far equally performant in benchmarks of opt builds
* add deque test for comparison. It is faster on all tests.
* Add sparsely-populated multi-threaded benchmarks.
* fix
* fix
* refactor to help thread safety analysis
* Specialize WorkQueue for Closure*s and AnyInvocables
* remove unused callback storage
* add single-threaded benchmark for closure vs invocable
* sanitize
* missing include
* move bm_work_queue to microbenchmarks so it isn't exported
* s/workqueue/work_queue/g
* use nullptr instead of optionals for popped closures
* reviewer test suggestion
* private things are private
* add a work_queue fuzzer
Ran for 10 minutes @ 42 jobs @ 42 workers. Zero failures.
Checked in a selection of 100 good seeds after merging the thousands of
results.
* fix
* fix header guards
* nuke the corpora
* feedback
* sanitize
* Timestamp::Now
* fix
* fuzzers do not work on windows
* windows does not like multithreaded benchmark tests
This adds a unit test for XdsClient and fixes several watcher-notification bugs found in the process. Specifically:
- When an ADS stream fails or an xDS channel reports a connectivity failure, report an error only to the watchers for resources being subscribed to on that particular channel, not to watchers on other channels.
- Cache the error status for the channel, so that if a new watcher is started after the channel reports the error, we can immediately report that error to the new watcher.
- If a resource is NACKed and has not been previously cached, or does not exist, report that fact to any new watcher that may be started later.
- If a resource in an ADS response is unparseable but is wrapped in a `Resource` wrapper, we do know its name, so record the validation failure in the cache and report it to the watchers.
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
* client_channel: rewrite illegal status codes from control plane
* rewrite illegal status codes for call creds
* move fail_lb policy out of retry_lb_fail test so it can be reused
* test resolver and LB policy status rewrites
* add test for ConfigSelector status rewriting
* attempt to add client_auth filter unit test
* fix client_auth_filter test
* cleanup test
* fix build
* fix some memory leaks
* Automated change: Fix sanity tests
* Update client_auth_filter_test.cc
* fix build
* code review comments
* clang-tidy
Co-authored-by: markdroth <markdroth@users.noreply.github.com>
Co-authored-by: Craig Tiller <ctiller@google.com>
* [cleanup] Remove profiling timers
- nobody has used this system in years
- if we needed it, we'd probably rewrite it at this point to be something more modern
- let's remove it until that need arises
* fix
* fixes
* Disable end2end_binder_transport_test on some platforms
The following test case is flaky on windows
End2EndBinderTransportTestWithDifferentDelayTimes/End2EndBinderTransportTest.UnaryCallServerTimeout/1,
where GetParam() = 10ns
Binder transport won't be run on platform other than Android so it
should be OK to disable the test on some platform.
* Regenerate projects.
This works around valgrind memory leaks by giving EventEngines a fixed
lifetime. We eventually want ref-counted EventEngines internally, so this is
a step in the right direction as well.
A (currently) pthread_atfork-based fork support mechanism, allowing EventEngines - or any other object that wants to implement the Forkable interface - respond to forks.
This is a partial fork of the windows iomgr code - specifically the IOCP and Socket pieces - with some improved architecture and encapsulation. And the start of a WindowsEventEngine.
Once this code is used in a gRPC TCP context, I imagine a few issues will shake out. Also, getting sanitizers set up with MSVC will take a bit of work (see a commit referencing abseil and MSVC bugs to hack around).
I forked the IomgrEventEngine's posix poller interfaces in the hope of negotiating compatibility between the platforms, but the interfaces diverged a fair bit, and I'm doubtful we'll be able to use these "pollers" generically in the same TCP code. Reunification might not happen, and that's probably fine, we'll see how similar the TCP code looks once it's fleshed out.
I also extracted the IomgrEventEngine's timer piece into a separate component, usable by both engines.