* Reland x2: Make GetDefaultEventEngine return a shared_ptr
* remove thread leak from NativeDNSResolver
This is not going to work for resolvers that support cancellation.
* give resolvers bounded lifetimes
Some resolver own EventEngines. EventEngines cannot run off the end of
the process since they have unjoined threads (problematic in a small set
of environments). This gives resolvers bounded lifetimes, and allows
replacement of resolvers without ASAN issues of deleting resolvers in
active use (occurs in tests).
* fix
* fix windows
* fix surface init test
* fix
* sanitize
* use after move
* the test must wait for the callback to be destroyed
* windows fix: delete the resolver on iomgr shutdown, not before
* Make TimerManager threads non-joinable
On gRPC shutdown, any unjoined TimerManager threads will cause TSAN to
detect thread leaks. This fix resolves issues I saw in end2end test
shutdown in another PR, where a single timer manager thread was always
alive after the test ended.
The long-term solution is to integrate the new ThreadPool here, but this
unblocks me for now.
* backport fix
* fix
* shared_ptr<EventEngine> in EventEngine benchmarks
* WorkQueue
* weaken the large obj stress test for Windows; documentation
* update comment
* Add WorkQueue microbenchmark. Results below ...
------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
BM_WorkQueueIntptrPopFront/1 297 ns 297 ns 2343500 items_per_second=3.3679M/s
BM_WorkQueueIntptrPopFront/8 7022 ns 7020 ns 99356 items_per_second=1.13956M/s
BM_WorkQueueIntptrPopFront/64 59606 ns 59590 ns 11770 items_per_second=1074k/s
BM_WorkQueueIntptrPopFront/512 477867 ns 477748 ns 1469 items_per_second=1071.7k/s
BM_WorkQueueIntptrPopFront/4096 3815786 ns 3814925 ns 184 items_per_second=1073.68k/s
I0902 19:05:22.138022069 12 test_config.cc:194] TestEnvironment ends
================================================================================
* use int64_t for times. 0 performance change
------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
BM_WorkQueueIntptrPopFront/1 277 ns 277 ns 2450292 items_per_second=3.60967M/s
BM_WorkQueueIntptrPopFront/8 6718 ns 6716 ns 105497 items_per_second=1.19126M/s
BM_WorkQueueIntptrPopFront/64 56428 ns 56401 ns 12268 items_per_second=1.13474M/s
BM_WorkQueueIntptrPopFront/512 458953 ns 458817 ns 1550 items_per_second=1.11591M/s
BM_WorkQueueIntptrPopFront/4096 3686357 ns 3685120 ns 191 items_per_second=1.1115M/s
I0902 19:25:31.549382949 12 test_config.cc:194] TestEnvironment ends
================================================================================
* add PopBack tests: same performance profile exactly
* use Mutex instead of Spinlock
It's safer, and so far equally performant in benchmarks of opt builds
* add deque test for comparison. It is faster on all tests.
* Add sparsely-populated multi-threaded benchmarks.
* fix
* fix
* refactor to help thread safety analysis
* Specialize WorkQueue for Closure*s and AnyInvocables
* remove unused callback storage
* add single-threaded benchmark for closure vs invocable
* sanitize
* missing include
* move bm_work_queue to microbenchmarks so it isn't exported
* s/workqueue/work_queue/g
* use nullptr instead of optionals for popped closures
* reviewer test suggestion
* private things are private
* add a work_queue fuzzer
Ran for 10 minutes @ 42 jobs @ 42 workers. Zero failures.
Checked in a selection of 100 good seeds after merging the thousands of
results.
* fix
* fix header guards
* nuke the corpora
* feedback
* sanitize
* Timestamp::Now
* fix
* fuzzers do not work on windows
* windows does not like multithreaded benchmark tests
* Revert "Revert "[event_engine] Thread pool that can handle deletion in a callback (#30763)" (#30972)"
This reverts commit ccc787a020.
* Update thread_pool.cc
We have many tests that create 100 threads or more, and mounting evidence that
this is harmful to our CI environment.
When the original code for many of these tests was written we ran our tests
under run_tests, which had explicit handling for tracking the number of threads
each test needed and making sure that we weren't over subscribing the test
runner. Bazel has no such facility (and the facility in run_tests has since
been removed) and so we need to adjust.
This PR adjusts down a single test and is part of a series so that we can
review and roll back easily if required.
* [fixit] Scale down large tests
We have many tests that create 100 threads or more, and mounting evidence that
this is harmful to our CI environment.
When the original code for many of these tests was written we ran our tests
under run_tests, which had explicit handling for tracking the number of threads
each test needed and making sure that we weren't over subscribing the test
runner. Bazel has no such facility (and the facility in run_tests has since
been removed) and so we need to adjust.
This PR adjusts down a single test and is part of a series so that we can
review and roll back easily if required.
* mark up cpu usage
* Automated change: Fix sanity tests
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We have many tests that create 100 threads or more, and mounting evidence that
this is harmful to our CI environment.
When the original code for many of these tests was written we ran our tests
under run_tests, which had explicit handling for tracking the number of threads
each test needed and making sure that we weren't over subscribing the test
runner. Bazel has no such facility (and the facility in run_tests has since
been removed) and so we need to adjust.
This PR adjusts down a single test and is part of a series so that we can
review and roll back easily if required.
absl::Now() is not guaranteed to be monotonic, and we are seeing some
potential time travel at different rates on different machines. I've
replaced the most sensitive pieces with std::chrono::steady_clock to see
if that fixes the flakes we are seeing.
* Reland: "Make GetDefaultEventEngine return a shared_ptr (#30280)"
This reverts commit 45959e7cc1.
* Attempted fix with NoDestruct
* Not a process-wide singleton for the type. Just a NonDestruct
* fix
This works around valgrind memory leaks by giving EventEngines a fixed
lifetime. We eventually want ref-counted EventEngines internally, so this is
a step in the right direction as well.
A (currently) pthread_atfork-based fork support mechanism, allowing EventEngines - or any other object that wants to implement the Forkable interface - respond to forks.
* Rename the default EventEngine headers
Small cleanup. This code hasn't been related to factories for a month or
two.
* ensure only one target contains default_event_engine.h
* src + hdr in same target
* include guards