Abseil Common Libraries (C++) (grcp 依赖) https://abseil.io/
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

311 lines
11 KiB

Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
// Copyright 2017 The Abseil Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <cstdint>
#include <mutex> // NOLINT(build/c++11)
#include <vector>
#include "absl/base/config.h"
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
#include "absl/base/internal/cycleclock.h"
#include "absl/base/internal/spinlock.h"
#include "absl/synchronization/blocking_counter.h"
#include "absl/synchronization/internal/thread_pool.h"
#include "absl/synchronization/mutex.h"
#include "benchmark/benchmark.h"
namespace {
void BM_Mutex(benchmark::State& state) {
static absl::Mutex* mu = new absl::Mutex;
for (auto _ : state) {
absl::MutexLock lock(mu);
}
}
BENCHMARK(BM_Mutex)->UseRealTime()->Threads(1)->ThreadPerCpu();
static void DelayNs(int64_t ns, int* data) {
int64_t end = absl::base_internal::CycleClock::Now() +
ns * absl::base_internal::CycleClock::Frequency() / 1e9;
while (absl::base_internal::CycleClock::Now() < end) {
++(*data);
benchmark::DoNotOptimize(*data);
}
}
template <typename MutexType>
class RaiiLocker {
public:
explicit RaiiLocker(MutexType* mu) : mu_(mu) { mu_->Lock(); }
~RaiiLocker() { mu_->Unlock(); }
private:
MutexType* mu_;
};
template <>
class RaiiLocker<std::mutex> {
public:
explicit RaiiLocker(std::mutex* mu) : mu_(mu) { mu_->lock(); }
~RaiiLocker() { mu_->unlock(); }
private:
std::mutex* mu_;
};
Export of internal Abseil changes -- b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>: Include priority in the calculation of mutex waiter equivalence This changes the behavior of the absl::Mutex wait list to take into account waiter priority when creating "skip chains". A skip chain on the wait list is a set of adjacent waiters that share some property and enable skipping during traversal. Prior to this CL, the skip chains were formed of waiters with the same wait type (e.g. exclusive vs read) and Condition. With this CL, the priority is also taken into account. This avoids O(n) behavior when enqueueing a waiter onto a wait list where the oldest waiter is at a lower priority than the waiter to be enqueued. With the prior notion of equivalence class, a skip chain could contain waiters of different priority, so we had to walk the linked list one-by-one until finding the appropriate insertion point. With the new equivalence class computation, we can skip past all of the equivalent waiters to find the right insertion point. This gives a substantial improvement to the enqueue performance in the case where there's already a waiter at lower priority. Note that even though this code path isn't a hot one, it's performed while holding the Mutex's spinlock, which prevents other threads from unlocking the Mutex, so minimizing the time under the critical section can have "knock-on" throughput benefits. Notable performance differences: name old cpu/op new cpu/op delta BM_MutexEnqueue/multiple_priorities:0/threads:4 8.60µs ± 7% 8.69µs ± 6% ~ (p=0.365 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 8.47µs ± 5% 8.64µs ±10% ~ (p=0.569 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 8.56µs ± 3% 8.55µs ± 6% ~ (p=0.563 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 8.98µs ± 8% 8.86µs ± 4% ~ (p=0.232 n=19+17) BM_MutexEnqueue/multiple_priorities:1/threads:4 6.64µs ±10% 6.45µs ± 4% ~ (p=0.097 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 15.2µs ± 8% 9.1µs ± 4% -39.93% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:128 22.3µs ± 6% 9.4µs ± 4% -57.82% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 61.5µs ± 3% 10.1µs ± 8% -83.53% (p=0.000 n=20+20) name old time/op new time/op delta BM_Mutex/real_time/threads:1 19.6ns ± 4% 19.8ns ±11% ~ (p=0.534 n=17+17) BM_Mutex/real_time/threads:112 120ns ±17% 122ns ±14% ~ (p=0.988 n=20+18) BM_MutexEnqueue/multiple_priorities:0/threads:4 5.18µs ± 6% 5.23µs ± 6% ~ (p=0.428 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 5.06µs ± 5% 5.18µs ±10% ~ (p=0.235 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 5.16µs ± 3% 5.14µs ± 6% ~ (p=0.474 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 5.40µs ± 8% 5.32µs ± 5% ~ (p=0.196 n=20+18) BM_MutexEnqueue/multiple_priorities:1/threads:4 3.99µs ±10% 3.88µs ± 3% ~ (p=0.074 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 8.48µs ± 9% 5.41µs ± 3% -36.20% (p=0.000 n=20+16) BM_MutexEnqueue/multiple_priorities:1/threads:128 12.2µs ± 6% 5.6µs ± 4% -54.43% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 32.1µs ± 3% 5.9µs ± 8% -81.45% (p=0.000 n=20+20) ... BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32 1.69µs ± 4% 1.66µs ± 2% -1.91% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48 1.90µs ± 2% 1.82µs ± 2% -4.09% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64 2.19µs ± 2% 1.80µs ± 1% -17.89% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96 2.18µs ± 5% 1.81µs ± 1% -16.94% (p=0.000 n=17+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128 2.18µs ± 1% 1.91µs ± 2% -12.33% (p=0.000 n=19+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192 2.27µs ± 2% 1.89µs ± 1% -16.79% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256 2.36µs ± 2% 1.83µs ± 1% -22.25% (p=0.000 n=20+19) PiperOrigin-RevId: 350775432 -- e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>: Fix test timeouts for sequence_lock_test on TSAN PiperOrigin-RevId: 350680903 -- 3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>: Add benchmarks for Mutex performance with multiple priorities This adds a new benchmark to mutex_benchmark which forces threads to go through the slow "Enqueue" path. The benchmark runs with varying numbers of threads and with/without the presence of a lower-priority waiter. PiperOrigin-RevId: 350655403 GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3 Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
4 years ago
// RAII object to change the Mutex priority of the running thread.
class ScopedThreadMutexPriority {
public:
explicit ScopedThreadMutexPriority(int priority) {
absl::base_internal::ThreadIdentity* identity =
absl::synchronization_internal::GetOrCreateCurrentThreadIdentity();
identity->per_thread_synch.priority = priority;
// Bump next_priority_read_cycles to the infinite future so that the
// implementation doesn't re-read the thread's actual scheduler priority
// and replace our temporary scoped priority.
identity->per_thread_synch.next_priority_read_cycles =
std::numeric_limits<int64_t>::max();
}
~ScopedThreadMutexPriority() {
// Reset the "next priority read time" back to the infinite past so that
// the next time the Mutex implementation wants to know this thread's
// priority, it re-reads it from the OS instead of using our overridden
// priority.
absl::synchronization_internal::GetOrCreateCurrentThreadIdentity()
->per_thread_synch.next_priority_read_cycles =
std::numeric_limits<int64_t>::min();
}
};
void BM_MutexEnqueue(benchmark::State& state) {
// In the "multiple priorities" variant of the benchmark, one of the
// threads runs with Mutex priority 0 while the rest run at elevated priority.
// This benchmarks the performance impact of the presence of a low priority
// waiter when a higher priority waiter adds itself of the queue
// (b/175224064).
//
// NOTE: The actual scheduler priority is not modified in this benchmark:
// all of the threads get CPU slices with the same priority. Only the
// Mutex queueing behavior is modified.
const bool multiple_priorities = state.range(0);
ScopedThreadMutexPriority priority_setter(
(multiple_priorities && state.thread_index() != 0) ? 1 : 0);
Export of internal Abseil changes -- b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>: Include priority in the calculation of mutex waiter equivalence This changes the behavior of the absl::Mutex wait list to take into account waiter priority when creating "skip chains". A skip chain on the wait list is a set of adjacent waiters that share some property and enable skipping during traversal. Prior to this CL, the skip chains were formed of waiters with the same wait type (e.g. exclusive vs read) and Condition. With this CL, the priority is also taken into account. This avoids O(n) behavior when enqueueing a waiter onto a wait list where the oldest waiter is at a lower priority than the waiter to be enqueued. With the prior notion of equivalence class, a skip chain could contain waiters of different priority, so we had to walk the linked list one-by-one until finding the appropriate insertion point. With the new equivalence class computation, we can skip past all of the equivalent waiters to find the right insertion point. This gives a substantial improvement to the enqueue performance in the case where there's already a waiter at lower priority. Note that even though this code path isn't a hot one, it's performed while holding the Mutex's spinlock, which prevents other threads from unlocking the Mutex, so minimizing the time under the critical section can have "knock-on" throughput benefits. Notable performance differences: name old cpu/op new cpu/op delta BM_MutexEnqueue/multiple_priorities:0/threads:4 8.60µs ± 7% 8.69µs ± 6% ~ (p=0.365 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 8.47µs ± 5% 8.64µs ±10% ~ (p=0.569 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 8.56µs ± 3% 8.55µs ± 6% ~ (p=0.563 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 8.98µs ± 8% 8.86µs ± 4% ~ (p=0.232 n=19+17) BM_MutexEnqueue/multiple_priorities:1/threads:4 6.64µs ±10% 6.45µs ± 4% ~ (p=0.097 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 15.2µs ± 8% 9.1µs ± 4% -39.93% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:128 22.3µs ± 6% 9.4µs ± 4% -57.82% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 61.5µs ± 3% 10.1µs ± 8% -83.53% (p=0.000 n=20+20) name old time/op new time/op delta BM_Mutex/real_time/threads:1 19.6ns ± 4% 19.8ns ±11% ~ (p=0.534 n=17+17) BM_Mutex/real_time/threads:112 120ns ±17% 122ns ±14% ~ (p=0.988 n=20+18) BM_MutexEnqueue/multiple_priorities:0/threads:4 5.18µs ± 6% 5.23µs ± 6% ~ (p=0.428 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 5.06µs ± 5% 5.18µs ±10% ~ (p=0.235 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 5.16µs ± 3% 5.14µs ± 6% ~ (p=0.474 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 5.40µs ± 8% 5.32µs ± 5% ~ (p=0.196 n=20+18) BM_MutexEnqueue/multiple_priorities:1/threads:4 3.99µs ±10% 3.88µs ± 3% ~ (p=0.074 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 8.48µs ± 9% 5.41µs ± 3% -36.20% (p=0.000 n=20+16) BM_MutexEnqueue/multiple_priorities:1/threads:128 12.2µs ± 6% 5.6µs ± 4% -54.43% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 32.1µs ± 3% 5.9µs ± 8% -81.45% (p=0.000 n=20+20) ... BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32 1.69µs ± 4% 1.66µs ± 2% -1.91% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48 1.90µs ± 2% 1.82µs ± 2% -4.09% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64 2.19µs ± 2% 1.80µs ± 1% -17.89% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96 2.18µs ± 5% 1.81µs ± 1% -16.94% (p=0.000 n=17+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128 2.18µs ± 1% 1.91µs ± 2% -12.33% (p=0.000 n=19+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192 2.27µs ± 2% 1.89µs ± 1% -16.79% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256 2.36µs ± 2% 1.83µs ± 1% -22.25% (p=0.000 n=20+19) PiperOrigin-RevId: 350775432 -- e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>: Fix test timeouts for sequence_lock_test on TSAN PiperOrigin-RevId: 350680903 -- 3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>: Add benchmarks for Mutex performance with multiple priorities This adds a new benchmark to mutex_benchmark which forces threads to go through the slow "Enqueue" path. The benchmark runs with varying numbers of threads and with/without the presence of a lower-priority waiter. PiperOrigin-RevId: 350655403 GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3 Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
4 years ago
struct Shared {
absl::Mutex mu;
std::atomic<int> looping_threads{0};
std::atomic<int> blocked_threads{0};
std::atomic<bool> thread_has_mutex{false};
};
static Shared* shared = new Shared;
// Set up 'blocked_threads' to count how many threads are currently blocked
// in Abseil synchronization code.
//
// NOTE: Blocking done within the Google Benchmark library itself (e.g.
// the barrier which synchronizes threads entering and exiting the benchmark
// loop) does _not_ get registered in this counter. This is because Google
// Benchmark uses its own synchronization primitives based on std::mutex, not
// Abseil synchronization primitives. If at some point the benchmark library
// merges into Abseil, this code may break.
absl::synchronization_internal::PerThreadSem::SetThreadBlockedCounter(
&shared->blocked_threads);
// The benchmark framework may run several iterations in the same process,
// reusing the same static-initialized 'shared' object. Given the semantics
// of the members, here, we expect everything to be reset to zero by the
// end of any iteration. Assert that's the case, just to be sure.
ABSL_RAW_CHECK(
shared->looping_threads.load(std::memory_order_relaxed) == 0 &&
shared->blocked_threads.load(std::memory_order_relaxed) == 0 &&
!shared->thread_has_mutex.load(std::memory_order_relaxed),
"Shared state isn't zeroed at start of benchmark iteration");
static constexpr int kBatchSize = 1000;
while (state.KeepRunningBatch(kBatchSize)) {
shared->looping_threads.fetch_add(1);
for (int i = 0; i < kBatchSize; i++) {
{
absl::MutexLock l(&shared->mu);
shared->thread_has_mutex.store(true, std::memory_order_relaxed);
// Spin until all other threads are either out of the benchmark loop
// or blocked on the mutex. This ensures that the mutex queue is kept
// at its maximal length to benchmark the performance of queueing on
// a highly contended mutex.
while (shared->looping_threads.load(std::memory_order_relaxed) -
shared->blocked_threads.load(std::memory_order_relaxed) !=
1) {
}
shared->thread_has_mutex.store(false);
}
// Spin until some other thread has acquired the mutex before we block
// again. This ensures that we always go through the slow (queueing)
// acquisition path rather than reacquiring the mutex we just released.
while (!shared->thread_has_mutex.load(std::memory_order_relaxed) &&
shared->looping_threads.load(std::memory_order_relaxed) > 1) {
}
}
// The benchmark framework uses a barrier to ensure that all of the threads
// complete their benchmark loop together before any of the threads exit
// the loop. So, we need to remove ourselves from the "looping threads"
// counter here before potentially blocking on that barrier. Otherwise,
// another thread spinning above might wait forever for this thread to
// block on the mutex while we in fact are waiting to exit.
shared->looping_threads.fetch_add(-1);
}
absl::synchronization_internal::PerThreadSem::SetThreadBlockedCounter(
nullptr);
}
BENCHMARK(BM_MutexEnqueue)
->Threads(4)
->Threads(64)
->Threads(128)
->Threads(512)
->ArgName("multiple_priorities")
->Arg(false)
->Arg(true);
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
template <typename MutexType>
void BM_Contended(benchmark::State& state) {
int priority = state.thread_index() % state.range(1);
Export of internal Abseil changes -- b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>: Include priority in the calculation of mutex waiter equivalence This changes the behavior of the absl::Mutex wait list to take into account waiter priority when creating "skip chains". A skip chain on the wait list is a set of adjacent waiters that share some property and enable skipping during traversal. Prior to this CL, the skip chains were formed of waiters with the same wait type (e.g. exclusive vs read) and Condition. With this CL, the priority is also taken into account. This avoids O(n) behavior when enqueueing a waiter onto a wait list where the oldest waiter is at a lower priority than the waiter to be enqueued. With the prior notion of equivalence class, a skip chain could contain waiters of different priority, so we had to walk the linked list one-by-one until finding the appropriate insertion point. With the new equivalence class computation, we can skip past all of the equivalent waiters to find the right insertion point. This gives a substantial improvement to the enqueue performance in the case where there's already a waiter at lower priority. Note that even though this code path isn't a hot one, it's performed while holding the Mutex's spinlock, which prevents other threads from unlocking the Mutex, so minimizing the time under the critical section can have "knock-on" throughput benefits. Notable performance differences: name old cpu/op new cpu/op delta BM_MutexEnqueue/multiple_priorities:0/threads:4 8.60µs ± 7% 8.69µs ± 6% ~ (p=0.365 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 8.47µs ± 5% 8.64µs ±10% ~ (p=0.569 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 8.56µs ± 3% 8.55µs ± 6% ~ (p=0.563 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 8.98µs ± 8% 8.86µs ± 4% ~ (p=0.232 n=19+17) BM_MutexEnqueue/multiple_priorities:1/threads:4 6.64µs ±10% 6.45µs ± 4% ~ (p=0.097 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 15.2µs ± 8% 9.1µs ± 4% -39.93% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:128 22.3µs ± 6% 9.4µs ± 4% -57.82% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 61.5µs ± 3% 10.1µs ± 8% -83.53% (p=0.000 n=20+20) name old time/op new time/op delta BM_Mutex/real_time/threads:1 19.6ns ± 4% 19.8ns ±11% ~ (p=0.534 n=17+17) BM_Mutex/real_time/threads:112 120ns ±17% 122ns ±14% ~ (p=0.988 n=20+18) BM_MutexEnqueue/multiple_priorities:0/threads:4 5.18µs ± 6% 5.23µs ± 6% ~ (p=0.428 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 5.06µs ± 5% 5.18µs ±10% ~ (p=0.235 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 5.16µs ± 3% 5.14µs ± 6% ~ (p=0.474 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 5.40µs ± 8% 5.32µs ± 5% ~ (p=0.196 n=20+18) BM_MutexEnqueue/multiple_priorities:1/threads:4 3.99µs ±10% 3.88µs ± 3% ~ (p=0.074 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 8.48µs ± 9% 5.41µs ± 3% -36.20% (p=0.000 n=20+16) BM_MutexEnqueue/multiple_priorities:1/threads:128 12.2µs ± 6% 5.6µs ± 4% -54.43% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 32.1µs ± 3% 5.9µs ± 8% -81.45% (p=0.000 n=20+20) ... BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32 1.69µs ± 4% 1.66µs ± 2% -1.91% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48 1.90µs ± 2% 1.82µs ± 2% -4.09% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64 2.19µs ± 2% 1.80µs ± 1% -17.89% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96 2.18µs ± 5% 1.81µs ± 1% -16.94% (p=0.000 n=17+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128 2.18µs ± 1% 1.91µs ± 2% -12.33% (p=0.000 n=19+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192 2.27µs ± 2% 1.89µs ± 1% -16.79% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256 2.36µs ± 2% 1.83µs ± 1% -22.25% (p=0.000 n=20+19) PiperOrigin-RevId: 350775432 -- e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>: Fix test timeouts for sequence_lock_test on TSAN PiperOrigin-RevId: 350680903 -- 3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>: Add benchmarks for Mutex performance with multiple priorities This adds a new benchmark to mutex_benchmark which forces threads to go through the slow "Enqueue" path. The benchmark runs with varying numbers of threads and with/without the presence of a lower-priority waiter. PiperOrigin-RevId: 350655403 GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3 Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
4 years ago
ScopedThreadMutexPriority priority_setter(priority);
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
struct Shared {
MutexType mu;
int data = 0;
};
static auto* shared = new Shared;
int local = 0;
for (auto _ : state) {
// Here we model both local work outside of the critical section as well as
// some work inside of the critical section. The idea is to capture some
// more or less realisitic contention levels.
// If contention is too low, the benchmark won't measure anything useful.
// If contention is unrealistically high, the benchmark will favor
// bad mutex implementations that block and otherwise distract threads
// from the mutex and shared state for as much as possible.
// To achieve this amount of local work is multiplied by number of threads
// to keep ratio between local work and critical section approximately
// equal regardless of number of threads.
DelayNs(100 * state.threads(), &local);
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
RaiiLocker<MutexType> locker(&shared->mu);
DelayNs(state.range(0), &shared->data);
}
}
Export of internal Abseil changes -- b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>: Include priority in the calculation of mutex waiter equivalence This changes the behavior of the absl::Mutex wait list to take into account waiter priority when creating "skip chains". A skip chain on the wait list is a set of adjacent waiters that share some property and enable skipping during traversal. Prior to this CL, the skip chains were formed of waiters with the same wait type (e.g. exclusive vs read) and Condition. With this CL, the priority is also taken into account. This avoids O(n) behavior when enqueueing a waiter onto a wait list where the oldest waiter is at a lower priority than the waiter to be enqueued. With the prior notion of equivalence class, a skip chain could contain waiters of different priority, so we had to walk the linked list one-by-one until finding the appropriate insertion point. With the new equivalence class computation, we can skip past all of the equivalent waiters to find the right insertion point. This gives a substantial improvement to the enqueue performance in the case where there's already a waiter at lower priority. Note that even though this code path isn't a hot one, it's performed while holding the Mutex's spinlock, which prevents other threads from unlocking the Mutex, so minimizing the time under the critical section can have "knock-on" throughput benefits. Notable performance differences: name old cpu/op new cpu/op delta BM_MutexEnqueue/multiple_priorities:0/threads:4 8.60µs ± 7% 8.69µs ± 6% ~ (p=0.365 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 8.47µs ± 5% 8.64µs ±10% ~ (p=0.569 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 8.56µs ± 3% 8.55µs ± 6% ~ (p=0.563 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 8.98µs ± 8% 8.86µs ± 4% ~ (p=0.232 n=19+17) BM_MutexEnqueue/multiple_priorities:1/threads:4 6.64µs ±10% 6.45µs ± 4% ~ (p=0.097 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 15.2µs ± 8% 9.1µs ± 4% -39.93% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:128 22.3µs ± 6% 9.4µs ± 4% -57.82% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 61.5µs ± 3% 10.1µs ± 8% -83.53% (p=0.000 n=20+20) name old time/op new time/op delta BM_Mutex/real_time/threads:1 19.6ns ± 4% 19.8ns ±11% ~ (p=0.534 n=17+17) BM_Mutex/real_time/threads:112 120ns ±17% 122ns ±14% ~ (p=0.988 n=20+18) BM_MutexEnqueue/multiple_priorities:0/threads:4 5.18µs ± 6% 5.23µs ± 6% ~ (p=0.428 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 5.06µs ± 5% 5.18µs ±10% ~ (p=0.235 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 5.16µs ± 3% 5.14µs ± 6% ~ (p=0.474 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 5.40µs ± 8% 5.32µs ± 5% ~ (p=0.196 n=20+18) BM_MutexEnqueue/multiple_priorities:1/threads:4 3.99µs ±10% 3.88µs ± 3% ~ (p=0.074 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 8.48µs ± 9% 5.41µs ± 3% -36.20% (p=0.000 n=20+16) BM_MutexEnqueue/multiple_priorities:1/threads:128 12.2µs ± 6% 5.6µs ± 4% -54.43% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 32.1µs ± 3% 5.9µs ± 8% -81.45% (p=0.000 n=20+20) ... BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32 1.69µs ± 4% 1.66µs ± 2% -1.91% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48 1.90µs ± 2% 1.82µs ± 2% -4.09% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64 2.19µs ± 2% 1.80µs ± 1% -17.89% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96 2.18µs ± 5% 1.81µs ± 1% -16.94% (p=0.000 n=17+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128 2.18µs ± 1% 1.91µs ± 2% -12.33% (p=0.000 n=19+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192 2.27µs ± 2% 1.89µs ± 1% -16.79% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256 2.36µs ± 2% 1.83µs ± 1% -22.25% (p=0.000 n=20+19) PiperOrigin-RevId: 350775432 -- e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>: Fix test timeouts for sequence_lock_test on TSAN PiperOrigin-RevId: 350680903 -- 3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>: Add benchmarks for Mutex performance with multiple priorities This adds a new benchmark to mutex_benchmark which forces threads to go through the slow "Enqueue" path. The benchmark runs with varying numbers of threads and with/without the presence of a lower-priority waiter. PiperOrigin-RevId: 350655403 GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3 Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
4 years ago
void SetupBenchmarkArgs(benchmark::internal::Benchmark* bm,
bool do_test_priorities) {
const int max_num_priorities = do_test_priorities ? 2 : 1;
bm->UseRealTime()
// ThreadPerCpu poorly handles non-power-of-two CPU counts.
->Threads(1)
->Threads(2)
->Threads(4)
->Threads(6)
->Threads(8)
->Threads(12)
->Threads(16)
->Threads(24)
->Threads(32)
->Threads(48)
->Threads(64)
->Threads(96)
->Threads(128)
->Threads(192)
->Threads(256)
->ArgNames({"cs_ns", "num_prios"});
// Some empirically chosen amounts of work in critical section.
// 1 is low contention, 2000 is high contention and few values in between.
for (int critical_section_ns : {1, 20, 50, 200, 2000}) {
for (int num_priorities = 1; num_priorities <= max_num_priorities;
num_priorities++) {
bm->ArgPair(critical_section_ns, num_priorities);
}
}
}
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
BENCHMARK_TEMPLATE(BM_Contended, absl::Mutex)
Export of internal Abseil changes -- b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>: Include priority in the calculation of mutex waiter equivalence This changes the behavior of the absl::Mutex wait list to take into account waiter priority when creating "skip chains". A skip chain on the wait list is a set of adjacent waiters that share some property and enable skipping during traversal. Prior to this CL, the skip chains were formed of waiters with the same wait type (e.g. exclusive vs read) and Condition. With this CL, the priority is also taken into account. This avoids O(n) behavior when enqueueing a waiter onto a wait list where the oldest waiter is at a lower priority than the waiter to be enqueued. With the prior notion of equivalence class, a skip chain could contain waiters of different priority, so we had to walk the linked list one-by-one until finding the appropriate insertion point. With the new equivalence class computation, we can skip past all of the equivalent waiters to find the right insertion point. This gives a substantial improvement to the enqueue performance in the case where there's already a waiter at lower priority. Note that even though this code path isn't a hot one, it's performed while holding the Mutex's spinlock, which prevents other threads from unlocking the Mutex, so minimizing the time under the critical section can have "knock-on" throughput benefits. Notable performance differences: name old cpu/op new cpu/op delta BM_MutexEnqueue/multiple_priorities:0/threads:4 8.60µs ± 7% 8.69µs ± 6% ~ (p=0.365 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 8.47µs ± 5% 8.64µs ±10% ~ (p=0.569 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 8.56µs ± 3% 8.55µs ± 6% ~ (p=0.563 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 8.98µs ± 8% 8.86µs ± 4% ~ (p=0.232 n=19+17) BM_MutexEnqueue/multiple_priorities:1/threads:4 6.64µs ±10% 6.45µs ± 4% ~ (p=0.097 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 15.2µs ± 8% 9.1µs ± 4% -39.93% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:128 22.3µs ± 6% 9.4µs ± 4% -57.82% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 61.5µs ± 3% 10.1µs ± 8% -83.53% (p=0.000 n=20+20) name old time/op new time/op delta BM_Mutex/real_time/threads:1 19.6ns ± 4% 19.8ns ±11% ~ (p=0.534 n=17+17) BM_Mutex/real_time/threads:112 120ns ±17% 122ns ±14% ~ (p=0.988 n=20+18) BM_MutexEnqueue/multiple_priorities:0/threads:4 5.18µs ± 6% 5.23µs ± 6% ~ (p=0.428 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 5.06µs ± 5% 5.18µs ±10% ~ (p=0.235 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 5.16µs ± 3% 5.14µs ± 6% ~ (p=0.474 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 5.40µs ± 8% 5.32µs ± 5% ~ (p=0.196 n=20+18) BM_MutexEnqueue/multiple_priorities:1/threads:4 3.99µs ±10% 3.88µs ± 3% ~ (p=0.074 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 8.48µs ± 9% 5.41µs ± 3% -36.20% (p=0.000 n=20+16) BM_MutexEnqueue/multiple_priorities:1/threads:128 12.2µs ± 6% 5.6µs ± 4% -54.43% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 32.1µs ± 3% 5.9µs ± 8% -81.45% (p=0.000 n=20+20) ... BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32 1.69µs ± 4% 1.66µs ± 2% -1.91% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48 1.90µs ± 2% 1.82µs ± 2% -4.09% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64 2.19µs ± 2% 1.80µs ± 1% -17.89% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96 2.18µs ± 5% 1.81µs ± 1% -16.94% (p=0.000 n=17+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128 2.18µs ± 1% 1.91µs ± 2% -12.33% (p=0.000 n=19+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192 2.27µs ± 2% 1.89µs ± 1% -16.79% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256 2.36µs ± 2% 1.83µs ± 1% -22.25% (p=0.000 n=20+19) PiperOrigin-RevId: 350775432 -- e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>: Fix test timeouts for sequence_lock_test on TSAN PiperOrigin-RevId: 350680903 -- 3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>: Add benchmarks for Mutex performance with multiple priorities This adds a new benchmark to mutex_benchmark which forces threads to go through the slow "Enqueue" path. The benchmark runs with varying numbers of threads and with/without the presence of a lower-priority waiter. PiperOrigin-RevId: 350655403 GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3 Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
4 years ago
->Apply([](benchmark::internal::Benchmark* bm) {
SetupBenchmarkArgs(bm, /*do_test_priorities=*/true);
});
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
BENCHMARK_TEMPLATE(BM_Contended, absl::base_internal::SpinLock)
Export of internal Abseil changes -- b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>: Include priority in the calculation of mutex waiter equivalence This changes the behavior of the absl::Mutex wait list to take into account waiter priority when creating "skip chains". A skip chain on the wait list is a set of adjacent waiters that share some property and enable skipping during traversal. Prior to this CL, the skip chains were formed of waiters with the same wait type (e.g. exclusive vs read) and Condition. With this CL, the priority is also taken into account. This avoids O(n) behavior when enqueueing a waiter onto a wait list where the oldest waiter is at a lower priority than the waiter to be enqueued. With the prior notion of equivalence class, a skip chain could contain waiters of different priority, so we had to walk the linked list one-by-one until finding the appropriate insertion point. With the new equivalence class computation, we can skip past all of the equivalent waiters to find the right insertion point. This gives a substantial improvement to the enqueue performance in the case where there's already a waiter at lower priority. Note that even though this code path isn't a hot one, it's performed while holding the Mutex's spinlock, which prevents other threads from unlocking the Mutex, so minimizing the time under the critical section can have "knock-on" throughput benefits. Notable performance differences: name old cpu/op new cpu/op delta BM_MutexEnqueue/multiple_priorities:0/threads:4 8.60µs ± 7% 8.69µs ± 6% ~ (p=0.365 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 8.47µs ± 5% 8.64µs ±10% ~ (p=0.569 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 8.56µs ± 3% 8.55µs ± 6% ~ (p=0.563 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 8.98µs ± 8% 8.86µs ± 4% ~ (p=0.232 n=19+17) BM_MutexEnqueue/multiple_priorities:1/threads:4 6.64µs ±10% 6.45µs ± 4% ~ (p=0.097 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 15.2µs ± 8% 9.1µs ± 4% -39.93% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:128 22.3µs ± 6% 9.4µs ± 4% -57.82% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 61.5µs ± 3% 10.1µs ± 8% -83.53% (p=0.000 n=20+20) name old time/op new time/op delta BM_Mutex/real_time/threads:1 19.6ns ± 4% 19.8ns ±11% ~ (p=0.534 n=17+17) BM_Mutex/real_time/threads:112 120ns ±17% 122ns ±14% ~ (p=0.988 n=20+18) BM_MutexEnqueue/multiple_priorities:0/threads:4 5.18µs ± 6% 5.23µs ± 6% ~ (p=0.428 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 5.06µs ± 5% 5.18µs ±10% ~ (p=0.235 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 5.16µs ± 3% 5.14µs ± 6% ~ (p=0.474 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 5.40µs ± 8% 5.32µs ± 5% ~ (p=0.196 n=20+18) BM_MutexEnqueue/multiple_priorities:1/threads:4 3.99µs ±10% 3.88µs ± 3% ~ (p=0.074 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 8.48µs ± 9% 5.41µs ± 3% -36.20% (p=0.000 n=20+16) BM_MutexEnqueue/multiple_priorities:1/threads:128 12.2µs ± 6% 5.6µs ± 4% -54.43% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 32.1µs ± 3% 5.9µs ± 8% -81.45% (p=0.000 n=20+20) ... BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32 1.69µs ± 4% 1.66µs ± 2% -1.91% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48 1.90µs ± 2% 1.82µs ± 2% -4.09% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64 2.19µs ± 2% 1.80µs ± 1% -17.89% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96 2.18µs ± 5% 1.81µs ± 1% -16.94% (p=0.000 n=17+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128 2.18µs ± 1% 1.91µs ± 2% -12.33% (p=0.000 n=19+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192 2.27µs ± 2% 1.89µs ± 1% -16.79% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256 2.36µs ± 2% 1.83µs ± 1% -22.25% (p=0.000 n=20+19) PiperOrigin-RevId: 350775432 -- e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>: Fix test timeouts for sequence_lock_test on TSAN PiperOrigin-RevId: 350680903 -- 3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>: Add benchmarks for Mutex performance with multiple priorities This adds a new benchmark to mutex_benchmark which forces threads to go through the slow "Enqueue" path. The benchmark runs with varying numbers of threads and with/without the presence of a lower-priority waiter. PiperOrigin-RevId: 350655403 GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3 Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
4 years ago
->Apply([](benchmark::internal::Benchmark* bm) {
SetupBenchmarkArgs(bm, /*do_test_priorities=*/false);
});
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
BENCHMARK_TEMPLATE(BM_Contended, std::mutex)
Export of internal Abseil changes -- b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>: Include priority in the calculation of mutex waiter equivalence This changes the behavior of the absl::Mutex wait list to take into account waiter priority when creating "skip chains". A skip chain on the wait list is a set of adjacent waiters that share some property and enable skipping during traversal. Prior to this CL, the skip chains were formed of waiters with the same wait type (e.g. exclusive vs read) and Condition. With this CL, the priority is also taken into account. This avoids O(n) behavior when enqueueing a waiter onto a wait list where the oldest waiter is at a lower priority than the waiter to be enqueued. With the prior notion of equivalence class, a skip chain could contain waiters of different priority, so we had to walk the linked list one-by-one until finding the appropriate insertion point. With the new equivalence class computation, we can skip past all of the equivalent waiters to find the right insertion point. This gives a substantial improvement to the enqueue performance in the case where there's already a waiter at lower priority. Note that even though this code path isn't a hot one, it's performed while holding the Mutex's spinlock, which prevents other threads from unlocking the Mutex, so minimizing the time under the critical section can have "knock-on" throughput benefits. Notable performance differences: name old cpu/op new cpu/op delta BM_MutexEnqueue/multiple_priorities:0/threads:4 8.60µs ± 7% 8.69µs ± 6% ~ (p=0.365 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 8.47µs ± 5% 8.64µs ±10% ~ (p=0.569 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 8.56µs ± 3% 8.55µs ± 6% ~ (p=0.563 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 8.98µs ± 8% 8.86µs ± 4% ~ (p=0.232 n=19+17) BM_MutexEnqueue/multiple_priorities:1/threads:4 6.64µs ±10% 6.45µs ± 4% ~ (p=0.097 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 15.2µs ± 8% 9.1µs ± 4% -39.93% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:128 22.3µs ± 6% 9.4µs ± 4% -57.82% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 61.5µs ± 3% 10.1µs ± 8% -83.53% (p=0.000 n=20+20) name old time/op new time/op delta BM_Mutex/real_time/threads:1 19.6ns ± 4% 19.8ns ±11% ~ (p=0.534 n=17+17) BM_Mutex/real_time/threads:112 120ns ±17% 122ns ±14% ~ (p=0.988 n=20+18) BM_MutexEnqueue/multiple_priorities:0/threads:4 5.18µs ± 6% 5.23µs ± 6% ~ (p=0.428 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:64 5.06µs ± 5% 5.18µs ±10% ~ (p=0.235 n=19+20) BM_MutexEnqueue/multiple_priorities:0/threads:128 5.16µs ± 3% 5.14µs ± 6% ~ (p=0.474 n=17+17) BM_MutexEnqueue/multiple_priorities:0/threads:512 5.40µs ± 8% 5.32µs ± 5% ~ (p=0.196 n=20+18) BM_MutexEnqueue/multiple_priorities:1/threads:4 3.99µs ±10% 3.88µs ± 3% ~ (p=0.074 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:64 8.48µs ± 9% 5.41µs ± 3% -36.20% (p=0.000 n=20+16) BM_MutexEnqueue/multiple_priorities:1/threads:128 12.2µs ± 6% 5.6µs ± 4% -54.43% (p=0.000 n=20+17) BM_MutexEnqueue/multiple_priorities:1/threads:512 32.1µs ± 3% 5.9µs ± 8% -81.45% (p=0.000 n=20+20) ... BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32 1.69µs ± 4% 1.66µs ± 2% -1.91% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48 1.90µs ± 2% 1.82µs ± 2% -4.09% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64 2.19µs ± 2% 1.80µs ± 1% -17.89% (p=0.000 n=20+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96 2.18µs ± 5% 1.81µs ± 1% -16.94% (p=0.000 n=17+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128 2.18µs ± 1% 1.91µs ± 2% -12.33% (p=0.000 n=19+20) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192 2.27µs ± 2% 1.89µs ± 1% -16.79% (p=0.000 n=20+19) BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256 2.36µs ± 2% 1.83µs ± 1% -22.25% (p=0.000 n=20+19) PiperOrigin-RevId: 350775432 -- e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>: Fix test timeouts for sequence_lock_test on TSAN PiperOrigin-RevId: 350680903 -- 3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>: Add benchmarks for Mutex performance with multiple priorities This adds a new benchmark to mutex_benchmark which forces threads to go through the slow "Enqueue" path. The benchmark runs with varying numbers of threads and with/without the presence of a lower-priority waiter. PiperOrigin-RevId: 350655403 GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3 Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
4 years ago
->Apply([](benchmark::internal::Benchmark* bm) {
SetupBenchmarkArgs(bm, /*do_test_priorities=*/false);
});
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
// Measure the overhead of conditions on mutex release (when they must be
// evaluated). Mutex has (some) support for equivalence classes allowing
// Conditions with the same function/argument to potentially not be multiply
// evaluated.
//
// num_classes==0 is used for the special case of every waiter being distinct.
void BM_ConditionWaiters(benchmark::State& state) {
int num_classes = state.range(0);
int num_waiters = state.range(1);
struct Helper {
static void Waiter(absl::BlockingCounter* init, absl::Mutex* m, int* p) {
init->DecrementCount();
m->LockWhen(absl::Condition(
static_cast<bool (*)(int*)>([](int* v) { return *v == 0; }), p));
m->Unlock();
}
};
if (num_classes == 0) {
// No equivalence classes.
num_classes = num_waiters;
}
absl::BlockingCounter init(num_waiters);
absl::Mutex mu;
std::vector<int> equivalence_classes(num_classes, 1);
// Must be declared last to be destroyed first.
absl::synchronization_internal::ThreadPool pool(num_waiters);
for (int i = 0; i < num_waiters; i++) {
// Mutex considers Conditions with the same function and argument
// to be equivalent.
pool.Schedule([&, i] {
Helper::Waiter(&init, &mu, &equivalence_classes[i % num_classes]);
});
}
init.Wait();
for (auto _ : state) {
mu.Lock();
mu.Unlock(); // Each unlock requires Condition evaluation for our waiters.
}
mu.Lock();
for (int i = 0; i < num_classes; i++) {
equivalence_classes[i] = 0;
}
mu.Unlock();
}
// Some configurations have higher thread limits than others.
#if defined(__linux__) && !defined(ABSL_HAVE_THREAD_SANITIZER)
Export of internal Abseil changes -- f012012ef78234a6a4585321b67d7b7c92ebc266 by Laramie Leavitt <lar@google.com>: Slight restructuring of absl/random/internal randen implementation. Convert round-keys.inc into randen_round_keys.cc file. Consistently use a 128-bit pointer type for internal method parameters. This allows simpler pointer arithmetic in C++ & permits removal of some constants and casts. Remove some redundancy in comments & constexpr variables. Specifically, all references to Randen algorithm parameters use RandenTraits; duplication in RandenSlow removed. PiperOrigin-RevId: 312190313 -- dc8b42e054046741e9ed65335bfdface997c6063 by Abseil Team <absl-team@google.com>: Internal change. PiperOrigin-RevId: 312167304 -- f13d248fafaf206492c1362c3574031aea3abaf7 by Matthew Brown <matthewbr@google.com>: Cleanup StrFormat extensions a little. PiperOrigin-RevId: 312166336 -- 9d9117589667afe2332bb7ad42bc967ca7c54502 by Derek Mauro <dmauro@google.com>: Internal change PiperOrigin-RevId: 312105213 -- 9a12b9b3aa0e59b8ee6cf9408ed0029045543a9b by Abseil Team <absl-team@google.com>: Complete IGNORE_TYPE macro renaming. PiperOrigin-RevId: 311999699 -- 64756f20d61021d999bd0d4c15e9ad3857382f57 by Gennadiy Rozental <rogeeff@google.com>: Switch to fixed bytes specific default value. This fixes the Abseil Flags for big endian platforms. PiperOrigin-RevId: 311844448 -- bdbe6b5b29791dbc3816ada1828458b3010ff1e9 by Laramie Leavitt <lar@google.com>: Change many distribution tests to use pcg_engine as a deterministic source of entropy. It's reasonable to test that the BitGen itself has good entropy, however when testing the cross product of all random distributions x all the architecture variations x all submitted changes results in a large number of tests. In order to account for these failures while still using good entropy requires that our allowed sigma need to account for all of these independent tests. Our current sigma values are too restrictive, and we see a lot of failures, so we have to either relax the sigma values or convert some of the statistical tests to use deterministic values. This changelist does the latter. PiperOrigin-RevId: 311840096 GitOrigin-RevId: f012012ef78234a6a4585321b67d7b7c92ebc266 Change-Id: Ic84886f38ff30d7d72c126e9b63c9a61eb729a1a
5 years ago
constexpr int kMaxConditionWaiters = 8192;
#else
constexpr int kMaxConditionWaiters = 1024;
#endif
BENCHMARK(BM_ConditionWaiters)->RangePair(0, 2, 1, kMaxConditionWaiters);
} // namespace