The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#) https://grpc.io/
/*
 *
 * Copyright 2017 gRPC authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 */
/* Test out various metadata handling primitives */
#include <benchmark/benchmark.h>
#include <grpc/grpc.h>
/*
 * Commit note (6 years ago): Added specializations for grpc_mdelem_create.
 *
 * In several cases, we create grpc mdelem structures using known-static
 * metadata inputs. Furthermore, in several cases we create a slice on the
 * heap (e.g. grpc_slice_from_copied_buffer) where we know we are transferring
 * refcount ownership. In several cases, then, we can:
 *
 * 1) Avoid unnecessary ref/unref operations that are no-ops (for static
 *    slices) or superfluous (if we're transferring ownership).
 * 2) Avoid unnecessarily comprehensive calls to grpc_slice_eq (since they'd
 *    only be called with static or interned slice arguments, which by
 *    construction would have equal refcounts if they were in fact equal).
 * 3) Avoid unnecessary checks to see if a slice is interned (when we know
 *    that they are).
 *
 * To avoid polluting the internal API, we introduce the notion of
 * strongly-typed grpc_slice objects. We draw a distinction between Internal
 * (interned and static-storage) slices and Extern (inline and non-statically
 * allocated). We introduce overloads to grpc_mdelem_create() and
 * grpc_mdelem_from_slices() for the fastpath cases identified above based on
 * these slice types.
 *
 * From the programmer's point of view, though, nothing changes - they need
 * only use grpc_mdelem_create() and grpc_mdelem_from_slices() as before, and
 * the appropriate fastpath will be picked based on type inference. If no
 * special knowledge exists for the slice type (i.e. we pass in generic
 * grpc_slice objects), the slowpath method will still always return correct
 * behaviour.
 *
 * This is good for:
 * - Roughly 1-3% reduction in CPU time for several unary/streaming ping pong
 *   fullstack microbenchmarks.
 * - Reduction of about 15-20% in CPU time for some hpack parser
 *   microbenchmarks.
 * - 10-12% reduction of CPU time for metadata microbenchmarks involving
 *   interned slice comparisons.
 */
#include "src/core/lib/slice/slice_internal.h"
#include "src/core/lib/transport/metadata.h"
#include "src/core/lib/transport/static_metadata.h"
#include "test/core/util/test_config.h"
#include "test/cpp/microbenchmarks/helpers.h"
#include "test/cpp/util/test_config.h"
// Constructs a slice that refers to a static string literal.
static void BM_SliceFromStatic(benchmark::State& state) {
  TrackCounters track_counters;
  for (auto _ : state) {
    benchmark::DoNotOptimize(grpc_core::ExternallyManagedSlice("abc"));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_SliceFromStatic);
// Constructs a slice from a copied string, then releases it.
static void BM_SliceFromCopied(benchmark::State& state) {
  TrackCounters track_counters;
  for (auto _ : state) {
    grpc_slice_unref(grpc_core::UnmanagedMemorySlice("abc"));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_SliceFromCopied);
// Interns a static slice, then releases the interned result.
static void BM_SliceIntern(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExternallyManagedSlice slice("abc");
  for (auto _ : state) {
    grpc_slice_unref(grpc_core::ManagedMemorySlice(&slice));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_SliceIntern);
// Interns a slice that has already been interned.
static void BM_SliceReIntern(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExternallyManagedSlice static_slice("abc");
  grpc_core::ManagedMemorySlice slice(&static_slice);
  for (auto _ : state) {
    grpc_slice_unref(grpc_core::ManagedMemorySlice(&slice));
  }
  grpc_slice_unref(slice);
  track_counters.Finish(state);
}
BENCHMARK(BM_SliceReIntern);
// Interns a slice taken from the static metadata table (GRPC_MDSTR_GZIP).
static void BM_SliceInternStaticMetadata(benchmark::State& state) {
  TrackCounters track_counters;
  for (auto _ : state) {
    benchmark::DoNotOptimize(grpc_core::ManagedMemorySlice(&GRPC_MDSTR_GZIP));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_SliceInternStaticMetadata);
// Interns a slice whose contents ("gzip") equal a static metadata string.
static void BM_SliceInternEqualToStaticMetadata(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExternallyManagedSlice slice("gzip");
  for (auto _ : state) {
    benchmark::DoNotOptimize(grpc_core::ManagedMemorySlice(&slice));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_SliceInternEqualToStaticMetadata);
// Creates a metadata element from a non-interned key and value.
static void BM_MetadataFromNonInternedSlices(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExternallyManagedSlice k("key");
  grpc_core::ExternallyManagedSlice v("value");
  grpc_core::ExecCtx exec_ctx;
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(grpc_mdelem_create(k, v, nullptr));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromNonInternedSlices);
// Creates a metadata element from an interned key and value.
static void BM_MetadataFromInternedSlices(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ManagedMemorySlice k("key");
  grpc_core::ManagedMemorySlice v("value");
  grpc_core::ExecCtx exec_ctx;
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(grpc_mdelem_create(k, v, nullptr));
  }
  grpc_slice_unref(k);
  grpc_slice_unref(v);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromInternedSlices);
// Creates a metadata element from an interned key and value while a seed
// element for the same pair is held alive for the duration of the loop.
static void BM_MetadataFromInternedSlicesAlreadyInIndex(
    benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ManagedMemorySlice k("key");
  grpc_core::ManagedMemorySlice v("value");
  grpc_core::ExecCtx exec_ctx;
  grpc_mdelem seed = grpc_mdelem_create(k, v, nullptr);
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(grpc_mdelem_create(k, v, nullptr));
  }
  GRPC_MDELEM_UNREF(seed);
  grpc_slice_unref(k);
  grpc_slice_unref(v);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromInternedSlicesAlreadyInIndex);
// Creates a metadata element from an interned key and a non-interned value.
static void BM_MetadataFromInternedKey(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ManagedMemorySlice k("key");
  grpc_core::ExternallyManagedSlice v("value");
  grpc_core::ExecCtx exec_ctx;
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(grpc_mdelem_create(k, v, nullptr));
  }
  grpc_slice_unref(k);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromInternedKey);
// Creates a metadata element from a non-interned key and value, passing
// caller-provided storage for the grpc_mdelem_data.
static void BM_MetadataFromNonInternedSlicesWithBackingStore(
    benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExternallyManagedSlice k("key");
  grpc_core::ExternallyManagedSlice v("value");
  char backing_store[sizeof(grpc_mdelem_data)];
  grpc_core::ExecCtx exec_ctx;
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(grpc_mdelem_create(
        k, v, reinterpret_cast<grpc_mdelem_data*>(backing_store)));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromNonInternedSlicesWithBackingStore);
static void BM_MetadataFromInternedSlicesWithBackingStore(
benchmark::State& state) {
TrackCounters track_counters;
  grpc_core::ManagedMemorySlice k("key");
  grpc_core::ManagedMemorySlice v("value");
  char backing_store[sizeof(grpc_mdelem_data)];
  grpc_core::ExecCtx exec_ctx;
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(grpc_mdelem_create(
        k, v, reinterpret_cast<grpc_mdelem_data*>(backing_store)));
  }
  grpc_slice_unref(k);
  grpc_slice_unref(v);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromInternedSlicesWithBackingStore);

// Creates a metadata element from an interned key and a non-interned value,
// using caller-provided backing storage.
static void BM_MetadataFromInternedKeyWithBackingStore(
    benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ManagedMemorySlice k("key");
  grpc_core::ExternallyManagedSlice v("value");
  char backing_store[sizeof(grpc_mdelem_data)];
  grpc_core::ExecCtx exec_ctx;
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(grpc_mdelem_create(
        k, v, reinterpret_cast<grpc_mdelem_data*>(backing_store)));
  }
  grpc_slice_unref(k);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromInternedKeyWithBackingStore);

// Creates a metadata element from two static metadata strings.
static void BM_MetadataFromStaticMetadataStrings(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExecCtx exec_ctx;
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(
        grpc_mdelem_create(GRPC_MDSTR_STATUS, GRPC_MDSTR_200, nullptr));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromStaticMetadataStrings);

// Same as above, but with a key/value pair that (per the benchmark name) is
// not an indexed static metadata element.
static void BM_MetadataFromStaticMetadataStringsNotIndexed(
    benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExecCtx exec_ctx;
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(
        grpc_mdelem_create(GRPC_MDSTR_STATUS, GRPC_MDSTR_GZIP, nullptr));
  }
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataFromStaticMetadataStringsNotIndexed);

// Measures a ref/unref pair on an element built from externally managed
// slices in caller-provided backing storage.
static void BM_MetadataRefUnrefExternal(benchmark::State& state) {
  TrackCounters track_counters;
  char backing_store[sizeof(grpc_mdelem_data)];
  grpc_core::ExecCtx exec_ctx;
  grpc_mdelem el =
      grpc_mdelem_create(grpc_core::ExternallyManagedSlice("a"),
                         grpc_core::ExternallyManagedSlice("b"),
                         reinterpret_cast<grpc_mdelem_data*>(backing_store));
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(GRPC_MDELEM_REF(el));
  }
  GRPC_MDELEM_UNREF(el);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataRefUnrefExternal);

// Measures a ref/unref pair on an element built from interned slices in
// caller-provided backing storage.
static void BM_MetadataRefUnrefInterned(benchmark::State& state) {
  TrackCounters track_counters;
  char backing_store[sizeof(grpc_mdelem_data)];
  grpc_core::ExecCtx exec_ctx;
  grpc_core::ManagedMemorySlice k("key");
  grpc_core::ManagedMemorySlice v("value");
  grpc_mdelem el = grpc_mdelem_create(
      k, v, reinterpret_cast<grpc_mdelem_data*>(backing_store));
  grpc_slice_unref(k);
  grpc_slice_unref(v);
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(GRPC_MDELEM_REF(el));
  }
  GRPC_MDELEM_UNREF(el);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataRefUnrefInterned);

// Measures a ref/unref pair on an element created without caller-provided
// backing storage (nullptr is passed instead).
static void BM_MetadataRefUnrefAllocated(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExecCtx exec_ctx;
  grpc_mdelem el =
      grpc_mdelem_create(grpc_core::ExternallyManagedSlice("a"),
                         grpc_core::ExternallyManagedSlice("b"), nullptr);
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(GRPC_MDELEM_REF(el));
  }
  GRPC_MDELEM_UNREF(el);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataRefUnrefAllocated);

// Measures a ref/unref pair on an element built from static metadata strings.
static void BM_MetadataRefUnrefStatic(benchmark::State& state) {
  TrackCounters track_counters;
  grpc_core::ExecCtx exec_ctx;
  grpc_mdelem el =
      grpc_mdelem_create(GRPC_MDSTR_STATUS, GRPC_MDSTR_200, nullptr);
  for (auto _ : state) {
    GRPC_MDELEM_UNREF(GRPC_MDELEM_REF(el));
  }
  GRPC_MDELEM_UNREF(el);
  track_counters.Finish(state);
}
BENCHMARK(BM_MetadataRefUnrefStatic);

// Some distros have RunSpecifiedBenchmarks under the benchmark namespace,
// and others do not. This allows us to support both modes.
namespace benchmark {
void RunTheBenchmarksNamespaced() { RunSpecifiedBenchmarks(); }
}  // namespace benchmark

int main(int argc, char** argv) {
  grpc::testing::TestEnvironment env(argc, argv);
  LibraryInitializer libInit;
  ::benchmark::Initialize(&argc, argv);
  ::grpc::testing::InitTest(&argc, &argv, false);
  benchmark::RunTheBenchmarksNamespaced();
  return 0;
}