In several cases, we create grpc mdelem structures using known-static
metadata inputs. Furthermore, in several cases we create a slice on
the heap (e.g. grpc_slice_from_copied_buffer) where we know we are
transferring refcount ownership. In several cases, then, we can:
1) Avoid unnecessary ref/unref operations that are no-ops (for static
slices) or superfluous (if we're transferring ownership).
2) Avoid unnecessarily comprehensive calls to grpc_slice_eq (since
they'd only be called with static or interned slice arguments,
which by construction would have equal refcounts if they were
in fact equal.
3) Avoid unnecessary checks to see if a slice is interned (when we
know that they are).
To avoid polluting the internal API, we introduce the notion of
strongly-typed grpc_slice objects. We draw a distinction between
Internal (interned and static-storage) slices and Extern (inline and
non-statically allocated). We introduce overloads to
grpc_mdelem_create() and grpc_mdelem_from_slices() for the fastpath
cases identified above based on these slice types.
From the programmer's point of view, though, nothing changes - they
need only use grpc_mdelem_create() and grpc_mdelem_from_slices() as
before, and the appropriate fastpath will be picked based on type
inference. If no special knowledge exists for the slice type (i.e. we
pass in generic grpc_slice objects), the slowpath method will still
always return correct behaviour.
This is good for:
- Roughly 1-3% reduction in CPU time for several unary/streaming
ping pong fullstack microbenchmarks.
- Reduction of about 15-20% in CPU time for some hpack parser
microbenchmarks.
- 10-12% reduction of CPU time for metadata microbenchmarks involving
interned slice comparisons.
src/core. exec_ctx is now a thread_local pointer of type ExecCtx instead of
grpc_exec_ctx which is initialized whenever ExecCtx is instantiated. ExecCtx
also keeps track of the previous exec_ctx so that nesting of exec_ctx is
allowed. This means that there is only one exec_ctx being used at any
time. Also, grpc_exec_ctx_finish is called in the destructor of the
object, and the previous exec_ctx is restored to avoid breaking current
functionality. The code still explicitly calls grpc_exec_ctx_finish
because removing all such instances causes the code to break.