grpc_byte_buffer_reader_next() copies the slice and takes a reference
to it. This is not always necessary, because the caller often does not
use the slice after destroying the byte buffer.
A prominent example is the protobuf parser, which
calls grpc_byte_buffer_reader_next() and immediately unrefs the slice
after the call. These ref() and unref() calls can be very expensive
in the hot path.
This commit introduces grpc_byte_buffer_reader_peek(), which
essentially returns a pointer to the slice in the buffer, i.e.,
no copies and no refs.
The QPS of the 1 MiB, 1-channel callback benchmark increases by 5%.
More importantly, instructions per cycle (IPC) increase by 10%.
This commit also adds tests and benchmarks for
grpc_byte_buffer_reader_peek().
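To make the difference concrete, here is a minimal, self-contained C++ sketch that does not use gRPC. Slice, RefCountedBytes, ByteBufferReader, ReaderNext() and ReaderPeek() are hypothetical stand-ins. They only mimic the shape of grpc_byte_buffer_reader_next(), which copies the slice into an out-parameter and references it, versus grpc_byte_buffer_reader_peek(), which hands out a pointer to the slice inside the buffer. A caller that drops each slice immediately, as the protobuf parser does, pays one ref and one unref per slice with the next()-style call and none with the peek()-style call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; NOT gRPC's real grpc_slice / grpc_byte_buffer.
struct RefCountedBytes {
  int refs = 1;  // the byte buffer's own reference
};

struct Slice {
  RefCountedBytes* storage;  // shared, refcounted backing store
  const char* data;
  size_t length;
};

void SliceRef(Slice& s) { ++s.storage->refs; }

void SliceUnref(Slice& s) {
  assert(s.storage->refs > 0);
  --s.storage->refs;
}

struct ByteBufferReader {
  std::vector<Slice>* slices;  // the byte buffer being read
  size_t index = 0;
};

// next()-style: copies the slice into *out and takes a reference that the
// caller must release.
bool ReaderNext(ByteBufferReader& r, Slice* out) {
  if (r.index >= r.slices->size()) return false;
  *out = (*r.slices)[r.index++];
  SliceRef(*out);
  return true;
}

// peek()-style: hands out a pointer to the slice inside the buffer, with no
// copy and no reference. Only valid while the buffer is alive.
bool ReaderPeek(ByteBufferReader& r, Slice** out) {
  if (r.index >= r.slices->size()) return false;
  *out = &(*r.slices)[r.index++];
  return true;
}

// A consumer that drops each slice right away pays a ref and an unref per
// slice with next()...
size_t TotalLengthWithNext(ByteBufferReader& r) {
  size_t total = 0;
  Slice s;
  while (ReaderNext(r, &s)) {
    total += s.length;
    SliceUnref(s);
  }
  return total;
}

// ...and nothing extra with peek().
size_t TotalLengthWithPeek(ByteBufferReader& r) {
  size_t total = 0;
  Slice* s;
  while (ReaderPeek(r, &s)) total += s->length;
  return total;
}
```

The trade-off is lifetime: a peeked pointer borrows from the byte buffer and must not be used after the buffer is destroyed, whereas a slice obtained through next() holds its own reference.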
Passing grpc_slice by value and/or returning it can be very costly,
introducing many extra instructions to push the structure onto the
stack and pop it.
This CL, wherever possible, changes grpc_slice to be passed by
reference instead of by value.
On a local benchmark, I observe 4-7% improvements in latency and QPS.
There are still copies in calls to the slice_ref vtable, which @arjunroy
is fixing as part of his larger effort to use grpc_core::RefCount
for slices and to devirtualize them.
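A minimal sketch of the two calling conventions, assuming a hypothetical FakeSlice struct that only approximates the layout of grpc_slice (a refcount pointer plus a union of a pointer/length pair and small inline bytes). On x86-64 System V, for instance, an aggregate larger than 16 bytes is passed and returned in memory, so every by-value call copies the whole struct, while the const-reference version passes a single pointer. The exact cost depends on the ABI and on inlining.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for grpc_slice; not the real definition.
struct FakeSlice {
  void* refcount;  // nullptr means the bytes are stored inline
  union {
    struct {
      size_t length;
      const uint8_t* bytes;
    } refcounted;
    struct {
      uint8_t length;
      uint8_t bytes[23];
    } inlined;
  } data;
};

// Larger than two pointer-sized registers, so by-value passing goes
// through memory on common 64-bit ABIs.
static_assert(sizeof(FakeSlice) > 2 * sizeof(void*),
              "FakeSlice should not fit in two registers");

// By value: the caller materializes a full copy of the struct.
size_t LengthByValue(FakeSlice s) {
  return s.refcount ? s.data.refcounted.length : s.data.inlined.length;
}

// By const reference: only the address of the struct is passed.
size_t LengthByRef(const FakeSlice& s) {
  return s.refcount ? s.data.refcounted.length : s.data.inlined.length;
}
```

Both functions compute the same result; only the calling convention differs, which is why the change can be applied mechanically wherever the callee does not need its own copy.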
We flush these closures only when the connection goes IDLE.
If there is a continuous stream of bytes that never stops, no
completions are sent, which bloats memory because the callbacks
of the ops are never called.
For example, we use hundreds of GiB of memory after a minute of
exchanging 1 MiB RPCs with the callback API.
This patch runs the closures after each write action completes.
After this change, memory usage remains stable for the 1 MiB benchmark.
QPS increases by more than 200 (520 -> 749), and latency drops
by 70 ms, because we were basically page-faulting on every RPC.
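The behavior change can be modeled with a small, hypothetical Transport class. The names and structure are assumptions for illustration and do not mirror gRPC's transport code. Finished writes move their completion closures to a ready list. With flush_after_each_write set to false, those closures run only from OnIdle(), so under a write stream that never goes idle they pile up. Setting it to true runs them at the end of every write action, which is the effect this patch describes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model; not gRPC's chttp2 transport.
class Transport {
 public:
  explicit Transport(bool flush_after_each_write)
      : flush_after_each_write_(flush_after_each_write) {}

  // Queues a write op whose completion closure runs once it is written.
  void QueueWrite(std::function<void()> on_done) {
    pending_.push_back(std::move(on_done));
  }

  // One write action: everything queued so far is written, so the
  // corresponding closures become ready to run.
  void RunOneWriteAction() {
    for (auto& c : pending_) ready_.push_back(std::move(c));
    pending_.clear();
    if (flush_after_each_write_) FlushClosures();  // new behavior
  }

  // Old behavior: ready closures only ran when the connection went idle.
  void OnIdle() { FlushClosures(); }

  size_t ready_count() const { return ready_.size(); }

 private:
  void FlushClosures() {
    std::vector<std::function<void()>> to_run;
    to_run.swap(ready_);  // closures may queue more work while running
    for (auto& c : to_run) c();
  }

  bool flush_after_each_write_;
  std::vector<std::function<void()>> pending_;
  std::vector<std::function<void()>> ready_;
};
```

In the old mode, ready_count() grows with every write until OnIdle() is called, which corresponds to the unbounded memory growth described above.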
The CAS operation was suboptimal, so vjpai@ and
I decided to move to std::atomic.
This commit moves the CQ data into C++ structures and
makes grpc_cq_event_queue a proper C++ class called CQEventQueue.
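The message does not say exactly what was suboptimal about the CAS. One common pattern, shown here as an assumption rather than as the code that was changed, is updating a counter with a compare-and-swap retry loop where std::atomic already offers a single read-modify-write (fetch_add) that never retries. CasCounter and FetchAddCounter are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical "before": a CAS retry loop. Under contention a failed
// compare_exchange_weak reloads the current value into `old` and retries.
struct CasCounter {
  std::atomic<intptr_t> value{0};

  intptr_t Increment() {
    intptr_t old = value.load(std::memory_order_relaxed);
    while (!value.compare_exchange_weak(old, old + 1,
                                        std::memory_order_relaxed)) {
      // `old` now holds the latest value; try again.
    }
    return old;
  }
};

// Hypothetical "after": one atomic read-modify-write, no retry loop.
struct FetchAddCounter {
  std::atomic<intptr_t> value{0};

  intptr_t Increment() {
    return value.fetch_add(1, std::memory_order_relaxed);
  }
};
```

Both return the previous value; on x86-64, fetch_add typically compiles to a single `lock xadd`, while the CAS loop may execute several `lock cmpxchg` attempts under contention.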
We are planning to enable the -Wextra-semi flag in our project, but some
header files in gRPC have extra semicolons that violate the check and
block us from enabling the flag.
This change removes unnecessary semicolons in the code. Note that a
semicolon after the GRPC_ABSTRACT macro technically also violates the
check, but that is fine for us since the macro is not used in public
headers, and lines ending only with GRPC_ABSTRACT would be confusing,
so I kept them as-is.
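As an illustration of what the flag reports, here is a hypothetical Counter class. The commented-out version has an extra semicolon after an in-class member function body, which both Clang and GCC diagnose under -Wextra-semi (for example with `clang++ -Wextra-semi -Werror -c counter.cc`). The comment also shows, with a hypothetical MY_ABSTRACT macro, why a `;` after a macro that expands to a function body trips the same check.

```cpp
#include <cassert>

// Before (flagged by -Wextra-semi):
//
//   class Counter {
//    public:
//     int Get() const { return n_; };  // extra ';' after a member body
//     void Increment() { ++n_; };      // same here
//    private:
//     int n_ = 0;
//   };
//
// A hypothetical macro that expands to a function body has the same issue:
//
//   #define MY_ABSTRACT { assert(false); }
//   virtual void F() MY_ABSTRACT;  // expands to "{ ... };", so the ';' is extra
//
// After: the stray semicolons are simply dropped.
class Counter {
 public:
  int Get() const { return n_; }
  void Increment() { ++n_; }

 private:
  int n_ = 0;
};
```

Removing such semicolons never changes behavior, which is why the cleanup can be done across headers without further review of the affected code.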