alignment options (for cache-alignment).
We shrink by:
1) Removing an unnecessary zone pointer.
2) Replacing gpr_mu (40 bytes when using pthread_mutex_t) with
std::atomic_flag.
We also header-inline the fast-path alloc (i.e., when not doing a zone
alloc) and move the malloc() for a zone alloc outside of the mutex
critical section, which allows us to replace the mutex with a spinlock.
We also cache-align created arenas.
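
The locking change described above can be sketched roughly as follows. This
is a minimal illustration, not the actual arena code: ToyArena, Zone,
AllocZone and zone_count are hypothetical names, and the 16-byte rounding is
an assumption made to keep the returned pointers aligned.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical toy arena; names and layout are assumptions for illustration.
class ToyArena {
 public:
  explicit ToyArena(size_t initial_size)
      : initial_size_(initial_size),
        initial_(static_cast<char*>(std::malloc(initial_size))) {
    if (initial_ == nullptr) std::abort();
  }
  ~ToyArena() {
    while (zones_ != nullptr) {
      Zone* next = zones_->next;
      std::free(zones_);
      zones_ = next;
    }
    std::free(initial_);
  }
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;

  // Fast path: a single atomic bump, small enough to live in a header.
  void* Alloc(size_t size) {
    size = (size + kAlign - 1) & ~(kAlign - 1);
    size_t begin = used_.fetch_add(size, std::memory_order_relaxed);
    if (begin + size <= initial_size_) return initial_ + begin;
    return AllocZone(size);
  }

  size_t zone_count() {
    Lock();
    size_t n = zone_count_;
    Unlock();
    return n;
  }

 private:
  struct Zone {
    Zone* next;
  };
  static constexpr size_t kAlign = 16;
  static constexpr size_t kZoneHeader =
      (sizeof(Zone) + kAlign - 1) & ~(kAlign - 1);

  void Lock() {
    while (lock_.test_and_set(std::memory_order_acquire)) {
      // Spin; the critical section is only a few pointer writes.
    }
  }
  void Unlock() { lock_.clear(std::memory_order_release); }

  // Slow path: malloc() runs before the lock is taken, so the lock only
  // guards the list insertion.
  void* AllocZone(size_t size) {
    char* raw = static_cast<char*>(std::malloc(kZoneHeader + size));
    if (raw == nullptr) std::abort();
    Zone* z = new (raw) Zone{nullptr};
    Lock();
    z->next = zones_;
    zones_ = z;
    ++zone_count_;
    Unlock();
    return raw + kZoneHeader;
  }

  size_t initial_size_;
  char* initial_;
  std::atomic<size_t> used_{0};
  std::atomic_flag lock_ = ATOMIC_FLAG_INIT;
  Zone* zones_ = nullptr;
  size_t zone_count_ = 0;
};
```

Because the potentially slow malloc() is no longer inside the critical
section, the lock is held only for a short, bounded time, which is the
situation where spinning tends to be cheaper than a full mutex.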
grpc_byte_buffer_reader_next() copies and references the slice. This
is not necessary whenever the caller does not use the slice
after destroying the byte buffer.
A prominent example is the protobuf parser, which
calls grpc_byte_buffer_reader_next() and immediately unrefs the slice
after the call. These ref() and unref() calls can be very expensive
in the hot path.
This commit introduces grpc_byte_buffer_reader_peek(), which
essentially returns a pointer to the slice in the buffer, i.e.,
no copies and no refs.
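
The difference between the two access patterns can be illustrated with a toy
model. ToySlice, ToyReader, SumWithNext and SumWithPeek are hypothetical
stand-ins, not the gRPC types or signatures; the sketch only counts
reference-count operations to show what the peek path avoids.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a refcounted slice owned by a byte buffer.
struct ToySlice {
  explicit ToySlice(std::string b) : bytes(std::move(b)) {}
  std::string bytes;
  int refs = 1;  // the reference held by the buffer itself
};

// Hypothetical reader over the slices of a buffer.
class ToyReader {
 public:
  explicit ToyReader(std::vector<ToySlice>* slices) : slices_(slices) {}

  // next()-style: the caller receives its own reference and must Unref().
  bool Next(ToySlice** out) {
    if (pos_ >= slices_->size()) return false;
    ToySlice* s = &(*slices_)[pos_++];
    ++s->refs;
    ++ref_ops_;
    *out = s;
    return true;
  }

  // peek()-style: the caller borrows a pointer into the buffer. It is only
  // valid while the buffer is alive, and no refcount is touched.
  bool Peek(const ToySlice** out) {
    if (pos_ >= slices_->size()) return false;
    *out = &(*slices_)[pos_++];
    return true;
  }

  void Unref(ToySlice* s) {
    --s->refs;
    ++ref_ops_;
  }

  int ref_ops() const { return ref_ops_; }

 private:
  std::vector<ToySlice>* slices_;
  size_t pos_ = 0;
  int ref_ops_ = 0;
};

// A parser-like consumer that takes a reference and drops it immediately.
size_t SumWithNext(ToyReader& reader) {
  size_t total = 0;
  ToySlice* s = nullptr;
  while (reader.Next(&s)) {
    total += s->bytes.size();
    reader.Unref(s);
  }
  return total;
}

// The same consumer using borrowed pointers only.
size_t SumWithPeek(ToyReader& reader) {
  size_t total = 0;
  const ToySlice* s = nullptr;
  while (reader.Peek(&s)) total += s->bytes.size();
  return total;
}
```

In this model the next()-style loop performs two refcount operations per
slice, while the peek()-style loop performs none. The trade-off is that the
borrowed pointer must not outlive the buffer.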
QPS of the 1MiB 1 Channel callback benchmark increases by 5%.
More importantly, instructions per cycle increase by 10%.
Also add tests and benchmarks for byte_buffer_reader_peek().
This commit reapplies 509e77a5a3
TCP_INQ is a socket option we added to Linux to report pending bytes
on the socket as a control message.
Using TCP_INQ we can accurately decide whether to continue reading or not.
Add an urgent parameter for the case where we do not want to wait for EPOLLIN.
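
For illustration, reading the pending-byte count on Linux could look roughly
like the sketch below. EnableInq, ExtractInq and ReadWithInq are hypothetical
helper names, not the functions in this commit. The fallback value 36 for
TCP_INQ is taken from linux/tcp.h and is only used when the libc headers do
not define it.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <cstddef>
#include <cstring>

#ifndef TCP_INQ
#define TCP_INQ 36  // value from linux/tcp.h, for older libc headers
#endif
#ifndef TCP_CM_INQ
#define TCP_CM_INQ TCP_INQ
#endif

// Ask the kernel to attach the pending-byte count to each recvmsg().
bool EnableInq(int fd) {
  int one = 1;
  return setsockopt(fd, IPPROTO_TCP, TCP_INQ, &one, sizeof(one)) == 0;
}

// Returns the count carried in the control data of msg, or -1 if the
// control message is absent.
int ExtractInq(msghdr* msg) {
  for (cmsghdr* c = CMSG_FIRSTHDR(msg); c != nullptr;
       c = CMSG_NXTHDR(msg, c)) {
    if (c->cmsg_level == IPPROTO_TCP && c->cmsg_type == TCP_CM_INQ &&
        c->cmsg_len == CMSG_LEN(sizeof(int))) {
      int inq = 0;
      std::memcpy(&inq, CMSG_DATA(c), sizeof(inq));
      return inq;
    }
  }
  return -1;
}

// One read that also reports how many bytes are still queued afterwards.
ssize_t ReadWithInq(int fd, void* buf, size_t len, int* inq) {
  alignas(cmsghdr) char control[64];
  iovec iov;
  iov.iov_base = buf;
  iov.iov_len = len;
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  ssize_t n = recvmsg(fd, &msg, 0);
  *inq = (n >= 0) ? ExtractInq(&msg) : -1;
  return n;
}
```

A caller could then keep reading while the reported count is positive and go
back to waiting for EPOLLIN once it reaches zero. How an absent control
message (-1 here) should be handled is left open in this sketch.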
This commit improves the latency of 1 RPC unary (minimal benchmark)
significantly:
Before:
l_50: 61.3584984733
l_90: 94.8328711277
l_99: 126.211351174
l_999: 158.722406029
After:
l_50: 51.3546011488 (-16%)
l_90: 72.3420731581 (-23%)
l_99: 103.280218974 (-18%)
l_999: 130.905689996 (-17%)