alignment options (for cache-alignment).
We shrink by:
1) Removing an unnecessary zone pointer.
2) Replacing gpr_mu (40 bytes when using pthread_mutex_t) with
std::atomic_flag.
We also header-inline the fastpath alloc (ie. when not doing a zone
alloc) and move the malloc() for a zone alloc outside of the mutex
critical zone, which allows us to replace the mutex with a spinlock.
We also cache-align created arenas.
TCP_INQ is a socket option we added to Linux to report pending bytes
on the socket as a control message.
Using TCP_INQ we can accurately decide whether to continue read or not.
Add an urgent parameter, when we do not want to wait for EPOLLIN.
This commit improves the latency of 1 RPC unary (minimal benchmark)
significantly:
Before:
l_50: 61.3584984733
l_90: 94.8328711277
l_99: 126.211351174
l_999: 158.722406029
After:
l_50: 51.3546011488 (-16%)
l_90: 72.3420731581 (-23%)
l_99: 103.280218974 (-18%)
l_999: 130.905689996 (-17%)
src/core. exec_ctx is now a thread_local pointer of type ExecCtx instead of
grpc_exec_ctx which is initialized whenever ExecCtx is instantiated. ExecCtx
also keeps track of the previous exec_ctx so that nesting of exec_ctx is
allowed. This means that there is only one exec_ctx being used at any
time. Also, grpc_exec_ctx_finish is called in the destructor of the
object, and the previous exec_ctx is restored to avoid breaking current
functionality. The code still explicitly calls grpc_exec_ctx_finish
because removing all such instances causes the code to break.