For tiny RPCs, almost every request is the first item in the
list. Hence, each one would try to lock the server to process pending
requests.
Instead of locking, simply set and check atomic values when there is a
possibility of having pending requests.
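As a minimal sketch of the idea (the class and member names below are
illustrative, not the actual gRPC internals), the fast path only checks
an atomic flag and falls back to the lock only when there may actually
be pending requests:

    #include <atomic>
    #include <mutex>

    class ServerQueue {
     public:
      void OnRequestQueued() {
        // Cheap atomic store instead of grabbing the server lock.
        maybe_pending_.store(true, std::memory_order_release);
      }

      void MaybeProcessPending() {
        // Fast path: nothing pending, no lock taken.
        if (!maybe_pending_.load(std::memory_order_acquire)) return;
        std::lock_guard<std::mutex> lock(mu_);
        maybe_pending_.store(false, std::memory_order_relaxed);
        // ... drain pending requests while holding the lock ...
      }

     private:
      std::atomic<bool> maybe_pending_{false};
      std::mutex mu_;
    };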
This increases QPS by 10% for the 62-channel/0B-RPC benchmark using the
callback API.
Since GRPC_CLOSURE_SCHED can schedule callbacks asynchronously, we have
to schedule our own wrapper instead. Also, we cannot use ACQUIRE and
RELEASE directly on the call_combiner, because callbacks are free to
destroy the call_combiner itself. Thus, we use a ref-counted structure
that acts as a fake lock for TSAN annotations.
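A minimal sketch of that fake lock (illustrative names, not the actual
core types): the TSAN acquire/release annotations target a separately
ref-counted object, and the scheduled wrapper holds a ref across the
user callback so the annotations stay valid even if the callback
destroys the call_combiner:

    #include <atomic>

    struct TsanFakeLock {
      std::atomic<int> refs{1};
    };

    void Ref(TsanFakeLock* lock) {
      lock->refs.fetch_add(1, std::memory_order_relaxed);
    }

    void Unref(TsanFakeLock* lock) {
      if (lock->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete lock;
    }

    void RunWrappedCallback(TsanFakeLock* lock, void (*cb)(void*), void* arg) {
      Ref(lock);  // keep the fake lock alive across the callback
      // TSAN "acquire" annotation would target `lock`, not the call_combiner.
      cb(arg);    // the callback may destroy the call_combiner
      // TSAN "release" annotation still has a live object to point at here.
      Unref(lock);
    }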
This addresses https://github.com/grpc/grpc/issues/17001. Prior to
https://github.com/grpc/grpc/pull/13603, our credentials Cython objects
called grpc_init() and grpc_shutdown() on creation and destruction. These
are now managed differently, but the grpc_init() and grpc_shutdown() calls
are still required. See the MetadataCredentialsPluginWrapper in C++,
which extends the GrpcLibraryCodegen class to ensure that grpc_init()
and grpc_shutdown() are called appropriately.
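The underlying pattern is an RAII pairing of grpc_init()/grpc_shutdown()
with the wrapper's lifetime; the sketch below is illustrative
(GrpcLibraryGuard and CredentialsPluginWrapper are hypothetical names,
not the actual grpc++ classes):

    #include <grpc/grpc.h>

    // Construction calls grpc_init(); destruction calls grpc_shutdown().
    class GrpcLibraryGuard {
     public:
      GrpcLibraryGuard() { grpc_init(); }
      ~GrpcLibraryGuard() { grpc_shutdown(); }
    };

    // As long as an instance exists, its matching grpc_shutdown() has not
    // run, so core state stays valid for the lifetime of the plugin.
    class CredentialsPluginWrapper : private GrpcLibraryGuard {
     public:
      // ... credentials plugin callbacks would go here ...
    };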
Without this, we can deadlock: a call to grpc.Channel#close() can cause
grpc_shutdown() to block waiting for all timer threads to finish, and
one of those timer threads may end up unreffing the subchannel and
triggering grpc_call_credentials_unref, which jumps back into Cython and
hangs when it tries to reacquire the GIL.