When num_frequently_polled_cqs is non-zero (i.e., a hybrid server),
we create non-polling CQs for the sync methods. However, because we
incremented num_frequently_polled_cqs for callback methods only
after creating the sync CQs, the sync CQs did not detect
a hybrid server, and a polling CQ was created instead.
This commit reorders the logic so that we increment
num_frequently_polled_cqs as soon as a callback service is detected.
This reduces context switches by a double-digit percentage
when using the callback API.
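
The ordering problem can be illustrated with a small model. This is only a sketch: `BuilderModel`, `PollingType`, and the two functions are hypothetical names invented for illustration and are not gRPC's real types. Only the counter name comes from the commit text.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not gRPC's real types.
enum class PollingType { kPolling, kNonPolling };

struct BuilderModel {
  int num_frequently_polled_cqs = 0;  // counter named in the commit text
  bool has_callback_service = false;
};

// Old order: choose the sync CQ type first, count the callback CQ afterwards.
PollingType SyncCqTypeOldOrder(BuilderModel b) {
  PollingType type = b.num_frequently_polled_cqs > 0
                         ? PollingType::kNonPolling
                         : PollingType::kPolling;
  if (b.has_callback_service) ++b.num_frequently_polled_cqs;  // too late
  return type;
}

// New order: count the callback CQ first, then choose the sync CQ type.
PollingType SyncCqTypeNewOrder(BuilderModel b) {
  if (b.has_callback_service) ++b.num_frequently_polled_cqs;
  return b.num_frequently_polled_cqs > 0 ? PollingType::kNonPolling
                                         : PollingType::kPolling;
}
```

With a callback service registered and no other frequently polled CQs, the old order yields a polling sync CQ, while the new order yields a non-polling one.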

This change uses `grpc_core::RefCount` to improve performance.
It also replaces the explicit C vtables with C++ vtables,
which bring their own compile-time assertions and performance benefits,
and it uses `RefCountedPtr` wherever possible.
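
As a rough illustration of the pattern, the sketch below pairs an intrusive reference count with a virtual destructor (the C++ vtable) and a small owning pointer. `RefCountedBase`, `SimpleRefPtr`, and `Tracked` are hypothetical names written for this example; they are not the actual `grpc_core::RefCount` or `RefCountedPtr` implementations.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive ref-counted base; the virtual destructor takes the
// place of a hand-written "destroy" entry in a C-style vtable struct.
class RefCountedBase {
 public:
  RefCountedBase() : refs_(1) {}
  virtual ~RefCountedBase() = default;
  void Ref() { refs_.fetch_add(1, std::memory_order_relaxed); }
  void Unref() {
    // The last reference to go away deletes the object.
    if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
  }

 private:
  std::atomic<int> refs_;
};

// Hypothetical smart pointer that adopts one reference on construction and
// releases it on destruction.
template <typename T>
class SimpleRefPtr {
 public:
  explicit SimpleRefPtr(T* p = nullptr) : p_(p) {}
  SimpleRefPtr(const SimpleRefPtr& other) : p_(other.p_) {
    if (p_ != nullptr) p_->Ref();
  }
  SimpleRefPtr& operator=(const SimpleRefPtr& other) {
    if (other.p_ != nullptr) other.p_->Ref();  // Ref first: safe on self-assign
    if (p_ != nullptr) p_->Unref();
    p_ = other.p_;
    return *this;
  }
  ~SimpleRefPtr() {
    if (p_ != nullptr) p_->Unref();
  }
  T* get() const { return p_; }

 private:
  T* p_;
};

// Example subclass that counts how many instances have been destroyed.
struct Tracked : public RefCountedBase {
  static int destroyed;
  ~Tracked() override { ++destroyed; }
};
int Tracked::destroyed = 0;
```

Using `override` on the destructor and on any virtual methods is one way the compiler can check at build time what a C vtable struct could only check by convention.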

In TensorFlow, the RPC client thread does not actively release its resources
and relies on process exit for cleanup. If the process has already
destroyed the global variable (g_default_client_callbacks)
and the client then issues an RPC call that holds a ClientContext,
a "pure virtual function call" error is reported
when the ClientContext destructor runs.
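
The text above does not say how the problem is fixed. One common C++ idiom for this class of bug, shown here purely as an assumption, is to heap-allocate the global and never delete it, so it outlives every object destroyed during process exit. All names below (`Callbacks`, `DefaultCallbacks`, `DefaultCallbacksInstance`, `Context`) are hypothetical stand-ins, not gRPC's real classes.

```cpp
#include <cassert>

// Hypothetical callback interface standing in for the real one.
class Callbacks {
 public:
  virtual ~Callbacks() = default;
  virtual int OnContextDestroyed() = 0;
};

class DefaultCallbacks : public Callbacks {
 public:
  int OnContextDestroyed() override { return 1; }
};

// Allocated on first use and intentionally never deleted, so no static
// destructor can tear it down before a late caller uses it.
Callbacks& DefaultCallbacksInstance() {
  static Callbacks* instance = new DefaultCallbacks();
  return *instance;
}

// Hypothetical context whose destructor calls into the global callbacks.
class Context {
 public:
  ~Context() { notified += DefaultCallbacksInstance().OnContextDestroyed(); }
  static int notified;  // plain int: it has no destructor of its own
};
int Context::notified = 0;
```

The trade-off is a deliberate one-time leak in exchange for a callbacks object that stays valid for the whole life of the process.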

Specifically: if a request handling thread is in flight but scheduled
out when shutdown is called on the server, and it has already passed
the shutdown check, then when it resumes it will add a grpc_call to
the completion queue, and that call is leaked. We fix this by explicitly
freeing such calls after all worker threads have shut down.
To manifest the leak, run the end2end::ClientCancelsRequestStream
test repeatedly on the unpatched server implementation. About 0.5% of
the time, the leak will manifest.
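
The shape of the fix can be sketched without threads: once every worker has stopped, anything still queued is drained and freed. `CallModel` and `CompletionQueueModel` are hypothetical types invented for this sketch and do not correspond to gRPC's real `grpc_call` or completion queue APIs.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for a call object; tracks how many are still alive.
struct CallModel {
  static int live;
  CallModel() { ++live; }
  ~CallModel() { --live; }
};
int CallModel::live = 0;

// Hypothetical queue model. A handler that passed the shutdown check before
// being descheduled can still push a call here after shutdown began.
class CompletionQueueModel {
 public:
  void Push(CallModel* call) { pending_.push_back(call); }

  // Meant to run after all worker threads have stopped: frees every call
  // still queued and returns how many were freed.
  int DrainAndFree() {
    int freed = 0;
    while (!pending_.empty()) {
      delete pending_.front();
      pending_.pop_front();
      ++freed;
    }
    return freed;
  }

 private:
  std::deque<CallModel*> pending_;
};
```

Draining only after the workers have stopped matters in this model: no thread can push a new call once the drain has started, so nothing is left behind.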