grpc_byte_buffer_reader_next() copies and takes a reference on the slice. This
is not always necessary, since the caller will not use the slice
after destroying the byte buffer.
A prominent example is the protobuf parser, which
calls grpc_byte_buffer_reader_next() and immediately unrefs the slice
after the call. These ref() and unref() calls can be very expensive
in the hot path.
This commit introduces grpc_byte_buffer_reader_peek(), which
essentially returns a pointer to the slice in the buffer, i.e.,
no copies and no refs.
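As a rough sketch (the consumer functions below are illustrative, not part of this change), the difference for a caller looks like this:

    #include <grpc/byte_buffer.h>
    #include <grpc/byte_buffer_reader.h>
    #include <grpc/slice.h>

    // With _next() every returned slice carries an extra reference that the
    // caller must drop; with _peek() the caller only borrows a pointer into
    // the buffer, valid until the byte buffer is destroyed.
    void ConsumeWithNext(grpc_byte_buffer* bb) {
      grpc_byte_buffer_reader reader;
      grpc_byte_buffer_reader_init(&reader, bb);
      grpc_slice slice;
      while (grpc_byte_buffer_reader_next(&reader, &slice)) {
        // ... parse GRPC_SLICE_START_PTR(slice) / GRPC_SLICE_LENGTH(slice) ...
        grpc_slice_unref(slice);  // pays for the ref taken by _next()
      }
      grpc_byte_buffer_reader_destroy(&reader);
    }

    void ConsumeWithPeek(grpc_byte_buffer* bb) {
      grpc_byte_buffer_reader reader;
      grpc_byte_buffer_reader_init(&reader, bb);
      grpc_slice* slice;
      while (grpc_byte_buffer_reader_peek(&reader, &slice)) {
        // Borrowed pointer: no copy, no ref/unref; must not be used after
        // the byte buffer is destroyed.
        // ... parse GRPC_SLICE_START_PTR(*slice) / GRPC_SLICE_LENGTH(*slice) ...
      }
      grpc_byte_buffer_reader_destroy(&reader);
    }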
QPS of the 1MiB 1-channel callback benchmark increases by 5%.
More importantly, instructions per cycle increase by 10%.
Also add tests and benchmarks for byte_buffer_reader_peek()
This commit reapplies 509e77a5a3
It turns out that the code generation for "with gil" is a bit more
complicated than the logic for re-obtaining the GIL at the end of
"with nogil." This is because PyGILState_Ensure, during
interpreter finalization, thinks it needs to create a new thread state
(resulting in a call to CPython's new_threadstate), which then segfaults.
Because "with nogil" knows that, prior to executing, it already had
the gil, it doesn't need to set up as much state, and thus the segfault
does not occur.
To avoid this, we use "with nogil" only within the infinite loop,
and end the "nogil" block before we check signals. This avoids
needing any "with gil" block at all.
Before the patch, I was able to reliably reproduce the segfault within
a few minutes by running a binary in a loop (with py3) while
maxing out my machine's CPU usage. After the patch, I have not
been able to reproduce the segfault in over two hours.
Note that this race can only occur when the user does not properly
clean up all their channels and instead relies on garbage collection to
do so (which isn't guaranteed). However, we want to avoid a segfault
on failure to close, because a segfault is a poor error to surface for
this user mistake and makes it hard to debug.
Use compare_exchange_strong() instead of compare_exchange_weak(), which
can spuriously fail on some platforms.
Thanks to Mark Roth for pointing this out!
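For illustration only (not the actual gRPC code), the distinction matters when the CAS is not retried in a loop:

    #include <atomic>

    // compare_exchange_weak() may fail spuriously, i.e. return false even
    // though the stored value equals `expected`. That is harmless in a retry
    // loop, but a single-shot CAS that treats failure as "someone else won
    // the race" needs the strong form.
    bool TryClaim(std::atomic<int>& state) {
      int expected = 0;
      // Single-shot CAS: with _weak, a spurious failure would look like a
      // lost race and the claim would silently be skipped.
      return state.compare_exchange_strong(expected, 1);
    }

    void Increment(std::atomic<int>& counter) {
      int current = counter.load();
      // In a retry loop, _weak is fine (and can be cheaper on some
      // platforms): a spurious failure just costs one extra iteration.
      while (!counter.compare_exchange_weak(current, current + 1)) {
      }
    }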
BoringSSL builds its crypto_test and ssl_test as single targets, while
gRPC was building them with a target per file. This no longer works with
tip-of-tree BoringSSL.
This change aligns gRPC with the way that BoringSSL builds its tests.
The changes to boringssl/gen_build_yaml.py were done by hand; all other
changes result from generate_projects.sh.