grpc_byte_buffer_reader_next() copies and references the slice. This
is not always necessary since the caller will not use the slice
after destroying the byte buffer.
A prominent example is the protobuf parser, which
calls grpc_byte_buffer_reader_next() and immediately unrefs the slice
after the call. This ref() and unref() calls can be very expensive
in the hot path.
This commit introduces grpc_byte_buffer_reader_peek() which
essentialy return a pointer to the slice in the buffer, i.e.,
no copies, and no refs.
QPS of 1MiB 1 Channel callback benchmark increases by 5%.
More importantly insructions per cycle is increased by 10%.
Also add tests and benchmarks for byte_buffer_reader_peek()
This commit reaplies 509e77a5a3
grpc_byte_buffer_reader_next() copies and references the slice. This
is not always necessary since the caller will not use the slice
after destroying the byte buffer.
A prominent example is the protobuf parser, which
calls grpc_byte_buffer_reader_next() and immediately unrefs the slice
after the call. This ref() and unref() calls can be very expensive
in the hot path.
This commit introduces grpc_byte_buffer_reader_peek() which
essentialy return a pointer to the slice in the buffer, i.e.,
no copies, and no refs.
QPS of 1MiB 1 Channel callback benchmark increases by 5%.
More importantly insructions per cycle is increased by 10%.
Also add tests and benchmarks for byte_buffer_reader_peek()
Ref-counted and inlined slices both have a lenght value, but
they are not in the same byte offset. As a result, we will
have two different loads based on a branch.
Instead move them to the same byteoffset, so that we will have
one move and the same number of branches.
Difference can be seen on:
https://godbolt.org/z/kqFZcP