Arjun Roy
0b06676c9e
hpack encoder optimizations.
Removed some cycles and branches from hpack_enc for CH2.
Specifically:
1. Moved certain metadata key/value length checks to
prepare_application_metadata() in src/core/lib/surface/call.cc.
Rather than checking key/value lengths for all metadata, we now check only
custom metadata added by the user. Inside CH2, the length checks become debug
checks, so we still catch core/filter metadata that fails the check.
2. Changed various asserts to debug asserts where possible.
3. Refactored some of the header emission code to remove duplicated code.
4. Un-inlined some logging methods.
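The idea in item 1 can be sketched as follows. This is a minimal illustration, not the code from the commit: `kMaxLen`, `Metadata`, `ValidateUserMetadata`, and `EncodedSize` are hypothetical names, and the real limits and types in gRPC differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical limit and container type, for illustration only.
constexpr std::size_t kMaxLen = 8;
using Metadata = std::vector<std::pair<std::string, std::string>>;

// Boundary check: runs once, and only on metadata supplied by the application.
bool ValidateUserMetadata(const Metadata& md) {
  for (const auto& kv : md) {
    if (kv.first.size() > kMaxLen || kv.second.size() > kMaxLen) return false;
  }
  return true;
}

// Hot path: trusts its inputs in release builds. The assert compiles away
// when NDEBUG is defined, but still flags oversized internal metadata in
// debug builds.
std::size_t EncodedSize(const Metadata& md) {
  std::size_t total = 0;
  for (const auto& kv : md) {
    assert(kv.first.size() <= kMaxLen && kv.second.size() <= kMaxLen);
    total += kv.first.size() + kv.second.size();
  }
  return total;
}
```

The release-build hot path then carries no length branch at all, while the one remaining runtime check sits where untrusted input enters.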
This results in somewhat faster hpack_encoder performance:
BM_HpackEncoderInitDestroy
    222ns ± 0%  221ns ± 0%  -0.29%  (p=0.000 n=34+34)
BM_HpackEncoderEncodeDeadline [framing_bytes/iter:9 header_bytes/iter:6 ]
    135ns ± 1%  124ns ± 0%  -8.05%  (p=0.000 n=39+38)
BM_HpackEncoderEncodeHeader<EmptyBatch>/0/16384 [framing_bytes/iter:9 header_bytes/iter:0 ]
    34.2ns ± 0%  34.2ns ± 0%  -0.01%  (p=0.014 n=34+38)
BM_HpackEncoderEncodeHeader<EmptyBatch>/1/16384 [framing_bytes/iter:9 header_bytes/iter:0 ]
    34.2ns ± 0%  34.2ns ± 0%  -0.04%  (p=0.004 n=34+37)
BM_HpackEncoderEncodeHeader<SingleStaticElem>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.5ns ± 0%  45.9ns ± 0%  -3.28%  (p=0.000 n=28+38)
BM_HpackEncoderEncodeHeader<SingleInternedKeyElem>/0/16384 [framing_bytes/iter:9 header_bytes/iter:6 ]
    77.0ns ± 1%  68.3ns ± 1%  -11.33%  (p=0.000 n=39+40)
BM_HpackEncoderEncodeHeader<SingleInternedElem>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.7ns ± 1%  45.5ns ± 0%  -4.63%  (p=0.000 n=39+33)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<1, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.2ns ± 0%  45.3ns ± 0%  -3.96%  (p=0.000 n=33+34)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<3, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.7ns ± 0%  45.6ns ± 0%  -4.54%  (p=0.000 n=38+40)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<10, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.7ns ± 0%  45.5ns ± 0%  -4.63%  (p=0.000 n=39+32)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<31, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.8ns ± 0%  45.6ns ± 1%  -4.59%  (p=0.000 n=38+39)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<100, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.8ns ± 0%  45.5ns ± 0%  -4.64%  (p=0.000 n=39+36)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<1, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.3ns ± 0%  45.3ns ± 0%  -4.09%  (p=0.000 n=38+36)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<3, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.8ns ± 1%  45.6ns ± 0%  -4.71%  (p=0.000 n=37+40)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<10, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.7ns ± 0%  45.5ns ± 0%  -4.66%  (p=0.000 n=39+32)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<31, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.8ns ± 1%  45.6ns ± 1%  -4.62%  (p=0.000 n=37+39)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<100, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.7ns ± 0%  45.5ns ± 0%  -4.67%  (p=0.000 n=38+32)
BM_HpackEncoderEncodeHeader<SingleNonInternedElem>/0/16384 [framing_bytes/iter:9 header_bytes/iter:9 ]
    80.5ns ± 1%  74.7ns ± 0%  -7.16%  (p=0.000 n=38+35)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<1, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:12 ]
    105ns ± 1%  99ns ± 0%  -5.91%  (p=0.000 n=38+34)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<3, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:14 ]
    111ns ± 1%  106ns ± 1%  -4.86%  (p=0.020 n=39+2)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<10, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:23 ]
    135ns ± 0%  130ns ± 0%  -3.45%  (p=0.020 n=35+2)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<31, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:46 ]
    225ns ± 1%  223ns ± 0%  -0.91%  (p=0.003 n=37+2)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<100, false>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:120 ]
    467ns ± 0%  472ns ± 0%  +1.09%  (p=0.003 n=38+2)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<1, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:12 ]
    81.6ns ± 1%  74.8ns ± 0%  -8.40%  (p=0.000 n=37+33)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<3, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:14 ]
    82.0ns ± 1%  74.8ns ± 0%  -8.80%  (p=0.000 n=37+32)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<10, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:21 ]
    82.1ns ± 1%  74.9ns ± 0%  -8.86%  (p=0.000 n=35+34)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<31, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:42 ]
    97.6ns ± 2%  91.8ns ± 0%  -5.95%  (p=0.000 n=35+27)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<100, true>>/0/16384 [framing_bytes/iter:9 header_bytes/iter:111 ]
    97.2ns ± 1%  91.2ns ± 2%  -6.19%  (p=0.000 n=37+38)
BM_HpackEncoderEncodeHeader<SingleNonInternedElem>/0/1 [framing_bytes/iter:54 header_bytes/iter:9 ]
    230ns ± 0%  221ns ± 0%  -3.91%  (p=0.000 n=38+37)
BM_HpackEncoderEncodeHeader<MoreRepresentativeClientInitialMetadata>/0/16384 [framing_bytes/iter:9 header_bytes/iter:16 ]
    206ns ± 2%  170ns ± 1%  -17.51%  (p=0.000 n=39+39)
BM_HpackEncoderEncodeHeader<RepresentativeServerInitialMetadata>/0/16384 [framing_bytes/iter:9 header_bytes/iter:3 ]
    66.4ns ± 2%  62.5ns ± 1%  -5.85%  (p=0.000 n=34+39)
BM_HpackEncoderEncodeHeader<RepresentativeServerTrailingMetadata>/1/16384 [framing_bytes/iter:9 header_bytes/iter:1 ]
    47.5ns ± 0%  45.9ns ± 1%  -3.29%  (p=0.000 n=26+38)
5 years ago
Hope Casey-Allen
32801fb5eb
Remove build target for microbenchmark
5 years ago
Vijay Pai
1077b3435c
Use range-based for on state rather than state.KeepRunning when possible
5 years ago
Arjun Roy
b46e3668d3
s/branch/tail_call/ for CH2 on_hdr().
on_hdr() checks whether a void-returning function pointer is null before
calling it. If it is null, on_hdr() returns an error; otherwise it calls the
function and returns success.
This change converts the void-returning function into one that returns a
grpc_error*, which saves a branch in on_hdr(): we already branch once by
following the function pointer, so the two branches are effectively coalesced
into one.
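A minimal sketch of the before/after shape described above. All names here (`Error`, `Parser`, `MissingHandler`, `OnHdrBefore`, `OnHdrAfter`) are hypothetical stand-ins rather than the actual CH2 code, and the use of a default error-returning handler in place of a null pointer is an assumption about how the null check can be dropped.

```cpp
// Hypothetical stand-in for grpc_error; nullptr means success.
struct Error { const char* msg; };
struct Parser;

Error* MissingHandler(Parser* p);
Error* CountHeader(Parser* p);

struct Parser {
  void (*on_header_void)(Parser*) = nullptr;      // old style: may be null
  Error* (*on_header)(Parser*) = MissingHandler;  // new style: never null
  int headers_seen = 0;
};

Error kNoHandler{"on_header callback not set"};

// Default handler: reports the "not set" error through the return value.
Error* MissingHandler(Parser*) { return &kNoHandler; }

// Example real handler.
Error* CountHeader(Parser* p) { ++p->headers_seen; return nullptr; }

// Before: explicit null test, then the indirect call.
Error* OnHdrBefore(Parser* p) {
  if (p->on_header_void == nullptr) return &kNoHandler;
  p->on_header_void(p);
  return nullptr;
}

// After: a single indirect call in tail position; the handler's return
// value carries the error, so no separate null branch is needed.
Error* OnHdrAfter(Parser* p) { return p->on_header(p); }
```

Because the call is the last thing `OnHdrAfter` does, a compiler is free to emit it as a jump rather than a call and return.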
5 years ago
Hope Casey-Allen
6dfe27ab08
Fix race in bm_chttp2_transport
5 years ago
Hope Casey-Allen
59564ebd96
Fix warnings to unblock gcc8 support
5 years ago
Arjun Roy
557446a11e
Added specializations for grpc_mdelem_create.
In several cases, we create grpc mdelem structures using known-static
metadata inputs. In others, we create a slice on the heap (e.g. via
grpc_slice_from_copied_buffer) where we know we are transferring refcount
ownership. In these cases we can:
1) Avoid unnecessary ref/unref operations that are no-ops (for static
slices) or superfluous (if we're transferring ownership).
2) Avoid unnecessarily comprehensive calls to grpc_slice_eq (since
they'd only be called with static or interned slice arguments,
which by construction would have equal refcounts if they were
in fact equal).
3) Avoid unnecessary checks to see whether a slice is interned (when we
know that it is).
To avoid polluting the internal API, we introduce the notion of
strongly-typed grpc_slice objects. We draw a distinction between
Internal (interned and static-storage) slices and Extern (inline and
non-statically allocated). We introduce overloads to
grpc_mdelem_create() and grpc_mdelem_from_slices() for the fastpath
cases identified above based on these slice types.
From the programmer's point of view, though, nothing changes: they
need only use grpc_mdelem_create() and grpc_mdelem_from_slices() as
before, and the appropriate fastpath is picked by overload resolution on
the argument types. If no special knowledge exists for the slice type
(i.e. we pass in generic grpc_slice objects), the slowpath method is used
and still behaves correctly.
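The overload-based dispatch can be illustrated with a small, self-contained sketch. The types and function below (`Slice`, `InternalSlice`, `ExternSlice`, `CreateElem`, `Path`) are hypothetical miniatures, not gRPC's actual declarations; they only show how the static type of the arguments selects a path at compile time.

```cpp
#include <string>
#include <utility>

// Generic slice: nothing is known statically, so properties live in fields.
struct Slice {
  Slice(std::string b, bool i) : bytes(std::move(b)), interned(i) {}
  std::string bytes;
  bool interned;
};

// Strongly-typed wrappers: the type itself records what the caller knows.
struct InternalSlice : Slice {
  explicit InternalSlice(std::string b) : Slice(std::move(b), true) {}
};
struct ExternSlice : Slice {
  explicit ExternSlice(std::string b) : Slice(std::move(b), false) {}
};

enum class Path { kGeneric, kInternalInternal, kInternalExtern };

// Slow path: would have to inspect both slices at run time.
Path CreateElem(const Slice&, const Slice&) { return Path::kGeneric; }

// Fast paths: chosen by the compiler when the argument types match exactly,
// so run-time "is it interned?" checks can be skipped.
Path CreateElem(const InternalSlice&, const InternalSlice&) {
  return Path::kInternalInternal;
}
Path CreateElem(const InternalSlice&, const ExternSlice&) {
  return Path::kInternalExtern;
}
```

Callers write `CreateElem(key, value)` in every case; an exact match on the wrapper types outranks the derived-to-base conversion needed for the generic overload, which is why the fast path wins without any change at the call site.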
This is good for:
- Roughly 1-3% reduction in CPU time for several unary/streaming
ping pong fullstack microbenchmarks.
- Reduction of about 15-20% in CPU time for some hpack parser
microbenchmarks.
- 10-12% reduction of CPU time for metadata microbenchmarks involving
interned slice comparisons.
5 years ago
Yunjia Wang
8318e578db
SpikyLoad: construct outside
5 years ago
Yunjia Wang
c3c24d089d
Use Template
5 years ago
Yunjia Wang
d87b5285ca
Fix comment
5 years ago
Yunjia Wang
efd6946d21
Reformat
5 years ago
Yunjia Wang
9242fe122d
AddSelf more scenarios
5 years ago
Mark D. Roth
46f706c99b
Revert "Merge pull request #19686 from gnossen/revert_breakage"
This reverts commit 1f2398b0d5, reversing changes made to 99169d811c.
5 years ago
Yunjia Wang
8278d3e6a5
Resolving comments
5 years ago
Yunjia Wang
847faf407f
Removes unused variable error
5 years ago
Yunjia Wang
85314b3fcc
Re-format
5 years ago
Yunjia Wang
c6bc2b1875
Add threadpool benchmark and build files
5 years ago
Richard Belleville
63b4f3d819
Revert "Merge pull request #19673 from markdroth/channel_grpc_init"
This reverts commit 4e21980716, reversing changes made to 62b8a783fa.
5 years ago
Mark D. Roth
8cc5b8f680
Defer grpc shutdown until after channel destruction.
5 years ago
Arjun Roy
b1d73a01f1
Removed duplicate static table from hpack table. Removed an OR instruction for
every usage of static grpc metadata. Inlined hpack table lookups for static
metadata.
This leads to faster hpack parser creation:
BM_HpackParserInitDestroy 5.32µs ± 1% 0.06µs ± 1% -98.91% (p=0.000 n=18+19)
And slightly faster parsing:
BM_HpackParserParseHeader<RepresentativeClientInitialMetadata, OnInitialHeader>
    456ns ± 1%  435ns ± 1%  -4.74%  (p=0.000 n=18+19)
BM_HpackParserParseHeader<MoreRepresentativeClientInitialMetadata, OnInitialHeader>
    1.06µs ± 2%  1.04µs ± 2%  -1.82%  (p=0.000 n=19+20)
It also yields a slight (0.5 - 1.0 microsecond) reduction in CPU time for
fullstack unary pingpong:
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/0/512 [polls/iter:3.0001 ]
    23.9µs ± 2%  23.0µs ± 1%  -3.63%  (p=0.002 n=6+6)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/0/32768 [polls/iter:3.00015 ]
    35.1µs ± 1%  34.2µs ± 1%  -2.57%  (p=0.036 n=5+3)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/8/0 [polls/iter:3.00011 ]
    21.7µs ± 3%  21.2µs ± 2%  -2.44%  (p=0.017 n=6+5)
5 years ago
Yunjia Wang
fdc250d618
Remove benchmark
5 years ago
yunjiaw26
b0b81792ee
Delete bm_threadpool.cc
5 years ago
Yunjia Wang
a63cbfb61e
Fix headers order
5 years ago
Yunjia Wang
500cb1f99b
Reformat
5 years ago
Yunjia Wang
093dd768bb
reformat
5 years ago
Yunjia Wang
9421a27a76
Remove extra headers
5 years ago
Yunjia Wang
cac8afa159
Add benchmark
5 years ago
Yash Tibrewal
56a0153f16
Heap allocate the stream object for other benchmark cases too
6 years ago
Yash Tibrewal
cceca10a8a
Fix data race, heap use-after-free issue in bm_chttp2_transport
6 years ago
Karthik Ravi Shankar
196b0aa3a3
Revert "Revert "Start supporting a callback-based RPC under lock""
6 years ago
Karthik Ravi Shankar
b790c24e5c
Revert "Start supporting a callback-based RPC under lock"
6 years ago
Mark D. Roth
477ebef532
Remove CreateChannel() method from LB helper API.
6 years ago
Karthik Ravi Shankar
b18faa6c95
Fix tsan error
6 years ago
Karthik Ravi Shankar
d2c8eb94c9
Fix microbenchmark failures
6 years ago
Karthik Ravi Shankar
e1f62278e3
Fix clang error
6 years ago
Karthik Ravi Shankar
4f7f561564
Add synchronization to bm test
Since the callback now runs on another thread, add synchronization to the bm
tests as well.
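As an illustration of the kind of synchronization this implies, here is a minimal sketch using only the C++ standard library. `Completion` and `RunCallbackAndWait` are hypothetical names; the actual benchmark code may use different primitives.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// One-shot completion flag guarded by a mutex and condition variable.
class Completion {
 public:
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_one();
  }
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return done_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Runs a "callback" on another thread and blocks until it has finished.
// Without the Wait(), reading `result` would race with the worker thread.
int RunCallbackAndWait() {
  int result = 0;
  Completion done;
  std::thread worker([&] {
    result = 42;  // the callback's work
    done.Signal();
  });
  done.Wait();
  worker.join();
  return result;
}
```

A benchmark loop would call something like `Wait()` once per iteration, so that the next iteration does not start (and the state is not torn down) while a callback is still in flight.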
6 years ago
Karthik Ravi Shankar
40210d3b8a
Move Channel to grpc_impl
6 years ago
Karthik Ravi Shankar
772a74aced
Revert changes to Channel
6 years ago
Na-Na Pang
a02c76dfb9
Cancel predefined number of streaming
6 years ago
Na-Na Pang
87d75d2a88
Add explicit and fix error
6 years ago
Na-Na Pang
a2daa4ff08
Clean format
6 years ago
Na-Na Pang
1ea651aee3
Add assertion
6 years ago
Na-Na Pang
762e58b574
Change client context allocation
6 years ago
Na-Na Pang
070902b871
Merge bm_callback_cq to bm_cq
6 years ago
Na-Na Pang
3fc702510f
Reuse reactor to send new RPC
6 years ago
Na-Na Pang
2d5a9750a0
Manually add echo.proto to pass Portability build test
6 years ago
Na-Na Pang
1ba5f5c701
Modify build file
6 years ago
Na-Na Pang
32e10e618a
address the reference arguments
6 years ago
Na-Na Pang
714e13b426
Delete log
6 years ago
Na-Na Pang
c905f76a5b
Clang format
6 years ago