Arjun Roy
c63f419c49
Mark CH2 on_initial_header error path unlikely.
Yields slightly better unary and streaming performance for TCP:
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/4096/4096
[polls/iter:3.00006 ] 27.1µs ± 2%
26.3µs ± 1% -2.77% (p=0.036 n=5+3)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/1/0
[polls/iter:3.00009 ] 21.7µs ± 2%
21.1µs ± 1% -2.88% (p=0.029 n=4+4)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/0/1
[polls/iter:3.00009 ] 21.8µs ± 2%
20.9µs ± 1% -4.32% (p=0.003 n=7+5)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/1/1
[polls/iter:3.00008 ] 22.0µs ± 1%
21.3µs ± 1% -3.15% (p=0.036 n=3+5)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/64/0
[polls/iter:3.00006 ] 22.0µs ± 1%
21.5µs ± 1% -2.19% (p=0.032 n=4+5)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/32768/0
[polls/iter:3.00007 ] 34.7µs ± 1%
34.1µs ± 0% -1.72% (p=0.017 n=7+3)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/0/262144
[polls/iter:3.00023 ] 160µs ± 1%
158µs ± 1% -1.29% (p=0.016 n=8+4)
BM_UnaryPingPong<UDS, NoOpMutator, NoOpMutator>/0/0
[polls/iter:3.00012 ] 20.8µs ± 1%
20.4µs ± 0% -1.89% (p=0.029 n=4+4)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/0/0
[polls/iter:3.00008 ] 22.1µs ± 4%
21.3µs ± 0% -3.88% (p=0.004 n=6+5)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/64/0
[polls/iter:3.00008 ] 23.2µs ± 2%
22.5µs ± 3% -3.07% (p=0.014 n=7+6)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/512/512
[polls/iter:3.0001 ] 23.5µs ± 2%
22.9µs ± 0% -2.85% (p=0.010 n=6+4)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/1/0
[polls/iter:3.00008 ] 22.5µs ± 1%
21.7µs ± 1% -3.35% (p=0.036 n=3+5)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/32768/32768
[polls/iter:3.0001 ] 48.6µs ± 1%
48.3µs ± 1% -0.58% (p=0.045 n=5+8)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/8/8
[polls/iter:3.00008 ] 22.0µs ± 1%
21.5µs ± 1% -2.35% (p=0.016 n=4+5)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/8/8
[polls/iter:3.00006 ] 22.4µs ± 3%
21.4µs ± 1% -4.05% (p=0.017 n=7+3)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/4096/0
[polls/iter:3.00007 ] 24.5µs ± 1%
23.9µs ± 1% -2.30%
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/1/1
[polls/iter:3.0001 ] 22.9µs ± 2%
22.4µs ± 3% -2.04% (p=0.048 n=7+5)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/8/8
[polls/iter:3.0001 ] 23.0µs ± 2%
22.4µs ± 1% -2.75% (p=0.012 n=7+4)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/64/64
[polls/iter:3.00008 ] 23.5µs ± 2%
23.1µs ± 0% -2.10% (p=0.002 n=8+5)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/64/0
[polls/iter:3.00008 ] 22.1µs ± 2%
21.5µs ± 1% -2.93% (p=0.009 n=9+3)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/0/64
[polls/iter:3.00008 ] 22.2µs ± 1%
21.4µs ± 1% -3.51% (p=0.003 n=4+9)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/512/0
[polls/iter:3.00008 ] 22.4µs ± 2%
21.8µs ± 1% -2.75% (p=0.009 n=5+6)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/32768/0
[polls/iter:3.0001 ] 34.5µs ± 1%
34.0µs ± 1% -1.58%
However, in-process (in-proc) performance is slightly worse, by about 2-3%.
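As a rough illustration of what marking an error path unlikely looks like, here is a minimal sketch. It is not the actual CH2 code: `EXAMPLE_UNLIKELY`, `Header`, and `OnInitialHeader` are hypothetical names invented for this example, and the hint relies on the `__builtin_expect` builtin available in GCC and Clang.

```cpp
#include <cstdint>

// Hypothetical stand-in for a branch-hint macro. GCC and Clang provide
// __builtin_expect; other compilers fall back to the plain condition.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define EXAMPLE_UNLIKELY(x) (x)
#endif

// Hypothetical header record, not the real CH2 type.
struct Header {
  const char* key;
  std::uint32_t value_len;
};

// Returns true on success. The malformed-header branch is hinted as rare,
// which lets the compiler lay out the success path as the fall-through.
bool OnInitialHeader(const Header& h, std::uint32_t max_len,
                     std::uint32_t* accepted) {
  if (EXAMPLE_UNLIKELY(h.key == nullptr || h.value_len > max_len)) {
    return false;  // error path: expected to be cold
  }
  ++*accepted;
  return true;
}
```

The hint changes only code layout and static branch prediction, not behavior, which may be why the effect differs between transports as the numbers above suggest.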
5 years ago
Arjun Roy
0b06676c9e
hpack encoder optimizations.
Removed some cycles and branches from hpack_enc for CH2.
Specifically:
1. Pushed certain metadata key/value length checks to
prepare_application_metadata() in src/core/lib/surface/call.cc.
This means that rather than checking all key/value lengths for all metadata, we
only do so for custom user-added metadata. Inside CH2, the length checks become
debug checks so that we can still catch core/filter metadata that fails them.
2. Changed various asserts to debug asserts where possible.
3. Refactored some of the header emission code to remove duplication.
4. Un-inlined some logging methods.
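The idea in item 1 can be sketched as follows. This is an illustrative sketch only, not the code in call.cc or the encoder: `Metadata`, `PrepareUserMetadata`, `EncodedSize`, and the `kMaxLen` limit are all assumptions made for the example. The boundary function validates user-supplied metadata once, while the hot path keeps only an `assert`, which compiles out when `NDEBUG` is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical limit chosen for illustration; not the real gRPC limit.
constexpr std::size_t kMaxLen = 1u << 16;

// Hypothetical metadata element.
struct Metadata {
  std::string key;
  std::string value;
};

// Boundary check: runs once, and only for metadata supplied by the user.
bool PrepareUserMetadata(const std::vector<Metadata>& md) {
  for (const auto& m : md) {
    if (m.key.size() > kMaxLen || m.value.size() > kMaxLen) return false;
  }
  return true;
}

// Hot path: trusts its callers. The length check survives only in debug
// builds, where it still catches internally generated metadata that is
// too long.
std::size_t EncodedSize(const Metadata& m) {
  assert(m.key.size() <= kMaxLen && m.value.size() <= kMaxLen);
  return m.key.size() + m.value.size();
}
```

In a release build the encoder then pays no branch for the check, at the cost of relying on every entry point to validate its own input.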
This results in somewhat faster hpack_encoder performance:
BM_HpackEncoderInitDestroy
222ns ± 0% 221ns ± 0% -0.29% (p=0.000 n=34+34)
BM_HpackEncoderEncodeDeadline
[framing_bytes/iter:9 header_bytes/iter:6 ] 135ns ± 1%
124ns ± 0% -8.05% (p=0.000 n=39+38)
BM_HpackEncoderEncodeHeader<EmptyBatch>/0/16384
[framing_bytes/iter:9 header_bytes/iter:0 ] 34.2ns ± 0%
34.2ns ± 0% -0.01% (p=0.014 n=34+38)
BM_HpackEncoderEncodeHeader<EmptyBatch>/1/16384
[framing_bytes/iter:9 header_bytes/iter:0 ] 34.2ns ± 0%
34.2ns ± 0% -0.04% (p=0.004 n=34+37)
BM_HpackEncoderEncodeHeader<SingleStaticElem>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.5ns ± 0%
45.9ns ± 0% -3.28% (p=0.000 n=28+38)
BM_HpackEncoderEncodeHeader<SingleInternedKeyElem>/0/16384
[framing_bytes/iter:9 header_bytes/iter:6 ] 77.0ns ± 1%
68.3ns ± 1% -11.33% (p=0.000 n=39+40)
BM_HpackEncoderEncodeHeader<SingleInternedElem>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.7ns ± 1%
45.5ns ± 0% -4.63% (p=0.000 n=39+33)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<1, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.2ns ± 0%
45.3ns ± 0% -3.96% (p=0.000 n=33+34)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<3, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.7ns ± 0%
45.6ns ± 0% -4.54% (p=0.000 n=38+40)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<10, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.7ns ± 0%
45.5ns ± 0% -4.63% (p=0.000 n=39+32)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<31, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.8ns ± 0%
45.6ns ± 1% -4.59% (p=0.000 n=38+39)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<100, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.8ns ± 0%
45.5ns ± 0% -4.64% (p=0.000 n=39+36)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<1, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.3ns ± 0%
45.3ns ± 0% -4.09% (p=0.000 n=38+36)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<3, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.8ns ± 1%
45.6ns ± 0% -4.71% (p=0.000 n=37+40)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<10, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.7ns ± 0%
45.5ns ± 0% -4.66% (p=0.000 n=39+32)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<31, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.8ns ± 1%
45.6ns ± 1% -4.62% (p=0.000 n=37+39)
BM_HpackEncoderEncodeHeader<SingleInternedBinaryElem<100, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.7ns ± 0%
45.5ns ± 0% -4.67% (p=0.000 n=38+32)
BM_HpackEncoderEncodeHeader<SingleNonInternedElem>/0/16384
[framing_bytes/iter:9 header_bytes/iter:9 ] 80.5ns ± 1%
74.7ns ± 0% -7.16% (p=0.000 n=38+35)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<1, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:12 ] 105ns ± 1%
99ns ± 0% -5.91% (p=0.000 n=38+34)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<3, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:14 ] 111ns ± 1%
106ns ± 1% -4.86% (p=0.020 n=39+2)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<10, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:23 ] 135ns ± 0%
130ns ± 0% -3.45% (p=0.020 n=35+2)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<31, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:46 ] 225ns ± 1%
223ns ± 0% -0.91% (p=0.003 n=37+2)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<100, false>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:120 ] 467ns ± 0%
472ns ± 0% +1.09% (p=0.003 n=38+2)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<1, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:12 ] 81.6ns ± 1%
74.8ns ± 0% -8.40% (p=0.000 n=37+33)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<3, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:14 ] 82.0ns ± 1%
74.8ns ± 0% -8.80% (p=0.000 n=37+32)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<10, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:21 ] 82.1ns ± 1%
74.9ns ± 0% -8.86% (p=0.000 n=35+34)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<31, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:42 ] 97.6ns ± 2%
91.8ns ± 0% -5.95% (p=0.000 n=35+27)
BM_HpackEncoderEncodeHeader<SingleNonInternedBinaryElem<100, true>>/0/16384
[framing_bytes/iter:9 header_bytes/iter:111 ] 97.2ns ± 1%
91.2ns ± 2% -6.19% (p=0.000 n=37+38)
BM_HpackEncoderEncodeHeader<SingleNonInternedElem>/0/1
[framing_bytes/iter:54 header_bytes/iter:9 ] 230ns ± 0%
221ns ± 0% -3.91% (p=0.000 n=38+37)
BM_HpackEncoderEncodeHeader<MoreRepresentativeClientInitialMetadata>/0/16384
[framing_bytes/iter:9 header_bytes/iter:16 ] 206ns ± 2%
170ns ± 1% -17.51% (p=0.000 n=39+39)
BM_HpackEncoderEncodeHeader<RepresentativeServerInitialMetadata>/0/16384
[framing_bytes/iter:9 header_bytes/iter:3 ] 66.4ns ± 2%
62.5ns ± 1% -5.85% (p=0.000 n=34+39)
BM_HpackEncoderEncodeHeader<RepresentativeServerTrailingMetadata>/1/16384
[framing_bytes/iter:9 header_bytes/iter:1 ] 47.5ns ± 0%
45.9ns ± 1% -3.29% (p=0.000 n=26+38)
5 years ago
Arjun Roy
ee603bf172
Better codegen for validate_filtered_metadata.
validate_filtered_metadata() performs several checks to see if a call must be
failed. Failure is the unlikely case; to that end, failing branches are marked
unlikely, and the specific code handling failure cases is refactored into
explicitly un-inlined helper methods.
This will prevent us from unnecessarily clobbering registers and give us a
straight-line codepath for the success case.
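A minimal sketch of this pattern, under stated assumptions: the `Call` struct, the helper names, and the `EXAMPLE_*` macros below are hypothetical and are not taken from the real validate_filtered_metadata(). The failure handling lives in helpers that are explicitly kept out of line, so the caller's success path contains only the comparisons.

```cpp
#include <string>

// Hypothetical macros; GCC and Clang support both the builtin and the
// attribute, other compilers get a no-op fallback in this sketch.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_UNLIKELY(x) __builtin_expect(!!(x), 0)
#define EXAMPLE_NOINLINE __attribute__((noinline))
#else
#define EXAMPLE_UNLIKELY(x) (x)
#define EXAMPLE_NOINLINE
#endif

// Hypothetical call state.
struct Call {
  bool has_status;
  int status;
  std::string error;
};

// Cold helpers: kept out of line so that string construction adds neither
// code size nor register pressure to the caller's fast path.
EXAMPLE_NOINLINE void FailMissingStatus(Call* c) {
  c->error = "missing status";
}

EXAMPLE_NOINLINE void FailBadStatus(Call* c) {
  c->error = "unexpected status " + std::to_string(c->status);
}

// Success is the straight-line path; each failure is a hinted branch that
// jumps to an out-of-line helper.
bool ValidateFiltered(Call* c) {
  if (EXAMPLE_UNLIKELY(!c->has_status)) {
    FailMissingStatus(c);
    return false;
  }
  if (EXAMPLE_UNLIKELY(c->status != 0)) {
    FailBadStatus(c);
    return false;
  }
  return true;
}
```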
Results:
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/8/0
[polls/iter:3.00008 ] 22.5µs ± 0%
21.6µs ± 0% -4.02% (p=0.036 n=5+3)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/64/64
[polls/iter:3.00008 ] 23.4µs ± 1%
23.0µs ± 1% -1.63% (p=0.010 n=6+4)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/32768/0
[polls/iter:3.00007 ] 34.4µs ± 1%
34.1µs ± 0% -0.99% (p=0.024 n=6+3)
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/0/0
[polls/iter:0 ] 6.36µs ± 5%
6.16µs ± 2% -3.26% (p=0.013 n=20+19)
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/1/1
[polls/iter:0 ] 6.62µs ± 6%
6.50µs ± 4% -1.72% (p=0.049 n=20+20)
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/512/0
[polls/iter:0 ] 6.67µs ± 6%
6.59µs ± 2% -1.29% (p=0.047 n=20+19)
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/4096/0
[polls/iter:0 ] 7.68µs ± 1%
7.65µs ± 2% -0.46% (p=0.031 n=18+18)
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/0/262144
[polls/iter:0 ] 86.0µs ± 2%
85.3µs ± 2% -0.77% (p=0.046 n=19+19)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/0/0
[polls/iter:0 ] 6.28µs ± 5%
6.00µs ± 2% -4.37% (p=0.000 n=20+20)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/1/0
[polls/iter:0 ] 6.39µs ± 6%
6.20µs ± 2% -3.03% (p=0.001 n=20+19)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/0/1
[polls/iter:0 ] 6.36µs ± 6%
6.17µs ± 1% -3.00% (p=0.006 n=20+17)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/1/1
[polls/iter:0 ] 6.59µs ± 5%
6.30µs ± 2% -4.37% (p=0.000 n=20+19)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/8/0
[polls/iter:0 ] 6.37µs ± 5%
6.20µs ± 2% -2.76% (p=0.001 n=20+20)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/0/8
[polls/iter:0 ] 6.36µs ± 5%
6.17µs ± 2% -2.95% (p=0.001 n=20+19)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/8/8
[polls/iter:0 ] 6.45µs ± 7%
6.27µs ± 1% -2.72% (p=0.002 n=20+18)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/64/0
[polls/iter:0 ] 6.46µs ± 6%
6.31µs ± 1% -2.39% (p=0.001 n=20+20)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/512/0
[polls/iter:0 ] 6.62µs ± 6%
6.43µs ± 2% -2.92% (p=0.000 n=20+18)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/0/512
[polls/iter:0 ] 6.58µs ± 7%
6.41µs ± 1% -2.57% (p=0.002 n=20+17)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/512/512
[polls/iter:0 ] 6.88µs ± 7%
6.76µs ± 2% -1.81% (p=0.047 n=20+19)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/4096/0
[polls/iter:0 ] 7.57µs ± 3%
7.49µs ± 2% -0.99% (p=0.007 n=20+20)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/0/4096
[polls/iter:0 ] 7.66µs ± 5%
7.50µs ± 2% -2.15% (p=0.003 n=20+20)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/32768/0
[polls/iter:0 ] 15.8µs ± 2%
15.7µs ± 1% -0.75% (p=0.001 n=20+19)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/0/32768
[polls/iter:0 ] 16.1µs ± 2%
16.0µs ± 2% -0.84% (p=0.002 n=20+20)
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/32768/32768
[polls/iter:0 ] 25.5µs ± 1%
25.4µs ± 1% -0.42% (p=0.011 n=20+19)
BM_UnaryPingPong<InProcess, Client_AddMetadata<RandomBinaryMetadata<100>, 2>, NoOpMutator>/0/0
[polls/iter:0 ] 7.99µs ± 5%
7.85µs ± 2% -1.81% (p=0.028 n=20+20)
BM_UnaryPingPong<InProcess, NoOpMutator, Server_AddInitialMetadata<RandomBinaryMetadata<100>, 1>>/0/0
[polls/iter:0 ] 7.07µs ± 6%
7.14µs ± 5% +0.95% (p=0.007 n=19+18)
BM_UnaryPingPong<InProcess, Client_AddMetadata<RandomAsciiMetadata<31>, 1>, NoOpMutator>/0/0
[polls/iter:0 ] 6.95µs ± 5%
7.02µs ± 3% +0.94% (p=0.017 n=18+19)
BM_UnaryPingPong<InProcess, Client_AddMetadata<RandomAsciiMetadata<100>, 1>, NoOpMutator>/0/0
[polls/iter:0 ] 7.10µs ± 2%
7.19µs ± 2% +1.31% (p=0.000 n=16+20)
BM_UnaryPingPong<InProcess, NoOpMutator, Server_AddInitialMetadata<RandomAsciiMetadata<31>, 1>>/0/0
[polls/iter:0 ] 6.89µs ± 2%
7.00µs ± 3% +1.61% (p=0.000 n=17+19)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/512/512
[polls/iter:3.00007 ] 24.1µs ± 1%
23.7µs ± 1% -1.77% (p=0.024 n=6+3)
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/1/0
[polls/iter:3.00009 ] 21.5µs ± 1%
20.9µs ± 0% -2.78% (p=0.024 n=6+3)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/4096/0
[polls/iter:3.00005 ] 24.4µs ± 2%
23.9µs ± 2% -2.16% (p=0.020 n=9+4)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/32768/0
[polls/iter:3.0001 ] 35.3µs ± 1%
34.8µs ± 1% -1.45% (p=0.008 n=5+5)
BM_UnaryPingPong<MinSockPair, NoOpMutator, NoOpMutator>/0/0
[polls/iter:3.00008 ] 19.5µs ± 1%
19.1µs ± 1% -2.30% (p=0.016 n=4+5)
BM_UnaryPingPong<TCP, NoOpMutator, NoOpMutator>/0/32768
[polls/iter:3.0001 ] 35.4µs ± 1%
34.7µs ± 1% -1.77% (p=0.016 n=4+5)
5 years ago