Joshua Haberman
|
3eba47914b
|
Allocate hasbits and table slots in "hotness" order.
Without a profile, we assume that fields with smaller numbers
are hotter.
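The ordering idea can be sketched as follows. This is a minimal illustration, not upb's actual code generator: the function name `assign_hasbits` and the optional `hotness` profile mapping are hypothetical.

```python
def assign_hasbits(field_numbers, hotness=None):
    """Return {field_number: hasbit_index}, hottest field first.

    `hotness` is a hypothetical optional profile mapping a field number
    to a score (higher means hotter). Without it, smaller field numbers
    are assumed to be hotter, as the commit message describes.
    """
    if hotness is None:
        ranked = sorted(field_numbers)
    else:
        # Hotter fields first; ties are broken by smaller field number.
        ranked = sorted(field_numbers, key=lambda n: (-hotness.get(n, 0), n))
    return {number: index for index, number in enumerate(ranked)}
```

The same ranking could drive table slot allocation, so the hottest fields get the cheapest positions.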
|
4 years ago |
Joshua Haberman
|
021db6fcd5
|
Allow larger tags into the table if they are unique mod 31.
Also fixed a bug with fixed packed in decode_fast.c.
|
4 years ago |
Joshua Haberman
|
86d9908c55
|
Fastdecode support for packed fields.
This is not very optimized yet. There is a lot of room to
optimize it further.
|
4 years ago |
Joshua Haberman
|
e3e797b680
|
Added fasttable support for oneofs.
|
4 years ago |
Joshua Haberman
|
7ffa9c181a
|
Fixed some small bugs and performance problems in string copying.
Before this CL, with alias=false:
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock    3715 ns         3715 ns       188916 1.88206GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,122.92 msec task-clock # 0.979 CPUs utilized
3 context-switches # 0.003 K/sec
0 cpu-migrations # 0.000 K/sec
196 page-faults # 0.175 K/sec
4,144,746,717 cycles # 3.691 GHz
15,351,966,804 instructions # 3.70 insn per cycle
2,590,281,905 branches # 2306.728 M/sec
2,996,157 branch-misses # 0.12% of all branches
1.146615328 seconds time elapsed
1.115578000 seconds user
0.008025000 seconds sys
After this CL:
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock    3554 ns         3554 ns       197527 1.9674GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,105.34 msec task-clock # 0.982 CPUs utilized
3 context-switches # 0.003 K/sec
0 cpu-migrations # 0.000 K/sec
197 page-faults # 0.178 K/sec
4,077,736,892 cycles # 3.689 GHz
15,442,709,352 instructions # 3.79 insn per cycle
2,435,131,301 branches # 2203.068 M/sec
2,643,775 branch-misses # 0.11% of all branches
1.125393845 seconds time elapsed
1.097770000 seconds user
0.008012000 seconds sys
|
4 years ago |
Joshua Haberman
|
e2c709e047
|
Repeated string and primitive support.
Much of the code was adapted from Gerben's code in:
6333031195
|
4 years ago |
Joshua Haberman
|
e9103eda9e
|
Merge branch 'master' into fastest-table
|
4 years ago |
Joshua Haberman
|
0756999ab6
|
Merge pull request #325 from haberman/inlined-arena
Fixed upb::InlinedArena, which was completely broken.
|
4 years ago |
Joshua Haberman
|
25db40bc30
|
Fixed upb::InlinedArena, which was completely broken.
|
4 years ago |
Joshua Haberman
|
d81ba58215
|
Optimized short string copying.
This sped up the alias=false case:
Before:
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock    4562 ns         4562 ns       153251 1.53276GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,216.65 msec task-clock # 0.936 CPUs utilized
6 context-switches # 0.005 K/sec
0 cpu-migrations # 0.000 K/sec
200 page-faults # 0.164 K/sec
4,490,925,650 cycles # 3.691 GHz
16,516,403,731 instructions # 3.68 insn per cycle
2,828,536,650 branches # 2324.861 M/sec
5,425,830 branch-misses # 0.19% of all branches
1.300178903 seconds time elapsed
1.211475000 seconds user
0.072207000 seconds sys
After:
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock    3587 ns         3587 ns       195749 1.94935GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,109.69 msec task-clock # 0.930 CPUs utilized
5 context-switches # 0.005 K/sec
0 cpu-migrations # 0.000 K/sec
198 page-faults # 0.178 K/sec
4,094,010,257 cycles # 3.689 GHz
15,672,677,812 instructions # 3.83 insn per cycle
2,589,291,160 branches # 2333.346 M/sec
3,306,386 branch-misses # 0.13% of all branches
1.193221789 seconds time elapsed
1.102538000 seconds user
0.072166000 seconds sys
|
4 years ago |
Joshua Haberman
|
f3a2a79349
|
More optimization, back up to 2.56GB/s.
|
4 years ago |
Joshua Haberman
|
199c914295
|
Simplify push/pop when msg fits in the current buffer.
|
4 years ago |
Joshua Haberman
|
d5f5db2729
|
Put string-copying field parser into a separate function.
This helps to regain a bit of lost perf. Now at 2.3GB/s.
|
4 years ago |
Joshua Haberman
|
883f20d4dc
|
Merge branch 'master' into fastest-table
Unfortunately this regresses the benchmark to ~2.25GB/s.
Optimizations forthcoming.
|
4 years ago |
Joshua Haberman
|
1bd62e8218
|
Merge pull request #324 from haberman/simplemomi
Eliminated bounds checks inside parsing a field.
|
4 years ago |
Joshua Haberman
|
f4adbe0698
|
Optimized varint decoding from Gerben.
This speeds things up but costs some code size.
name                                 old time/op  new time/op  delta
ArenaOneAlloc                        21.1ns ± 0%  21.3ns ± 0%  +1.33%  (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc            6.02ns ± 0%  6.02ns ± 0%    ~     (p=0.579 n=10+10)
LoadDescriptor_Upb                    111µs ± 1%   110µs ± 1%  -0.91%  (p=0.003 n=11+12)
LoadDescriptor_Proto2                 258µs ± 1%   258µs ± 1%    ~     (p=0.674 n=10+12)
Parse_Upb_FileDesc_WithArena         11.2µs ± 0%  10.4µs ± 0%  -6.67%  (p=0.000 n=12+12)
Parse_Upb_FileDesc_WithInitialBlock  10.6µs ± 0%  10.1µs ± 0%  -4.48%  (p=0.000 n=12+11)
SerializeDescriptor_Proto2           5.36µs ± 5%  5.36µs ± 3%    ~     (p=0.880 n=12+11)
SerializeDescriptor_Upb              11.9µs ± 0%  12.0µs ± 0%  +0.81%  (p=0.000 n=12+12)
    FILE SIZE        VM SIZE
 --------------  --------------
   +23% +1.11Ki   +24% +1.06Ki  upb/decode.c
   +15%    +560   +15%    +560  decode_msg
  +140%    +240  +188%    +240  decode_longvarint64
  [NEW]    +174  [NEW]    +128  decode_isdonefallback
   +56%    +160   +65%    +160  upb_decode
 -49.7% -1.06Ki  [ = ]       0  [Unmapped]
  +0.0%     +48  +0.9% +1.06Ki  TOTAL
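For reference, here is a baseline sketch of what varint decoding computes. It is the plain byte-at-a-time form of protobuf's base-128 varint encoding, not the optimized routine this commit adds, and the name `decode_varint` is hypothetical.

```python
def decode_varint(buf, pos=0):
    """Decode one base-128 varint from `buf` starting at `pos`.

    Returns (value, next_pos). Raises ValueError on truncated input or
    on a varint longer than 10 bytes (the maximum for a 64-bit value).
    """
    value = 0
    for i in range(10):
        if pos + i >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos + i]
        # The low 7 bits carry payload, least-significant group first.
        value |= (byte & 0x7F) << (7 * i)
        if byte < 0x80:  # continuation bit clear: this is the last byte
            return value & 0xFFFFFFFFFFFFFFFF, pos + i + 1
    raise ValueError("varint longer than 10 bytes")
```

An optimized decoder typically special-cases the one-byte and two-byte encodings, which plausibly accounts for the code size cost noted above.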
|
4 years ago |
Joshua Haberman
|
48689df72e
|
Eliminated bounds checks inside parsing a field.
Each field parser gets 16 bytes of slop. This requires using a patch
buffer at end-of-buffer.
This adds 80% of what is needed to support a pull parser with a data
callback, since the main parser is now tolerant to buffer flips.
There is a ~4% performance regression and 12% code size regression in
upb/decode.c:
name                                 old time/op  new time/op  delta
ArenaOneAlloc                        21.0ns ± 0%  21.6ns ± 0%  +2.87%  (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc            6.02ns ± 0%  6.02ns ± 0%  +0.09%  (p=0.001 n=11+12)
LoadDescriptor_Upb                    114µs ± 1%   115µs ± 1%  +0.96%  (p=0.000 n=11+12)
LoadDescriptor_Proto2                 260µs ± 1%   261µs ± 1%  +0.55%  (p=0.033 n=12+12)
Parse_Upb_FileDesc_WithArena         10.8µs ± 0%  11.2µs ± 0%  +3.43%  (p=0.000 n=11+11)
Parse_Upb_FileDesc_WithInitialBlock  10.5µs ± 0%  10.9µs ± 0%  +3.68%  (p=0.000 n=12+12)
SerializeDescriptor_Proto2           5.25µs ± 3%  5.42µs ± 5%  +3.39%  (p=0.007 n=12+12)
SerializeDescriptor_Upb              12.0µs ± 0%  12.5µs ± 0%  +4.14%  (p=0.000 n=12+11)
    FILE SIZE        VM SIZE
 --------------  --------------
   +12%    +606   +12%    +560  upb/decode.c
  +7.9%    +288  +7.9%    +288  decode_msg
  [NEW]    +174  [NEW]    +128  decode_isdonefallback
   +56%    +160   +65%    +160  upb_decode
  -9.3%     -16 -12.5%     -16  decode_longvarint64
 -25.5%    -558  [ = ]       0  [Unmapped]
  +0.0%     +48  +0.4%    +560  TOTAL
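The slop and patch-buffer idea can be illustrated with a small sketch. It is an assumption-laden simplification, not the code in upb/decode.c: the helper name `slop_window` and the zero padding are hypothetical, and only the 16-byte slop size comes from the commit message.

```python
SLOP = 16  # bytes a field parser may read without a bounds check


def slop_window(buf, pos):
    """Return (window, patched): exactly SLOP readable bytes at `pos`.

    When at least SLOP bytes remain, the window is a plain slice of the
    input. Near end-of-buffer, the tail is copied into a zero-padded
    patch buffer so the parser can still read SLOP bytes safely.
    """
    assert 0 <= pos <= len(buf)
    remaining = len(buf) - pos
    if remaining >= SLOP:
        return buf[pos:pos + SLOP], False
    # Hypothetical patch buffer: real tail bytes followed by zero padding.
    patch = buf[pos:] + bytes(SLOP - remaining)
    return patch, True
```

A caller still has to track the true end of input, since the padding bytes in the patch buffer are not data.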
|
4 years ago |
Joshua Haberman
|
a345af9883
|
Added a codegen parameter for whether fasttables are generated or not.
Example:
$ CC=clang bazel build -c opt --copt=-g benchmarks:benchmark --//:fasttable_enabled=false
INFO: Build option --//:fasttable_enabled has changed, discarding analysis cache.
INFO: Analyzed target //benchmarks:benchmark (0 packages loaded, 913 targets configured).
INFO: Found 1 target...
Target //benchmarks:benchmark up-to-date:
bazel-bin/benchmarks/benchmark
INFO: Elapsed time: 0.760s, Critical Path: 0.58s
INFO: 7 processes: 1 internal, 6 linux-sandbox.
INFO: Build completed successfully, 7 total actions
$ bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithArena          10985 ns        10984 ns        63567 651.857MB/s
BM_Parse_Upb_FileDesc_WithInitialBlock   10556 ns        10554 ns        66138 678.458MB/s
$ CC=clang bazel build -c opt --copt=-g benchmarks:benchmark --//:fasttable_enabled=true
INFO: Build option --//:fasttable_enabled has changed, discarding analysis cache.
INFO: Analyzed target //benchmarks:benchmark (0 packages loaded, 913 targets configured).
INFO: Found 1 target...
Target //benchmarks:benchmark up-to-date:
bazel-bin/benchmarks/benchmark
INFO: Elapsed time: 0.744s, Critical Path: 0.58s
INFO: 7 processes: 1 internal, 6 linux-sandbox.
INFO: Build completed successfully, 7 total actions
$ bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithArena           3284 ns         3284 ns       213495 2.1293GB/s
BM_Parse_Upb_FileDesc_WithInitialBlock    2882 ns         2882 ns       243069 2.4262GB/s
Biggest unknown is whether this parameter should default to true or false.
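To put the two runs side by side, a short calculation over the wall-clock times reported above gives the relative speedup. The helper name `speedup` and the dictionary layout are illustrative only.

```python
# Times in ns, copied from the two benchmark runs in this commit message.
fasttable_off = {"WithArena": 10985, "WithInitialBlock": 10556}
fasttable_on = {"WithArena": 3284, "WithInitialBlock": 2882}


def speedup(old_ns, new_ns):
    """Ratio of old time to new time; values above 1 mean faster."""
    return old_ns / new_ns


for name in fasttable_off:
    ratio = speedup(fasttable_off[name], fasttable_on[name])
    print(f"BM_Parse_Upb_FileDesc_{name}: {ratio:.2f}x")
```

On these numbers, enabling fasttables makes parsing a bit more than three times faster in both configurations.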
|
4 years ago |
Joshua Haberman
|
8e8dbb5258
|
Merge branch 'fastest-table' into fast-table
|
4 years ago |
Joshua Haberman
|
7d17a0e8c5
|
Merge branch 'master' into fastest-table
|
4 years ago |
Joshua Haberman
|
a7e2e8338d
|
Fixed benchmark script.
|
4 years ago |
Joshua Haberman
|
72de7b7002
|
Merge branch 'fastest-table' into fast-table
|
4 years ago |
Joshua Haberman
|
cb234e652c
|
Merge branch 'master' into fastest-table
|
4 years ago |
Joshua Haberman
|
b86cf2d789
|
Merge pull request #323 from haberman/build-files
Split monolithic BUILD file into many build files.
|
4 years ago |
Joshua Haberman
|
4ea81ab107
|
Fixed pedantic warning.
|
4 years ago |
Joshua Haberman
|
6399b31f4b
|
Removed ULL constants in json_decode.c.
|
4 years ago |
Joshua Haberman
|
c8ae197e64
|
Removed "U" suffixes, they are not necessary.
|
4 years ago |
Joshua Haberman
|
bc1e0b314f
|
Fixed some strict C89 errors.
|
4 years ago |
Joshua Haberman
|
2c1664906a
|
Removed license comments and upb_amalgamation for google3.
|
4 years ago |
Joshua Haberman
|
b7dc77415a
|
Added licenses() to all BUILD files.
|
4 years ago |
Joshua Haberman
|
de22764b33
|
Updated Kokoro to test ... instead of :all.
|
4 years ago |
Joshua Haberman
|
e3f41de6c7
|
Split monolithic BUILD file into many build files.
|
4 years ago |
Joshua Haberman
|
fbe2bcafbc
|
Merge pull request #4 from gerben-s/gerbens-fast-table
Add repeated varints and fixed parsers
|
4 years ago |
gerben-s
|
9e68ec033f
|
Add repeated varints and fixed parsers
|
4 years ago |
Joshua Haberman
|
d0e4b688c6
|
Shorten name of kAliasString, so benchmark results don't wrap.
|
4 years ago |
Joshua Haberman
|
c0c9b5a168
|
Regenerated generated code.
|
4 years ago |
Joshua Haberman
|
eb8e6de8b7
|
Regenerated source files.
|
4 years ago |
Joshua Haberman
|
7f0d535826
|
Merge branch 'fastest-table' into fast-table
|
4 years ago |
Joshua Haberman
|
bf8e08074c
|
Added a few more comments.
|
4 years ago |
Joshua Haberman
|
6e3c22e6ee
|
Merge branch 'fastest-table' into fast-table
|
4 years ago |
Joshua Haberman
|
3238821315
|
Gave fast table entry a nicer name.
|
4 years ago |
Joshua Haberman
|
2a574d3d01
|
Added a bunch of comments for readability.
|
4 years ago |
Joshua Haberman
|
0deca8b8fb
|
Merge branch 'master' into fast-table
|
4 years ago |
Joshua Haberman
|
bfadc99709
|
Merge branch 'master' into fastest-table
|
4 years ago |
Joshua Haberman
|
84e0f6127d
|
Merge branch 'master' into fastest-table
|
4 years ago |
Joshua Haberman
|
61c51a607b
|
Merge branch 'master' into fast-table
|
4 years ago |
Joshua Haberman
|
4f066765a9
|
Merge pull request #320 from haberman/string-view-benchmark
Added a benchmark for ctype=STRING_PIECE
|
4 years ago |
Joshua Haberman
|
bf393bf086
|
Cleaned up benchmark names.
|
4 years ago |
Joshua Haberman
|
9eb8414b31
|
Added descriptor_sv.proto.
|
4 years ago |
Joshua Haberman
|
ee7da95367
|
Bzl formatting fix per buildifier.
|
4 years ago |