Joshua Haberman
55f3569cd2
A few minor fixes and more assertions.
4 years ago
Joshua Haberman
8b38e8f214
Merge branch 'fastest-table' into fast-table
4 years ago
Joshua Haberman
46eb82467a
Added comment to decode_fast.h.
4 years ago
Joshua Haberman
bd9f8f580d
Fixed a few bugs with the fast decoder.
...
1. For long tags we were putting table entries in the wrong slot.
2. For repeated strings, when the buffer flipped to no longer alias we
were failing to notice and kept aliasing anyway.
4 years ago
Joshua Haberman
3eba47914b
Allocate hasbits and table slots in "hotness" order.
...
Without a profile, we assume that fields with smaller numbers
are hotter.
4 years ago
Joshua Haberman
021db6fcd5
Allow larger tags into the table if they are unique mod 31.
...
Also fixed a bug with fixed packed in decode_fast.c.
4 years ago
Joshua Haberman
86d9908c55
Fastdecode support for packed fields.
...
This is not very optimized yet. There is a lot of room to
optimize it further.
4 years ago
Joshua Haberman
e3e797b680
Added fasttable support for oneofs.
4 years ago
Joshua Haberman
7ffa9c181a
Fixed some small bugs and performance problems in string copying.
...
Before this CL, with alias=false:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 3715 ns 3715 ns 188916 1.88206GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,122.92 msec task-clock # 0.979 CPUs utilized
3 context-switches # 0.003 K/sec
0 cpu-migrations # 0.000 K/sec
196 page-faults # 0.175 K/sec
4,144,746,717 cycles # 3.691 GHz
15,351,966,804 instructions # 3.70 insn per cycle
2,590,281,905 branches # 2306.728 M/sec
2,996,157 branch-misses # 0.12% of all branches
1.146615328 seconds time elapsed
1.115578000 seconds user
0.008025000 seconds sys
After this CL:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 3554 ns 3554 ns 197527 1.9674GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,105.34 msec task-clock # 0.982 CPUs utilized
3 context-switches # 0.003 K/sec
0 cpu-migrations # 0.000 K/sec
197 page-faults # 0.178 K/sec
4,077,736,892 cycles # 3.689 GHz
15,442,709,352 instructions # 3.79 insn per cycle
2,435,131,301 branches # 2203.068 M/sec
2,643,775 branch-misses # 0.11% of all branches
1.125393845 seconds time elapsed
1.097770000 seconds user
0.008012000 seconds sys
4 years ago
Joshua Haberman
e2c709e047
Repeated string and primitive support.
...
Much of the code was adapted from Gerben's code in:
6333031195
4 years ago
Joshua Haberman
e9103eda9e
Merge branch 'master' into fastest-table
4 years ago
Joshua Haberman
0756999ab6
Merge pull request #325 from haberman/inlined-arena
...
Fixed upb::InlinedArena, which was completely broken.
4 years ago
Joshua Haberman
25db40bc30
Fixed upb::InlinedArena, which was compeltely broken.
4 years ago
Joshua Haberman
d81ba58215
Optimized short string copying.
...
This sped up the alias=false case:
Before:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 4562 ns 4562 ns 153251 1.53276GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,216.65 msec task-clock # 0.936 CPUs utilized
6 context-switches # 0.005 K/sec
0 cpu-migrations # 0.000 K/sec
200 page-faults # 0.164 K/sec
4,490,925,650 cycles # 3.691 GHz
16,516,403,731 instructions # 3.68 insn per cycle
2,828,536,650 branches # 2324.861 M/sec
5,425,830 branch-misses # 0.19% of all branches
1.300178903 seconds time elapsed
1.211475000 seconds user
0.072207000 seconds sys
After:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 3587 ns 3587 ns 195749 1.94935GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,109.69 msec task-clock # 0.930 CPUs utilized
5 context-switches # 0.005 K/sec
0 cpu-migrations # 0.000 K/sec
198 page-faults # 0.178 K/sec
4,094,010,257 cycles # 3.689 GHz
15,672,677,812 instructions # 3.83 insn per cycle
2,589,291,160 branches # 2333.346 M/sec
3,306,386 branch-misses # 0.13% of all branches
1.193221789 seconds time elapsed
1.102538000 seconds user
0.072166000 seconds sys
4 years ago
Joshua Haberman
f3a2a79349
More optimization, back up to 2.56GB/s.
4 years ago
Joshua Haberman
199c914295
Simplify push/pop when msg fits in the current buffer.
4 years ago
Joshua Haberman
d5f5db2729
Put string-copying field parser into a separate function.
...
This helps to regain a bit of lost perf. Now at 2.3GB/s.
4 years ago
Joshua Haberman
883f20d4dc
Merge branch 'master' into fastest-table
...
Unfortunately this regresses the benchmark to ~2.25GB/s.
Optimizations forthcoming.
4 years ago
Joshua Haberman
1bd62e8218
Merge pull request #324 from haberman/simplemomi
...
Eliminated bounds checks inside parsing a field.
4 years ago
Joshua Haberman
f4adbe0698
Optimized varint decoding from Gerben.
...
This speeds things up but costs some code size.
name old time/op new time/op delta
ArenaOneAlloc 21.1ns ± 0% 21.3ns ± 0% +1.33% (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc 6.02ns ± 0% 6.02ns ± 0% ~ (p=0.579 n=10+10)
LoadDescriptor_Upb 111µs ± 1% 110µs ± 1% -0.91% (p=0.003 n=11+12)
LoadDescriptor_Proto2 258µs ± 1% 258µs ± 1% ~ (p=0.674 n=10+12)
Parse_Upb_FileDesc_WithArena 11.2µs ± 0% 10.4µs ± 0% -6.67% (p=0.000 n=12+12)
Parse_Upb_FileDesc_WithInitialBlock 10.6µs ± 0% 10.1µs ± 0% -4.48% (p=0.000 n=12+11)
SerializeDescriptor_Proto2 5.36µs ± 5% 5.36µs ± 3% ~ (p=0.880 n=12+11)
SerializeDescriptor_Upb 11.9µs ± 0% 12.0µs ± 0% +0.81% (p=0.000 n=12+12)
FILE SIZE VM SIZE
-------------- --------------
+23% +1.11Ki +24% +1.06Ki upb/decode.c
+15% +560 +15% +560 decode_msg
+140% +240 +188% +240 decode_longvarint64
[NEW] +174 [NEW] +128 decode_isdonefallback
+56% +160 +65% +160 upb_decode
-49.7% -1.06Ki [ = ] 0 [Unmapped]
+0.0% +48 +0.9% +1.06Ki TOTAL
4 years ago
Joshua Haberman
48689df72e
Eliminated bounds checks inside parsing a field.
...
Each field parser gets 16 bytes of slop. This requires using a patch
buffer at end-of-buffer.
This addes 80% of what is needed to support a pull parser with a data
callback, since the main parser is now tolerant to buffer flips.
There is a ~4% performance regression and 12% code size regression in
upb/decode.c:
name old time/op new time/op delta
ArenaOneAlloc 21.0ns ± 0% 21.6ns ± 0% +2.87% (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc 6.02ns ± 0% 6.02ns ± 0% +0.09% (p=0.001 n=11+12)
LoadDescriptor_Upb 114µs ± 1% 115µs ± 1% +0.96% (p=0.000 n=11+12)
LoadDescriptor_Proto2 260µs ± 1% 261µs ± 1% +0.55% (p=0.033 n=12+12)
Parse_Upb_FileDesc_WithArena 10.8µs ± 0% 11.2µs ± 0% +3.43% (p=0.000 n=11+11)
Parse_Upb_FileDesc_WithInitialBlock 10.5µs ± 0% 10.9µs ± 0% +3.68% (p=0.000 n=12+12)
SerializeDescriptor_Proto2 5.25µs ± 3% 5.42µs ± 5% +3.39% (p=0.007 n=12+12)
SerializeDescriptor_Upb 12.0µs ± 0% 12.5µs ± 0% +4.14% (p=0.000 n=12+11)
FILE SIZE VM SIZE
-------------- --------------
+12% +606 +12% +560 upb/decode.c
+7.9% +288 +7.9% +288 decode_msg
[NEW] +174 [NEW] +128 decode_isdonefallback
+56% +160 +65% +160 upb_decode
-9.3% -16 -12.5% -16 decode_longvarint64
-25.5% -558 [ = ] 0 [Unmapped]
+0.0% +48 +0.4% +560 TOTAL
4 years ago
Joshua Haberman
a345af9883
Added a codegen parameter for whether fasttables are generated or not.
...
Example:
$ CC=clang bazel build -c opt --copt=-g benchmarks:benchmark --//:fasttable_enabled=false
INFO: Build option --//:fasttable_enabled has changed, discarding analysis cache.
INFO: Analyzed target //benchmarks:benchmark (0 packages loaded, 913 targets configured).
INFO: Found 1 target...
Target //benchmarks:benchmark up-to-date:
bazel-bin/benchmarks/benchmark
INFO: Elapsed time: 0.760s, Critical Path: 0.58s
INFO: 7 processes: 1 internal, 6 linux-sandbox.
INFO: Build completed successfully, 7 total actions
$ bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithArena 10985 ns 10984 ns 63567 651.857MB/s
BM_Parse_Upb_FileDesc_WithInitialBlock 10556 ns 10554 ns 66138 678.458MB/s
$ CC=clang bazel build -c opt --copt=-g benchmarks:benchmark --//:fasttable_enabled=true
INFO: Build option --//:fasttable_enabled has changed, discarding analysis cache.
INFO: Analyzed target //benchmarks:benchmark (0 packages loaded, 913 targets configured).
INFO: Found 1 target...
Target //benchmarks:benchmark up-to-date:
bazel-bin/benchmarks/benchmark
INFO: Elapsed time: 0.744s, Critical Path: 0.58s
INFO: 7 processes: 1 internal, 6 linux-sandbox.
INFO: Build completed successfully, 7 total actions
$ bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithArena 3284 ns 3284 ns 213495 2.1293GB/s
BM_Parse_Upb_FileDesc_WithInitialBlock 2882 ns 2882 ns 243069 2.4262GB/s
Biggest unknown is whether this parameter should default to true or false.
4 years ago
Joshua Haberman
8a3470c543
WIP.
4 years ago
Joshua Haberman
8e8dbb5258
Merge branch 'fastest-table' into fast-table
4 years ago
Joshua Haberman
7d17a0e8c5
Merge branch 'master' into fastest-table
4 years ago
Joshua Haberman
a7e2e8338d
Fixed benchmark script.
4 years ago
Joshua Haberman
72de7b7002
Merge branch 'fastest-table' into fast-table
4 years ago
Joshua Haberman
cb234e652c
Merge branch 'master' into fastest-table
4 years ago
Joshua Haberman
b86cf2d789
Merge pull request #323 from haberman/build-files
...
Split monolithic BUILD file into many build files.
4 years ago
Joshua Haberman
4ea81ab107
Fixed pedantic warning.
4 years ago
Joshua Haberman
6399b31f4b
Removed ULL constants in json_decode.c.
4 years ago
Joshua Haberman
c8ae197e64
Removed "U" suffixes, they are not necessary.
4 years ago
Joshua Haberman
bc1e0b314f
Fixed some strict C89 errors.
4 years ago
Joshua Haberman
2c1664906a
Removed license comments and upb_amalgamation for google3.
4 years ago
Joshua Haberman
b7dc77415a
Added licenses() to all BUILD files.
4 years ago
Joshua Haberman
de22764b33
Updated Kokoro to test ... instead of :all.
4 years ago
Joshua Haberman
e3f41de6c7
Split monolithic BUILD file into many build files.
4 years ago
Joshua Haberman
fbe2bcafbc
Merge pull request #4 from gerben-s/gerbens-fast-table
...
Add repeated varints and fixed parsers
4 years ago
gerben-s
9e68ec033f
Add repeated varints and fixed parsers
4 years ago
Joshua Haberman
d0e4b688c6
Shorten name of kAliasString, so benchmark results don't wrap.
4 years ago
Joshua Haberman
c0c9b5a168
Regenerated generated code.
4 years ago
Joshua Haberman
eb8e6de8b7
Regenerated source files.
4 years ago
Joshua Haberman
7f0d535826
Merge branch 'fastest-table' into fast-table
4 years ago
Joshua Haberman
bf8e08074c
Added a few more comments.
4 years ago
Joshua Haberman
6e3c22e6ee
Merge branch 'fastest-table' into fast-table
4 years ago
Joshua Haberman
3238821315
Gave fast table entry a nicer name.
4 years ago
Joshua Haberman
2a574d3d01
Added a bunch of comments for readability.
4 years ago
Joshua Haberman
0deca8b8fb
Merge branch 'master' into fast-table
4 years ago
Joshua Haberman
bfadc99709
Merge branch 'master' into fastest-table
4 years ago
Joshua Haberman
84e0f6127d
Merge branch 'master' into fastest-table
4 years ago