Joshua Haberman
|
52957fa984
|
Merge pull request #312 from haberman/defiter
Added simple offset-based accessors for defs, and deprecated old iterators
|
4 years ago |
Joshua Haberman
|
746f64692c
|
Moved arena inline for decoder.
|
4 years ago |
Joshua Haberman
|
7363b91ac3
|
Moved arena inline for decoder.
|
4 years ago |
Joshua Haberman
|
b8ef1dcc57
|
Removed C++-style comments.
|
4 years ago |
Joshua Haberman
|
575acd85bd
|
Re-added const for all of the pointer wrapper types.
|
4 years ago |
Joshua Haberman
|
5aa5b77b41
|
Added simple offset-based accessors for defs, and deprecated old iterators.
|
4 years ago |
Joshua Haberman
|
33384301e2
|
Merge pull request #311 from haberman/proto2-benchmark2
Benchmarks vs. proto2
|
4 years ago |
Joshua Haberman
|
578e7c1f4c
|
Merge pull request #311 from haberman/proto2-benchmark2
Benchmarks vs. proto2
|
4 years ago |
Joshua Haberman
|
bc301e7da4
|
Use merge/partial variants to give proto2 benchmark the fairest hearing.
|
4 years ago |
Joshua Haberman
|
30f01afa83
|
Added LargeInitialBlock test for proto2.
|
4 years ago |
Joshua Haberman
|
5d23fd99af
|
Used shorter protobuf:: namespace alias.
|
4 years ago |
Joshua Haberman
|
9938cf8f27
|
Put submsg_index directly in table data. Drop oneof support for now to focus.
|
4 years ago |
Joshua Haberman
|
d87179501d
|
Another build fix.
|
4 years ago |
Joshua Haberman
|
89bd8b87e1
|
Fixed a few more C89 compat issues.
|
4 years ago |
Joshua Haberman
|
64d293894a
|
Fixed bug introduced by last optimization.
|
4 years ago |
Joshua Haberman
|
ff957b996c
|
Fixed C89 compat issues.
|
4 years ago |
Joshua Haberman
|
537b6f42c2
|
A few updates to the benchmark and minor implementation changes.
|
4 years ago |
Joshua Haberman
|
0dcc5641eb
|
Replicated dispatch and implemented array resizing logic. Up to 2.67GB/s.
|
4 years ago |
Joshua Haberman
|
526e430794
|
I think this may have reached the optimization limit.
-------------------------------------------------------------------------
Benchmark                              Time       CPU  Iterations
-------------------------------------------------------------------------
BM_ArenaOneAlloc                      21 ns     21 ns    32994231
BM_ArenaInitialBlockOneAlloc           6 ns      6 ns   116318005
BM_ParseDescriptorNoHeap            3028 ns   3028 ns      231138  2.34354GB/s
BM_ParseDescriptor                  3557 ns   3557 ns      196583  1.99498GB/s
BM_ParseDescriptorProto2NoArena    33228 ns  33226 ns       21196  218.688MB/s
BM_ParseDescriptorProto2WithArena  22863 ns  22861 ns       30666  317.831MB/s
BM_SerializeDescriptorProto2        5444 ns   5444 ns      127368  1.30348GB/s
BM_SerializeDescriptor             12509 ns  12508 ns       55816  580.914MB/s
$ perf stat bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap
2020-10-08 14:07:06
Running bazel-bin/benchmark
Run on (72 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x36)
  L1 Instruction 32K (x36)
  L2 Unified 1024K (x36)
  L3 Unified 25344K (x2)
----------------------------------------------------------------
Benchmark                     Time      CPU  Iterations
----------------------------------------------------------------
BM_ParseDescriptorNoHeap   3071 ns  3071 ns      227743  2.31094GB/s
Performance counter stats for 'bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap':

          1,050.22 msec task-clock        #    0.978 CPUs utilized
                 4      context-switches  #    0.004 K/sec
                 0      cpu-migrations    #    0.000 K/sec
               179      page-faults       #    0.170 K/sec
     3,875,796,334      cycles            #    3.690 GHz
    13,282,835,967      instructions      #    3.43  insn per cycle
     2,887,725,848      branches          # 2749.627 M/sec
         8,324,912      branch-misses     #    0.29% of all branches

       1.073924364 seconds time elapsed

       1.042806000 seconds user
       0.008021000 seconds sys
Profile:
  23.96%  benchmark  benchmark          [.] upb_prm_1bt_max192b
  22.44%  benchmark  benchmark          [.] fastdecode_dispatch
  18.96%  benchmark  benchmark          [.] upb_pss_1bt
  14.20%  benchmark  benchmark          [.] upb_psv4_1bt
   8.33%  benchmark  benchmark          [.] upb_prm_1bt_max64b
   6.66%  benchmark  benchmark          [.] upb_prm_1bt_max128b
   1.29%  benchmark  benchmark          [.] upb_psm_1bt_max64b
   0.77%  benchmark  benchmark          [.] fastdecode_generic
   0.55%  benchmark  [kernel.kallsyms]  [k] smp_call_function_single
   0.42%  benchmark  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   0.42%  benchmark  benchmark          [.] upb_psm_1bt_max256b
   0.31%  benchmark  benchmark          [.] upb_psb1_1bt
   0.21%  benchmark  benchmark          [.] upb_plv4_5bv
   0.14%  benchmark  benchmark          [.] upb_psb1_2bt
   0.12%  benchmark  benchmark          [.] decode_longvarint64
   0.08%  benchmark  [kernel.kallsyms]  [k] vsnprintf
   0.07%  benchmark  [kernel.kallsyms]  [k] _raw_spin_lock
   0.07%  benchmark  benchmark          [.] _upb_msg_new
   0.06%  benchmark  ld-2.31.so         [.] check_match
|
4 years ago |
Joshua Haberman
|
4c65b25daf
|
Handle long varints, now 2GB/s!
|
4 years ago |
Joshua Haberman
|
e39ec95ca2
|
Hoisted updates to limits and depth out of the loop.
|
4 years ago |
Joshua Haberman
|
52a0ed3891
|
Fixed a bug with tag number 15.
|
4 years ago |
Joshua Haberman
|
388b6f64eb
|
A small optimization: don't increment array length every iteration.
|
4 years ago |
Joshua Haberman
|
9e5c5ce089
|
Optimized memset() with cutoff and fixed group & unknown message bugs.
|
4 years ago |
Joshua Haberman
|
8dd7b5a2ca
|
A bunch more optimization.
|
4 years ago |
Joshua Haberman
|
e46e94ec7f
|
Added benchmarks for proto2.
|
4 years ago |
Joshua Haberman
|
405e7934b1
|
Handle 2-byte submessage lengths.
|
4 years ago |
Joshua Haberman
|
88b1ec7784
|
Table-driven supports repeated sub-messages.
|
4 years ago |
Joshua Haberman
|
f173642db4
|
Handle non-repeated submessages.
|
4 years ago |
Joshua Haberman
|
e219a2d91d
|
Merge branch 'decode-arena' into fast-table
|
4 years ago |
Joshua Haberman
|
7ec2c52346
|
Donate/steal from arena to accelerate decoding.
|
4 years ago |
Joshua Haberman
|
d43ccfa079
|
Revert test changes.
|
4 years ago |
Joshua Haberman
|
fac992db83
|
Cleanup for showing.
|
4 years ago |
Joshua Haberman
|
3937874a85
|
We have a properly structured algorithm, but perf regresses by 20%.
|
4 years ago |
Joshua Haberman
|
438ecaeb5a
|
Give all field parsers a generic table entry.
|
4 years ago |
Joshua Haberman
|
383ae5293e
|
WIP.
|
4 years ago |
Joshua Haberman
|
26abaa2345
|
WIP.
|
4 years ago |
Joshua Haberman
|
34b98bc030
|
Avoid passing too many params to fallback.
|
4 years ago |
Joshua Haberman
|
763a3f6293
|
WIP.
|
4 years ago |
Joshua Haberman
|
02ff6fb996
|
Merge pull request #309 from haberman/decoder-forceinline
Add UPB_FORCEINLINE for varint32 decoding.
|
4 years ago |
Joshua Haberman
|
a202ce9629
|
Add UPB_FORCEINLINE for varint32 decoding.
This speeds up the decoder by >20% and also reduces code size slightly!
name                       old time/op  new time/op  delta
ArenaOneAlloc              20.4ns ± 0%  20.2ns ± 0%   -1.10%  (p=0.000 n=12+11)
ArenaInitialBlockOneAlloc  5.25ns ± 0%  5.25ns ± 0%     ~     (p=0.786 n=11+12)
ParseDescriptorNoHeap      17.1µs ± 0%  13.1µs ± 0%  -23.29%  (p=0.000 n=11+12)
ParseDescriptor            17.4µs ± 1%  13.5µs ± 1%  -22.51%  (p=0.000 n=12+12)
SerializeDescriptor        10.7µs ± 0%  10.9µs ± 0%   +1.95%  (p=0.000 n=12+12)
    FILE SIZE        VM SIZE
 --------------  --------------
  +2.7%     +16  +2.7%     +16    [LOAD #2 [RX]]
  +0.5%     +16  [ = ]       0    [Unmapped]
  -1.4%     -72  -0.7%     -32    upb/decode.c
    +3.1%     +98  +3.1%     +98    decode_msg
    [DEL]    -170  [DEL]    -130    decode_varint32
  -0.0%     -40  -0.0%     -16    TOTAL
|
4 years ago |
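For readers unfamiliar with the technique named in the UPB_FORCEINLINE commit, here is a minimal sketch of a force-inlined varint32 decoder. It is not upb's code: the macro `EXAMPLE_FORCEINLINE` and the function `example_decode_varint32` are hypothetical names, and the real `UPB_FORCEINLINE` definition and decoder may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a force-inline macro (assumption, not upb's
 * definition). GCC and Clang honor always_inline; other compilers fall
 * back to a plain inline hint. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_FORCEINLINE static inline __attribute__((always_inline))
#else
#define EXAMPLE_FORCEINLINE static inline
#endif

/* Decodes a base-128 varint of at most 5 bytes into *val.
 * Returns a pointer just past the last byte consumed, or NULL if the input
 * ends early or the varint runs longer than 5 bytes. */
EXAMPLE_FORCEINLINE const uint8_t *example_decode_varint32(
    const uint8_t *p, const uint8_t *end, uint32_t *val) {
  uint32_t result = 0;
  int shift;
  for (shift = 0; shift < 35 && p < end; shift += 7) {
    uint8_t byte = *p++;
    /* Low 7 bits carry payload; bits shifted past bit 31 are discarded. */
    result |= (uint32_t)(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) { /* High bit clear: this was the last byte. */
      *val = result;
      return p;
    }
  }
  return NULL;
}
```

Forcing the inline lets the compiler fold the loop into each call site, which can remove call overhead on a hot path. Whether it helps in a given codebase has to be measured, as the commit's benchmark and size tables do.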
Joshua Haberman
|
d0f2c4c8a2
|
Merge pull request #308 from haberman/encoder
Optimized the binary encoder for a 2x speedup
|
4 years ago |
Joshua Haberman
|
5741eb9ad7
|
Expanded benchmarking script and added one size opt to the encoder.
|
4 years ago |
Joshua Haberman
|
0135399e60
|
Fixed bug introduced in refactoring.
|
4 years ago |
Joshua Haberman
|
df3438222b
|
Notated impossible branch as unreachable.
|
4 years ago |
Joshua Haberman
|
9b31e8fe12
|
Merged common encode tag paths.
|
4 years ago |
Joshua Haberman
|
5d7dc718cc
|
Minor formatting fix.
|
4 years ago |
Joshua Haberman
|
80441e4eb4
|
Optimized binary encoder.
|
4 years ago |
Joshua Haberman
|
ada28896b9
|
Changed encoder to use longjmp() for error recovery.
|
4 years ago |
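The longjmp() commit names a general C pattern that is easier to see in code. Below is a minimal sketch, assuming a simple forward-writing buffer: the struct `example_encoder` and every function here are hypothetical and are not upb's encoder, whose layout and API may differ.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder state (assumption, not upb's struct). */
typedef struct {
  jmp_buf err;  /* Recovery point set by the top-level entry function. */
  uint8_t *buf; /* Caller-provided output buffer. */
  size_t size;  /* Capacity of buf in bytes. */
  size_t len;   /* Bytes written so far. */
} example_encoder;

/* Appends one byte, jumping to the recovery point if the buffer is full. */
static void example_put_byte(example_encoder *e, uint8_t b) {
  if (e->len == e->size) longjmp(e->err, 1);
  e->buf[e->len++] = b;
}

/* Writes a base-128 varint. No error checks are needed at this level,
 * because a failure in example_put_byte never returns here. */
static void example_put_varint(example_encoder *e, uint64_t v) {
  while (v >= 0x80) {
    example_put_byte(e, (uint8_t)(v | 0x80));
    v >>= 7;
  }
  example_put_byte(e, (uint8_t)v);
}

/* Encodes n values as varints. Returns bytes written, or -1 on overflow. */
static long example_encode(uint8_t *buf, size_t size,
                           const uint64_t *vals, size_t n) {
  example_encoder e;
  size_t i;
  e.buf = buf;
  e.size = size;
  e.len = 0;
  if (setjmp(e.err)) return -1; /* Reached only via longjmp(). */
  for (i = 0; i < n; i++) example_put_varint(&e, vals[i]);
  return (long)e.len;
}
```

The design point is that only the entry function checks for failure. The inner write helpers return void, so the hot path carries no error-code propagation.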
Joshua Haberman
|
6e140c267c
|
Added benchmark for encoding.
|
4 years ago |