Gerben Stavenga
4053805759
Bugfixes
4 years ago
Gerben Stavenga
36662b3735
Refactor some code. I extracted some common code from all message field
parsers into a tail-recursive function. Replaced the varint jmp table with
a simple varint parse loop, which removes the stack frames. Also took care
not to lose information in the repeated message tag check: when written
mindfully, the checks and loads that happen can be reused for tag dispatch
if the tag is not the expected one.
4 years ago
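For readers unfamiliar with the "simple varint parse loop" mentioned in the commit above, here is a minimal sketch of what such a loop can look like. It is not the code from this commit; the name `sketch_decode_varint64` and its signature are assumptions made for illustration only. It decodes one base-128 varint, 7 payload bits per byte, stopping at the first byte whose high bit is clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not upb's actual function). Decodes one base-128
 * varint from [ptr, end). Returns a pointer just past the varint, or NULL
 * if the input is truncated or the varint is longer than 10 bytes. */
static const uint8_t *sketch_decode_varint64(const uint8_t *ptr,
                                             const uint8_t *end,
                                             uint64_t *out) {
  uint64_t val = 0;
  int shift;
  assert(ptr != NULL && end != NULL && out != NULL);
  /* Shifts 0, 7, ..., 63: at most 10 bytes for a 64-bit value. */
  for (shift = 0; shift < 70; shift += 7) {
    uint8_t byte;
    if (ptr == end) return NULL; /* ran out of input mid-varint */
    byte = *ptr++;
    val |= (uint64_t)(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) { /* continuation bit clear: last byte */
      *out = val;
      return ptr;
    }
  }
  return NULL; /* more than 10 bytes: malformed */
}
```

A plain loop like this needs no jump table and no per-length helper functions, which is one plausible reading of why the commit says it "removes the stack frames".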
Joshua Haberman
9938cf8f27
Put submsg_index directly in table data. Drop oneof support for now to focus.
4 years ago
Joshua Haberman
d87179501d
Another build fix.
4 years ago
Joshua Haberman
89bd8b87e1
Fixed a few more C89 compat issues.
4 years ago
Joshua Haberman
64d293894a
Fixed bug introduced by last optimization.
4 years ago
Joshua Haberman
ff957b996c
Fixed C89 compat issues.
4 years ago
Joshua Haberman
537b6f42c2
A few updates to the benchmark and minor implementation changes.
4 years ago
Joshua Haberman
0dcc5641eb
Replicated dispatch and implemented array resizing logic. Up to 2.67GB/s.
4 years ago
Joshua Haberman
526e430794
I think this may have reached the optimization limit.
-------------------------------------------------------------------------
Benchmark                              Time          CPU    Iterations
-------------------------------------------------------------------------
BM_ArenaOneAlloc                      21 ns        21 ns      32994231
BM_ArenaInitialBlockOneAlloc           6 ns         6 ns     116318005
BM_ParseDescriptorNoHeap            3028 ns      3028 ns        231138   2.34354GB/s
BM_ParseDescriptor                  3557 ns      3557 ns        196583   1.99498GB/s
BM_ParseDescriptorProto2NoArena    33228 ns     33226 ns         21196   218.688MB/s
BM_ParseDescriptorProto2WithArena  22863 ns     22861 ns         30666   317.831MB/s
BM_SerializeDescriptorProto2        5444 ns      5444 ns        127368   1.30348GB/s
BM_SerializeDescriptor             12509 ns     12508 ns         55816   580.914MB/s

$ perf stat bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap
2020-10-08 14:07:06
Running bazel-bin/benchmark
Run on (72 X 3700 MHz CPU s)
CPU Caches:
L1 Data 32K (x36)
L1 Instruction 32K (x36)
L2 Unified 1024K (x36)
L3 Unified 25344K (x2)
----------------------------------------------------------------
Benchmark                     Time        CPU    Iterations
----------------------------------------------------------------
BM_ParseDescriptorNoHeap   3071 ns    3071 ns        227743   2.31094GB/s

Performance counter stats for 'bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap':

       1,050.22 msec task-clock          #    0.978 CPUs utilized
              4      context-switches    #    0.004 K/sec
              0      cpu-migrations      #    0.000 K/sec
            179      page-faults         #    0.170 K/sec
  3,875,796,334      cycles              #    3.690 GHz
 13,282,835,967      instructions        #    3.43  insn per cycle
  2,887,725,848      branches            # 2749.627 M/sec
      8,324,912      branch-misses       #    0.29% of all branches

    1.073924364 seconds time elapsed

    1.042806000 seconds user
    0.008021000 seconds sys

Profile:
23.96% benchmark benchmark [.] upb_prm_1bt_max192b
22.44% benchmark benchmark [.] fastdecode_dispatch
18.96% benchmark benchmark [.] upb_pss_1bt
14.20% benchmark benchmark [.] upb_psv4_1bt
8.33% benchmark benchmark [.] upb_prm_1bt_max64b
6.66% benchmark benchmark [.] upb_prm_1bt_max128b
1.29% benchmark benchmark [.] upb_psm_1bt_max64b
0.77% benchmark benchmark [.] fastdecode_generic
0.55% benchmark [kernel.kallsyms] [k] smp_call_function_single
0.42% benchmark [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0.42% benchmark benchmark [.] upb_psm_1bt_max256b
0.31% benchmark benchmark [.] upb_psb1_1bt
0.21% benchmark benchmark [.] upb_plv4_5bv
0.14% benchmark benchmark [.] upb_psb1_2bt
0.12% benchmark benchmark [.] decode_longvarint64
0.08% benchmark [kernel.kallsyms] [k] vsnprintf
0.07% benchmark [kernel.kallsyms] [k] _raw_spin_lock
0.07% benchmark benchmark [.] _upb_msg_new
0.06% benchmark ld-2.31.so [.] check_match
4 years ago
Joshua Haberman
4c65b25daf
Handle long varints, now 2GB/s!
4 years ago
Joshua Haberman
e39ec95ca2
Hoisted updates to limits and depth out of the loop.
4 years ago
Joshua Haberman
52a0ed3891
Fixed a bug with tag number 15.
4 years ago
Joshua Haberman
388b6f64eb
A small optimization: don't increment array length every iteration.
4 years ago
Joshua Haberman
9e5c5ce089
Optimized memset() with cutoff and fixed group & unknown message bugs.
4 years ago
Joshua Haberman
8dd7b5a2ca
A bunch more optimization.
4 years ago
Joshua Haberman
e46e94ec7f
Added benchmarks for proto2.
4 years ago
Joshua Haberman
405e7934b1
Handle 2-byte submessage lengths.
4 years ago
Joshua Haberman
88b1ec7784
Table-driven supports repeated sub-messages.
4 years ago
Joshua Haberman
f173642db4
Handle non-repeated submessages.
4 years ago
Joshua Haberman
e219a2d91d
Merge branch 'decode-arena' into fast-table
4 years ago
Joshua Haberman
7ec2c52346
Donate/steal from arena to accelerate decoding.
4 years ago
Joshua Haberman
d43ccfa079
Revert test changes.
4 years ago
Joshua Haberman
fac992db83
Cleanup for showing.
4 years ago
Joshua Haberman
3937874a85
We have a properly structured algorithm, but perf regresses by 20%.
4 years ago
Joshua Haberman
438ecaeb5a
Give all field parsers a generic table entry.
4 years ago
Joshua Haberman
383ae5293e
WIP.
4 years ago
Joshua Haberman
26abaa2345
WIP.
4 years ago
Joshua Haberman
34b98bc030
Avoid passing too many params to fallback.
4 years ago
Joshua Haberman
763a3f6293
WIP.
4 years ago
Joshua Haberman
02ff6fb996
Merge pull request #309 from haberman/decoder-forceinline
Add UPB_FORCEINLINE for varint32 decoding.
4 years ago
Joshua Haberman
a202ce9629
Add UPB_FORCEINLINE for varint32 decoding.
This speeds up the decoder by >20% and also reduces code size slightly!
name                       old time/op  new time/op  delta
ArenaOneAlloc              20.4ns ± 0%  20.2ns ± 0%   -1.10%  (p=0.000 n=12+11)
ArenaInitialBlockOneAlloc  5.25ns ± 0%  5.25ns ± 0%     ~     (p=0.786 n=11+12)
ParseDescriptorNoHeap      17.1µs ± 0%  13.1µs ± 0%  -23.29%  (p=0.000 n=11+12)
ParseDescriptor            17.4µs ± 1%  13.5µs ± 1%  -22.51%  (p=0.000 n=12+12)
SerializeDescriptor        10.7µs ± 0%  10.9µs ± 0%   +1.95%  (p=0.000 n=12+12)

    FILE SIZE        VM SIZE
 --------------  --------------
  +2.7%     +16  +2.7%     +16    [LOAD #2 [RX]]
  +0.5%     +16  [ = ]       0    [Unmapped]
  -1.4%     -72  -0.7%     -32    upb/decode.c
    +3.1%     +98  +3.1%     +98    decode_msg
    [DEL]    -170  [DEL]    -130    decode_varint32
  -0.0%     -40  -0.0%     -16    TOTAL
4 years ago
Joshua Haberman
d0f2c4c8a2
Merge pull request #308 from haberman/encoder
Optimized the binary encoder for a 2x speedup
4 years ago
Joshua Haberman
5741eb9ad7
Expanded benchmarking script and added one size opt to the encoder.
4 years ago
Joshua Haberman
0135399e60
Fixed bug introduced in refactoring.
4 years ago
Joshua Haberman
df3438222b
Notated impossible branch as unreachable.
4 years ago
Joshua Haberman
9b31e8fe12
Merged common encode tag paths.
4 years ago
Joshua Haberman
5d7dc718cc
Minor formatting fix.
4 years ago
Joshua Haberman
80441e4eb4
Optimized binary encoder.
4 years ago
Joshua Haberman
ada28896b9
Changed encoder to use longjmp() for error recovery.
4 years ago
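As an illustration of the longjmp()-based error recovery named in the commit above, the following is a minimal sketch, not the encoder's actual code. The `sketch_encoder` struct and both function names are hypothetical. The idea shown is that the hot path calls a helper that jumps straight back to the entry point on failure, so intermediate callers do not need to check and propagate an error code at every step.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical encoder state (illustrative only). */
typedef struct {
  jmp_buf err;  /* jump target used when encoding fails */
  char *ptr;    /* next byte to write */
  char *limit;  /* one past the end of the output buffer */
} sketch_encoder;

/* Writes one byte, or unwinds to the setjmp() site if the buffer is full. */
static void sketch_put_byte(sketch_encoder *e, char b) {
  if (e->ptr == e->limit) longjmp(e->err, 1);
  *e->ptr++ = b;
}

/* Copies n bytes into buf. Returns the number of bytes written,
 * or -1 if the output buffer was too small. */
static long sketch_encode(char *buf, size_t size, const char *data, size_t n) {
  sketch_encoder e;
  size_t i;
  assert(buf != NULL && data != NULL);
  e.ptr = buf;
  e.limit = buf + size;
  if (setjmp(e.err) != 0) return -1; /* reached via longjmp() on error */
  for (i = 0; i < n; i++) sketch_put_byte(&e, data[i]);
  return (long)(e.ptr - buf);
}
```

Note that locals modified after setjmp() are not read again on the error path here, which avoids the indeterminate-value pitfall that the C standard attaches to non-volatile locals after a longjmp().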
Joshua Haberman
6e140c267c
Added benchmark for encoding.
4 years ago
Joshua Haberman
7338facddb
Merge pull request #307 from veblush/port-backport
Add UPB_NORETURN for MSC
4 years ago
Esun Kim
4d2251c3e4
Add UPB_NORETURN for MSC
4 years ago
Joshua Haberman
382d5afc60
Merge pull request #306 from haberman/bigendian
Fixed binary encoding and decoding for big-endian machines.
4 years ago
Joshua Haberman
efefbffc80
Fixed binary encoding and decoding for big-endian machines.
4 years ago
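To make the big-endian issue concrete: protobuf fixed-width fields are little-endian on the wire, so a decoder that copies wire bytes directly into a `uint32_t` must byte-swap on big-endian hosts. The sketch below shows one common way to handle this; it is an assumption about the general technique, not the code from this commit, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero when the host stores the low-order byte first. */
static int sketch_is_little_endian(void) {
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Reverses the byte order of a 32-bit value. */
static uint32_t sketch_bswap32(uint32_t v) {
  return ((v & 0xff000000u) >> 24) | ((v & 0x00ff0000u) >> 8) |
         ((v & 0x0000ff00u) << 8) | ((v & 0x000000ffu) << 24);
}

/* Loads a little-endian fixed32 from wire bytes into host order. */
static uint32_t sketch_load_fixed32(const uint8_t *wire) {
  uint32_t v;
  assert(wire != NULL);
  memcpy(&v, wire, sizeof(v)); /* memcpy avoids unaligned-access issues */
  return sketch_is_little_endian() ? v : sketch_bswap32(v);
}
```

On a little-endian host the swap branch is never taken, so the fast path stays a plain 4-byte copy.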
Joshua Haberman
5d3083013c
Merge pull request #304 from haberman/upb-assume
Fixed UPB_ASSUME() for non-GCC, non-MSVC platforms.
4 years ago
Joshua Haberman
55dd9d3e41
Fixed UPB_ASSUME() for non-GCC, non-MSVC platforms.
4 years ago
Joshua Haberman
e4c8afd0d4
Merge pull request #303 from haberman/packed-def
Fixed upb_fielddef_packed() to have the correct default.
4 years ago
Joshua Haberman
8284321780
Fixed upb_fielddef_packed() to have the correct default.
4 years ago
Joshua Haberman
ed86d98f53
Merge pull request #302 from haberman/verify-utf8
Verify UTF-8 when parsing proto3 string fields.
4 years ago