protobuf

Commit Graph

Author	SHA1	Message	Date
Joshua Haberman	a04627abc8	Added map sorting to binary and text encoders.	4 years ago

    For the binary encoder, sorting is off by default. For the text encoder, sorting is on by default. Both defaults can be explicitly overridden.

    This grows code size a bit. I think we could potentially shave this (and other map-related code size) by having the generated code inject a function pointer to the map-related parsing/serialization code if maps are present.

       FILE SIZE        VM SIZE
     --------------  --------------
      +86% +1.07Ki    +71%    +768    upb/msg.c
     [NEW]    +391   [NEW]    +344        _upb_mapsorter_pushmap
     [NEW]    +158   [NEW]    +112        _upb_mapsorter_cmpstr
     [NEW]    +111   [NEW]     +64        _upb_mapsorter_cmpbool
     [NEW]    +110   [NEW]     +64        _upb_mapsorter_cmpi32
     [NEW]    +110   [NEW]     +64        _upb_mapsorter_cmpi64
     [NEW]    +110   [NEW]     +64        _upb_mapsorter_cmpu32
     [NEW]    +110   [NEW]     +64        _upb_mapsorter_cmpu64
     -3.6%      -8   -4.3%      -8        _upb_map_new
     +9.5%    +464   +9.2%    +424    upb/text_encode.c
     [NEW]    +656   [NEW]    +616        txtenc_mapentry
      +15%     +32    +20%     +32        upb_text_encode
    -20.1%    -224  -20.7%    -224        txtenc_msg
     +5.7%    +342   +5.3%    +296    upb/encode.c
     [NEW]    +344   [NEW]    +304        encode_mapentry
     [NEW]    +246   [NEW]    +208        upb_encode_ex
     [NEW]     +41   [NEW]     +16        upb_encode_ex.ch
     +0.7%      +8   +0.7%      +8        encode_scalar
     -1.0%     -32   -1.0%     -32        encode_message
     [DEL]     -38   [DEL]     -16        upb_encode.ch
     [DEL]    -227   [DEL]    -192        upb_encode
     +2.0%    +152   +2.2%    +152    upb/decode.c
      +44%    +128    +44%    +128        [section .rodata]
     +3.4%     +24   +3.4%     +24        _GLOBAL_OFFSET_TABLE_
     +0.6%    +107   +0.3%     +48    upb/def.c
     [NEW]    +100   [NEW]     +48        upb_fielddef_descriptortype
     +7.1%      +7   [ = ]       0        upb_fielddef_defaultint32
     +2.9%     +24   +2.9%     +24    [section .dynsym]
     +1.2%     +24   [ = ]       0    [section .symtab]
     +3.2%     +16   +3.2%     +16    [section .plt]
     [NEW]     +16   [NEW]     +16        memcmp@plt
     +0.5%     +16   +0.6%     +16    tests/conformance_upb.c
     +1.5%     +16   +1.6%     +16        DoTestIo
     +0.1%     +16   +0.1%     +16    upb/json_decode.c
     +0.4%     +16   +0.4%     +16        jsondec_wellknown
     +3.0%      +8   +3.0%      +8    [section .got.plt]
     +3.0%      +8   +3.0%      +8        _GLOBAL_OFFSET_TABLE_
     +1.6%      +7   +1.6%      +7    [section .dynstr]
     +1.8%      +4   +1.8%      +4    [section .hash]
     +0.5%      +3   +0.5%      +3    [LOAD #2 [RX]]
     +2.8%      +2   +2.8%      +2    [section .gnu.version]
    -60.0% -1.74Ki   [ = ]       0    [Unmapped]
     +0.3%    +496   +1.4% +1.74Ki    TOTAL
Joshua Haberman	9abf8e043f	Clamp 32-bit varints to 5 bytes to fix a fuzz failure.	4 years ago
Joshua Haberman	358fa14d0e	Fixed headers and updated benchmark script.	4 years ago
Joshua Haberman	e5bdfba92c	Removed accidentally-added .orig file.	4 years ago
Joshua Haberman	154f2c25f4	Added UTF-8 validation for proto3 string fields.	4 years ago
Joshua Haberman	d81ba58215	Optimized short string copying.	4 years ago

    This sped up the alias=false case:

    Before:

    ------------------------------------------------------------------------------
    Benchmark                                  Time      CPU   Iterations
    ------------------------------------------------------------------------------
    BM_Parse_Upb_FileDesc_WithInitialBlock  4562 ns  4562 ns       153251   1.53276GB/s

    Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':

             1,216.65 msec task-clock        #    0.936 CPUs utilized
                    6      context-switches  #    0.005 K/sec
                    0      cpu-migrations    #    0.000 K/sec
                  200      page-faults       #    0.164 K/sec
        4,490,925,650      cycles            #    3.691 GHz
       16,516,403,731      instructions      #    3.68  insn per cycle
        2,828,536,650      branches          # 2324.861 M/sec
            5,425,830      branch-misses     #    0.19% of all branches

          1.300178903 seconds time elapsed

          1.211475000 seconds user
          0.072207000 seconds sys

    After:

    ------------------------------------------------------------------------------
    Benchmark                                  Time      CPU   Iterations
    ------------------------------------------------------------------------------
    BM_Parse_Upb_FileDesc_WithInitialBlock  3587 ns  3587 ns       195749   1.94935GB/s

    Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':

             1,109.69 msec task-clock        #    0.930 CPUs utilized
                    5      context-switches  #    0.005 K/sec
                    0      cpu-migrations    #    0.000 K/sec
                  198      page-faults       #    0.178 K/sec
        4,094,010,257      cycles            #    3.689 GHz
       15,672,677,812      instructions      #    3.83  insn per cycle
        2,589,291,160      branches          # 2333.346 M/sec
            3,306,386      branch-misses     #    0.13% of all branches

          1.193221789 seconds time elapsed

          1.102538000 seconds user
          0.072166000 seconds sys
Joshua Haberman	a7e2e8338d	Fixed benchmark script.	4 years ago
Joshua Haberman	e3f41de6c7	Split monolithic BUILD file into many build files.	4 years ago
Joshua Haberman	1c8c16b9b1	Use quoted include.	4 years ago
Joshua Haberman	71749b7caf	Implemented inline array allocation, and moved type->lg2 map to reflection.	4 years ago
Joshua Haberman	9557b97acc	Implemented inline array allocation, and moved type->lg2 map to reflection.	4 years ago
Joshua Haberman	b58d2a0ee6	Shrink overhead of message representation.	4 years ago
Joshua Haberman	0bf063a2ca	Shrink overhead of message representation.	4 years ago
Joshua Haberman	526e430794	I think this may have reached the optimization limit.	4 years ago

    -------------------------------------------------------------------------
    Benchmark                              Time       CPU   Iterations
    -------------------------------------------------------------------------
    BM_ArenaOneAlloc                      21 ns     21 ns     32994231
    BM_ArenaInitialBlockOneAlloc           6 ns      6 ns    116318005
    BM_ParseDescriptorNoHeap            3028 ns   3028 ns       231138   2.34354GB/s
    BM_ParseDescriptor                  3557 ns   3557 ns       196583   1.99498GB/s
    BM_ParseDescriptorProto2NoArena    33228 ns  33226 ns        21196   218.688MB/s
    BM_ParseDescriptorProto2WithArena  22863 ns  22861 ns        30666   317.831MB/s
    BM_SerializeDescriptorProto2        5444 ns   5444 ns       127368   1.30348GB/s
    BM_SerializeDescriptor             12509 ns  12508 ns        55816   580.914MB/s

    $ perf stat bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap
    2020-10-08 14:07:06
    Running bazel-bin/benchmark
    Run on (72 X 3700 MHz CPU s)
    CPU Caches:
      L1 Data 32K (x36)
      L1 Instruction 32K (x36)
      L2 Unified 1024K (x36)
      L3 Unified 25344K (x2)
    ----------------------------------------------------------------
    Benchmark                     Time      CPU   Iterations
    ----------------------------------------------------------------
    BM_ParseDescriptorNoHeap   3071 ns  3071 ns       227743   2.31094GB/s

    Performance counter stats for 'bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap':

             1,050.22 msec task-clock        #    0.978 CPUs utilized
                    4      context-switches  #    0.004 K/sec
                    0      cpu-migrations    #    0.000 K/sec
                  179      page-faults       #    0.170 K/sec
        3,875,796,334      cycles            #    3.690 GHz
       13,282,835,967      instructions      #    3.43  insn per cycle
        2,887,725,848      branches          # 2749.627 M/sec
            8,324,912      branch-misses     #    0.29% of all branches

          1.073924364 seconds time elapsed

          1.042806000 seconds user
          0.008021000 seconds sys

    Profile:

      23.96%  benchmark  benchmark          [.] upb_prm_1bt_max192b
      22.44%  benchmark  benchmark          [.] fastdecode_dispatch
      18.96%  benchmark  benchmark          [.] upb_pss_1bt
      14.20%  benchmark  benchmark          [.] upb_psv4_1bt
       8.33%  benchmark  benchmark          [.] upb_prm_1bt_max64b
       6.66%  benchmark  benchmark          [.] upb_prm_1bt_max128b
       1.29%  benchmark  benchmark          [.] upb_psm_1bt_max64b
       0.77%  benchmark  benchmark          [.] fastdecode_generic
       0.55%  benchmark  [kernel.kallsyms]  [k] smp_call_function_single
       0.42%  benchmark  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
       0.42%  benchmark  benchmark          [.] upb_psm_1bt_max256b
       0.31%  benchmark  benchmark          [.] upb_psb1_1bt
       0.21%  benchmark  benchmark          [.] upb_plv4_5bv
       0.14%  benchmark  benchmark          [.] upb_psb1_2bt
       0.12%  benchmark  benchmark          [.] decode_longvarint64
       0.08%  benchmark  [kernel.kallsyms]  [k] vsnprintf
       0.07%  benchmark  [kernel.kallsyms]  [k] _raw_spin_lock
       0.07%  benchmark  benchmark          [.] _upb_msg_new
       0.06%  benchmark  ld-2.31.so         [.] check_match
Joshua Haberman	7ec2c52346	Donate/steal from arena to accelerate decoding.	4 years ago
Joshua Haberman	fac992db83	Cleanup for showing.	4 years ago
Joshua Haberman	438ecaeb5a	Give all field parsers a generic table entry.	4 years ago
Joshua Haberman	a202ce9629	Add UPB_FORCEINLINE for varint32 decoding.	4 years ago

    This speeds up the decoder by >20% and also reduces code size slightly!

    name                       old time/op  new time/op  delta
    ArenaOneAlloc              20.4ns ± 0%  20.2ns ± 0%   -1.10%  (p=0.000 n=12+11)
    ArenaInitialBlockOneAlloc  5.25ns ± 0%  5.25ns ± 0%     ~     (p=0.786 n=11+12)
    ParseDescriptorNoHeap      17.1µs ± 0%  13.1µs ± 0%  -23.29%  (p=0.000 n=11+12)
    ParseDescriptor            17.4µs ± 1%  13.5µs ± 1%  -22.51%  (p=0.000 n=12+12)
    SerializeDescriptor        10.7µs ± 0%  10.9µs ± 0%   +1.95%  (p=0.000 n=12+12)

       FILE SIZE        VM SIZE
     --------------  --------------
     +2.7%     +16   +2.7%     +16    [LOAD #2 [RX]]
     +0.5%     +16   [ = ]       0    [Unmapped]
     -1.4%     -72   -0.7%     -32    upb/decode.c
     +3.1%     +98   +3.1%     +98        decode_msg
     [DEL]    -170   [DEL]    -130        decode_varint32
     -0.0%     -40   -0.0%     -16    TOTAL
Joshua Haberman	5741eb9ad7	Expanded benchmarking script and added one size opt to the encoder.	4 years ago
Joshua Haberman	9b31e8fe12	Merged common encode tag paths.	4 years ago
Joshua Haberman	08b6d2d6fd	Rewrite of the decoder (#263)	5 years ago

    New code is smaller (in both source size and compiled size) and faster.

    # Speed

    The decoder speeds up on all machines I tested, though the amount of speedup varies. I was only able to test Intel CPUs.

    ### Linux Desktop

    ```
    CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
    OS: Linux

    name             old time/op  new time/op  delta
    CreateArena      4.72ns ± 0%  4.93ns ± 0%   +4.47%  (p=0.000 n=11+11)
    ParseDescriptor  12.4µs ± 1%   9.1µs ± 1%  -26.65%  (p=0.000 n=11+11)
    ```

    ### Mac Laptop

    ```
    CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
    OS: macOS

    name             old time/op  new time/op  delta
    CreateArena      5.33ns ± 3%  5.58ns ± 2%   +4.69%  (p=0.000 n=12+12)
    ParseDescriptor  15.0µs ± 2%  11.9µs ± 2%  -20.20%  (p=0.000 n=12+12)
    ```

    ### Linux Workstation

    ```
    CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
    OS: Linux

    name             old time/op  new time/op  delta
    CreateArena      5.29ns ± 0%  5.52ns ± 0%   +4.37%  (p=0.000 n=10+12)
    ParseDescriptor  18.6µs ± 0%  16.4µs ± 0%  -11.54%  (p=0.000 n=12+12)
    ```

    # Size

    A few source files grow marginally because of some arena functionality moved inline. But `upb/decode.c` shrinks by 30% on Linux:

    ```
        VM SIZE
     --------------
      +2.1%    +283    upb/json_decode.c
       +24%    +205    upb/msg.c
      +8.4%    +115    upb/upb.c
      +0.9%     +28    upb/reflection.c
      [ = ]       0    upb/def.c
      [ = ]       0    upb/encode.c
      [ = ]       0    upb/json_encode.c
      [ = ]       0    upb/table.c
     -30.3% -1.51Ki    upb/decode.c
      -0.7%    -738    TOTAL
    ```

6 Commits (794ce6d06188c6df290b4b46b4c32e95a219eafc)