protobuf

Commit Graph

Author	SHA1	Message	Date
Joshua Haberman	16f763e4d6	Addressed PR comments.	3 years ago
Joshua Haberman	c755099a89	WIP.	3 years ago
Joshua Haberman	401e1747b5	Addressed PR feedback.	3 years ago
Joshua Haberman	cc03669a17	Several changes to defs. Biggest/key changes: 1. Defs are now nested per the .proto file syntax. 2. Options are parsed and vended.	3 years ago
Joshua Haberman	8c916941b0	MSET -> MSGSET	3 years ago
Joshua Haberman	6f89034249	Implemented support for MessageSet.	3 years ago
Joshua Haberman	29be74c5d2	Addressed PR comments.	3 years ago
Joshua Haberman	b1bbbdd4e7	Addressed PR comments.	3 years ago
Joshua Haberman	ce012b7b55	Added support for extensions.	3 years ago
Joshua Haberman	6e53de4a03	Addressed PR comments.	4 years ago
Joshua Haberman	cdd6434a31	Introduced upb_extreg and plumbed it into decoder.	4 years ago
Joshua Haberman	58e158c6fa	Changed mini-table to use a custom "mode" instead of descriptor's "label."	4 years ago
Joshua Haberman	fa4dfc2baa	Addressed PR comments.	4 years ago
Joshua Haberman	0fb61eaeb5	Refactored the codegen into smaller functions, in anticipation of extensions.	4 years ago
Joshua Haberman	807e7fe9e2	Fixed dense_below logic to be order-independent and consistent between def.c and codegen.	4 years ago
Joshua Haberman	65d7b8ab0c	Optimized decoder and paved the way for parsing extensions.

The primary motivation for this change is to avoid referring to the `upb_msglayout` object when we are trying to fetch the `upb_msglayout` object for a sub-message. This will help pave the way for parsing extensions. We also implement several optimizations so that we can make this change without regressing performance.

Normally we compute the layout for a sub-message field like so:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *layout,
    const upb_msglayout_field *field) {
  return layout->submsgs[field->submsg_index];
}
```

The reason for this indirection is to avoid storing a pointer directly in `upb_msglayout_field`, as this would double its size (from 12 to 24 bytes on 64-bit architectures), which is wasteful as this pointer is only needed for message-typed fields.

However, `get_submsg_layout` as written above does not work for extensions, as they will not have entries in the message's `layout->submsgs` array by nature, and we want to avoid creating an entire fake `upb_msglayout` for each such extension since that would also be wasteful.

This change removes the dependency on `upb_msglayout` by passing down the `submsgs` array instead:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *const *submsgs,
    const upb_msglayout_field *field) {
  return submsgs[field->submsg_index];
}
```

This will pave the way for parsing extensions, as we can more easily create an alternative `submsgs` array for extension fields without extra overhead or waste.

Along the way several optimizations presented themselves that allow a nice increase in performance:

1. Passing the parsed `wireval` by address instead of by value ended up avoiding an expensive and useless stack copy (this is on Clang, which was used for all measurements).
2. When field numbers are densely packed, we can find a field by number with a single indexed lookup instead of linear search. At codegen time we can compute the maximum field number that will allow such an indexed lookup.
3. For fields that do require linear search, we can start the linear search at the location where we found the previous field, taking advantage of the fact that field numbers are generally increasing.
4. When the hasbit index is less than 32 (the common case) we can use a less expensive code sequence to set it.
5. We check for the hasbit case before the oneof case, as optional fields are more common than oneof fields.

Benchmark results indicate a 20% improvement in parse speed with a small code size increase:

```
name                                      old time/op  new time/op  delta
ArenaOneAlloc                             21.3ns ± 0%  21.5ns ± 0%   +0.96%  (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc                 6.32ns ± 0%  6.32ns ± 0%   +0.03%  (p=0.000 n=12+10)
LoadDescriptor_Upb                        53.5µs ± 1%  51.5µs ± 2%   -3.70%  (p=0.000 n=12+12)
LoadAdsDescriptor_Upb                     2.78ms ± 2%  2.68ms ± 0%   -3.57%  (p=0.000 n=12+12)
LoadDescriptor_Proto2                      240µs ± 0%   240µs ± 0%   +0.12%  (p=0.001 n=12+12)
LoadAdsDescriptor_Proto2                  12.8ms ± 0%  12.7ms ± 0%   -1.15%  (p=0.000 n=12+10)
Parse_Upb_FileDesc<UseArena,Copy>         13.2µs ± 2%  10.7µs ± 0%  -18.49%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<UseArena,Alias>        11.3µs ± 0%   9.6µs ± 0%  -15.11%  (p=0.000 n=12+11)
Parse_Upb_FileDesc<InitBlock,Copy>        12.7µs ± 0%  10.3µs ± 0%  -19.00%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<InitBlock,Alias>       10.9µs ± 0%   9.2µs ± 0%  -15.82%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,NoArena,Copy>       29.4µs ± 0%  29.5µs ± 0%   +0.61%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,UseArena,Copy>      20.7µs ± 2%  20.6µs ± 2%     ~     (p=0.260 n=12+11)
Parse_Proto2<FileDesc,InitBlock,Copy>     16.7µs ± 1%  16.7µs ± 0%   -0.25%  (p=0.036 n=12+10)
Parse_Proto2<FileDescSV,InitBlock,Alias>  16.5µs ± 0%  16.5µs ± 0%   +0.20%  (p=0.016 n=12+11)
SerializeDescriptor_Proto2                5.30µs ± 1%  5.36µs ± 1%   +1.09%  (p=0.000 n=12+11)
SerializeDescriptor_Upb                   12.9µs ± 0%  13.0µs ± 0%   +0.90%  (p=0.000 n=12+11)

    FILE SIZE        VM SIZE
 --------------  --------------
  +1.5%    +176  +1.6%    +176    upb/decode.c
    +1.8%    +176  +1.9%    +176    decode_msg
  +0.4%     +64  +0.4%     +64    upb/def.c
    +1.4%     +64  +1.4%     +64    _upb_symtab_addfile
  +1.2%     +48  +1.4%     +48    upb/reflection.c
     +15%     +32   +18%     +32    upb_msg_set
    +2.9%     +16  +3.1%     +16    upb_msg_mutable
  -9.3%    -288  [ = ]       0    [Unmapped]
  [ = ]       0  +0.2%    +288    TOTAL
```	4 years ago
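Optimizations 2 and 3 in the commit message above (indexed lookup for densely packed field numbers, and a linear search that resumes where the previous field was found) can be illustrated with a minimal sketch. This is not upb's actual code: `demo_field`, `demo_layout`, `demo_find_field`, and the exact meaning given to `dense_below` here are hypothetical names and assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified records; upb's real layout structs differ. */
typedef struct {
  uint32_t number; /* protobuf field number */
} demo_field;

typedef struct {
  const demo_field *fields; /* sorted by increasing field number */
  size_t field_count;
  /* Assumption: fields[i].number == i + 1 holds for every i < dense_below. */
  uint32_t dense_below;
} demo_layout;

/* Returns the field with the given number, or NULL if there is none.
 * *last_idx carries the index of the previously found field between calls. */
static const demo_field *demo_find_field(const demo_layout *l, uint32_t number,
                                         size_t *last_idx) {
  assert(l != NULL && last_idx != NULL);

  /* Dense case: a single indexed lookup, no search. */
  if (number >= 1 && number <= l->dense_below) {
    *last_idx = number - 1;
    return &l->fields[number - 1];
  }

  /* Sparse case: linear search that starts at the previous hit and wraps
   * around, since field numbers on the wire are generally increasing. */
  size_t n = l->field_count;
  size_t start = (n > 0) ? (*last_idx % n) : 0;
  for (size_t i = 0; i < n; i++) {
    size_t idx = start + i;
    if (idx >= n) idx -= n;
    if (l->fields[idx].number == number) {
      *last_idx = idx;
      return &l->fields[idx];
    }
  }
  return NULL; /* unknown field */
}
```

With fields numbered 1, 2, 3, 10, 20 and `dense_below` set to 3, looking up field 2 takes the indexed path, while looking up 10 and then 20 each scans only a few entries because the search resumes from the previous hit.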
Joshua Haberman	3881393907	Renamed .int.h to _internal.h, for greater clarity.	4 years ago
Joshua Haberman	823eb09694	Update all 2011 dates to 2021.	4 years ago
Joshua Haberman	e59d2c8fa7	Added license headers to all files.	4 years ago
Joshua Haberman	1674f28dd7	Put public message interface into msg.h and moved internal functions to msg.int.h.	4 years ago
Joshua Haberman	7a54a5f3d6	Split the code generators for .upb and .upbdefs. Before there was a single code generator that generated both .upb and .upbdefs, even though they are generated by different rules. This worked fine as long as the codegen steps were sandboxed, but if not it led to build errors. Fixes https://github.com/protocolbuffers/upb/issues/354.	4 years ago
Joshua Haberman	65d166a6ba	Added API for copy vs. alias and added benchmarks to test both.

Benchmark output:

$ bazel-bin/benchmarks/benchmark '--benchmark_filter=BM_Parse'
2020-11-11 15:39:04
Running bazel-bin/benchmarks/benchmark
Run on (72 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x36)
  L1 Instruction 32K (x36)
  L2 Unified 1024K (x36)
  L3 Unified 25344K (x2)
-------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations
-------------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc<UseArena, Copy>            4134 ns         4134 ns       168714 1.69152GB/s
BM_Parse_Upb_FileDesc<UseArena, Alias>           3487 ns         3487 ns       199509 2.00526GB/s
BM_Parse_Upb_FileDesc<InitBlock, Copy>           3727 ns         3726 ns       187581 1.87643GB/s
BM_Parse_Upb_FileDesc<InitBlock, Alias>          3110 ns         3110 ns       224970 2.24866GB/s
BM_Parse_Proto2<FileDesc, NoArena, Copy>        31132 ns        31132 ns        22437 229.995MB/s
BM_Parse_Proto2<FileDesc, UseArena, Copy>       21011 ns        21009 ns        33922 340.812MB/s
BM_Parse_Proto2<FileDesc, InitBlock, Copy>      17976 ns        17975 ns        38808 398.337MB/s
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>   17357 ns        17356 ns        40244 412.539MB/s	4 years ago
Joshua Haberman	a01f3e23a4	Fixes for google3 build, and exclude even more tests from macOS to avoid timeout.	4 years ago
Joshua Haberman	154f2c25f4	Added UTF-8 validation for proto3 string fields.	4 years ago
Joshua Haberman	e8f9eac68c	Added #defines UPB_ENABLE_FASTTABLE and UPB_TRY_ENABLE_FASTTABLE. These control whether fasttable decoding is on.	4 years ago
Joshua Haberman	bd9f8f580d	Fixed a few bugs with the fast decoder. 1. For long tags we were putting table entries in the wrong slot. 2. For repeated strings, when the buffer flipped to no longer alias we were failing to notice and kept aliasing anyway.	4 years ago
Joshua Haberman	3eba47914b	Allocate hasbits and table slots in "hotness" order. Without a profile, we assume that fields with smaller numbers are hotter.	4 years ago
Joshua Haberman	021db6fcd5	Allow larger tags into the table if they are unique mod 31. Also fixed a bug with fixed packed in decode_fast.c.	4 years ago
Joshua Haberman	86d9908c55	Fastdecode support for packed fields. This is not very optimized yet. There is a lot of room to optimize it further.	4 years ago
Joshua Haberman	e3e797b680	Added fasttable support for oneofs.	4 years ago
Joshua Haberman	e2c709e047	Repeated string and primitive support. Much of the code was adapted from Gerben's code in: `6333031195`	4 years ago
Joshua Haberman	a345af9883	Added a codegen parameter for whether fasttables are generated or not.

Example:

$ CC=clang bazel build -c opt --copt=-g benchmarks:benchmark --//:fasttable_enabled=false
INFO: Build option --//:fasttable_enabled has changed, discarding analysis cache.
INFO: Analyzed target //benchmarks:benchmark (0 packages loaded, 913 targets configured).
INFO: Found 1 target...
Target //benchmarks:benchmark up-to-date:
  bazel-bin/benchmarks/benchmark
INFO: Elapsed time: 0.760s, Critical Path: 0.58s
INFO: 7 processes: 1 internal, 6 linux-sandbox.
INFO: Build completed successfully, 7 total actions
$ bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithArena          10985 ns        10984 ns        63567 651.857MB/s
BM_Parse_Upb_FileDesc_WithInitialBlock   10556 ns        10554 ns        66138 678.458MB/s

$ CC=clang bazel build -c opt --copt=-g benchmarks:benchmark --//:fasttable_enabled=true
INFO: Build option --//:fasttable_enabled has changed, discarding analysis cache.
INFO: Analyzed target //benchmarks:benchmark (0 packages loaded, 913 targets configured).
INFO: Found 1 target...
Target //benchmarks:benchmark up-to-date:
  bazel-bin/benchmarks/benchmark
INFO: Elapsed time: 0.744s, Critical Path: 0.58s
INFO: 7 processes: 1 internal, 6 linux-sandbox.
INFO: Build completed successfully, 7 total actions
$ bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithArena           3284 ns         3284 ns       213495 2.1293GB/s
BM_Parse_Upb_FileDesc_WithInitialBlock    2882 ns         2882 ns       243069 2.4262GB/s

Biggest unknown is whether this parameter should default to true or false.	4 years ago
gerben-s	9e68ec033f	Add repeated varints and fixed parsers	4 years ago
Joshua Haberman	ded2e657a7	Added compatibility with old generated code. Until everyone can regenerate their code, we need to provide compatible semantics with the old generated code. Also fixed a bug where enums were allocated 8 bytes instead of 4.	4 years ago
Joshua Haberman	75edd3e59c	Changed to use table pairs, seems to ever-so-slightly regress.	4 years ago
Joshua Haberman	bca7edac8c	Cleaned up table compression a bit.	4 years ago
Joshua Haberman	a6dc88556d	Tables are compressed, but perf goes down to 2.44GB/s.	4 years ago
Joshua Haberman	a4966fd230	Added a few extra sanity checks.	4 years ago
Joshua Haberman	99acbe0da8	Fixed bug where submsg array could have excess elements. Before we were allocating an array element for every sub-message field, even if two different fields had messages of the same type.	4 years ago
Gerben Stavenga	3f719fa6b2	Bugfix: offsetting hasbits by 16 introduced a bug in calculating hasmasks. Removing the extra <<16 shift in the hasmask calculation and masking out the first 16 bits. This makes messages without hasbits work as well.	4 years ago
Gerben Stavenga	4053805759	Bugfixes	4 years ago
Joshua Haberman	71749b7caf	Implemented inline array allocation, and moved type->lg2 map to reflection.	4 years ago
Joshua Haberman	9557b97acc	Implemented inline array allocation, and moved type->lg2 map to reflection.	4 years ago
Gerben Stavenga	36662b3735	Refactor some code. I extracted some common code from all message field parsers into a tail-recursive function. Replaced the varint jmp table with a simple varint parse loop, which removes the stack frames. Also took care not to lose information in the repeated message tag check. When written carefully, the checks and loads that happen can be reused for tag dispatch if the tag is not the expected one.	4 years ago
Joshua Haberman	9938cf8f27	Put submsg_index directly in table data. Drop oneof support for now to focus.	4 years ago
Joshua Haberman	526e430794	I think this may have reached the optimization limit.

-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
BM_ArenaOneAlloc                       21 ns           21 ns     32994231
BM_ArenaInitialBlockOneAlloc            6 ns            6 ns    116318005
BM_ParseDescriptorNoHeap             3028 ns         3028 ns       231138 2.34354GB/s
BM_ParseDescriptor                   3557 ns         3557 ns       196583 1.99498GB/s
BM_ParseDescriptorProto2NoArena     33228 ns        33226 ns        21196 218.688MB/s
BM_ParseDescriptorProto2WithArena   22863 ns        22861 ns        30666 317.831MB/s
BM_SerializeDescriptorProto2         5444 ns         5444 ns       127368 1.30348GB/s
BM_SerializeDescriptor              12509 ns        12508 ns        55816 580.914MB/s

$ perf stat bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap
2020-10-08 14:07:06
Running bazel-bin/benchmark
Run on (72 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x36)
  L1 Instruction 32K (x36)
  L2 Unified 1024K (x36)
  L3 Unified 25344K (x2)
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_ParseDescriptorNoHeap    3071 ns         3071 ns       227743 2.31094GB/s

 Performance counter stats for 'bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap':

          1,050.22 msec task-clock                #    0.978 CPUs utilized
                 4      context-switches          #    0.004 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               179      page-faults               #    0.170 K/sec
     3,875,796,334      cycles                    #    3.690 GHz
    13,282,835,967      instructions              #    3.43  insn per cycle
     2,887,725,848      branches                  # 2749.627 M/sec
         8,324,912      branch-misses             #    0.29% of all branches

       1.073924364 seconds time elapsed

       1.042806000 seconds user
       0.008021000 seconds sys

Profile:

  23.96%  benchmark  benchmark          [.] upb_prm_1bt_max192b
  22.44%  benchmark  benchmark          [.] fastdecode_dispatch
  18.96%  benchmark  benchmark          [.] upb_pss_1bt
  14.20%  benchmark  benchmark          [.] upb_psv4_1bt
   8.33%  benchmark  benchmark          [.] upb_prm_1bt_max64b
   6.66%  benchmark  benchmark          [.] upb_prm_1bt_max128b
   1.29%  benchmark  benchmark          [.] upb_psm_1bt_max64b
   0.77%  benchmark  benchmark          [.] fastdecode_generic
   0.55%  benchmark  [kernel.kallsyms]  [k] smp_call_function_single
   0.42%  benchmark  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   0.42%  benchmark  benchmark          [.] upb_psm_1bt_max256b
   0.31%  benchmark  benchmark          [.] upb_psb1_1bt
   0.21%  benchmark  benchmark          [.] upb_plv4_5bv
   0.14%  benchmark  benchmark          [.] upb_psb1_2bt
   0.12%  benchmark  benchmark          [.] decode_longvarint64
   0.08%  benchmark  [kernel.kallsyms]  [k] vsnprintf
   0.07%  benchmark  [kernel.kallsyms]  [k] _raw_spin_lock
   0.07%  benchmark  benchmark          [.] _upb_msg_new
   0.06%  benchmark  ld-2.31.so         [.] check_match	4 years ago
Joshua Haberman	52a0ed3891	Fixed a bug with tag number 15.	4 years ago
Joshua Haberman	9e5c5ce089	Optimized memset() with cutoff and fixed group & unknown message bugs.	4 years ago
Joshua Haberman	88b1ec7784	Table-driven supports repeated sub-messages.	4 years ago
Joshua Haberman	f173642db4	Handle non-repeated submessages.	4 years ago

22 Commits (16f763e4d69fef36be8a90d80e6401f211061a80)