protobuf

Commit Graph

Author	SHA1	Message	Date
Joshua Haberman	2484d12c1c	Addressed PR comments.	3 years ago
Joshua Haberman	8c916941b0	MSET -> MSGSET	3 years ago
Joshua Haberman	6f89034249	Implemented support for MessageSet.	3 years ago
Joshua Haberman	b1bbbdd4e7	Addressed PR comments.	3 years ago
Joshua Haberman	ce012b7b55	Added support for extensions.	3 years ago
Joshua Haberman	3366d02f04	Addressed PR comments.	3 years ago
Joshua Haberman	5c28ab6b2c	Implemented upb_enumvaldef, for storing information about enumvals.	3 years ago
Joshua Haberman	53fba823de	Added missing upb_symtab_lookupext() function.	3 years ago
Joshua Haberman	cdd6434a31	Introduced upb_extreg and plumbed it into decoder.	4 years ago
Joshua Haberman	58e158c6fa	Changed mini-table to use a custom "mode" instead of descriptor's "label."	4 years ago
Joshua Haberman	807e7fe9e2	Fixed dense_below logic to be order-independent and consistent between def.c and codegen.	4 years ago
Joshua Haberman	2e8a122fc0	Changed dense_below calculation to use UINT8_MAX as the constant.	4 years ago
Joshua Haberman	6394894b6e	Addressed PR comments.	4 years ago
Joshua Haberman	65d7b8ab0c	Optimized decoder and paved the way for parsing extensions. The primary motivation for this change is to avoid referring to the `upb_msglayout` object when we are trying to fetch the `upb_msglayout` object for a sub-message. This will help pave the way for parsing extensions. We also implement several optimizations so that we can make this change without regressing performance. Normally we compute the layout for a sub-message field like so: ``` const upb_msglayout *get_submsg_layout( const upb_msglayout *layout, const upb_msglayout_field *field) { return layout->submsgs[field->submsg_index]; } ``` The reason for this indirection is to avoid storing a pointer directly in `upb_msglayout_field`, as this would double its size (from 12 to 24 bytes on 64-bit architectures) which is wasteful as this pointer is only needed for message typed fields. However `get_submsg_layout` as written above does not work for extensions, as they will not have entries in the message's `layout->submsgs` array by nature, and we want to avoid creating an entire fake `upb_msglayout` for each such extension since that would also be wasteful. This change removes the dependency on `upb_msglayout` by passing down the `submsgs` array instead: ``` const upb_msglayout *get_submsg_layout( const upb_msglayout *const *submsgs, const upb_msglayout_field *field) { return submsgs[field->submsg_index]; } ``` This will pave the way for parsing extensions, as we can more easily create an alternative `submsgs` array for extension fields without extra overhead or waste. Along the way several optimizations presented themselves that allow a nice increase in performance: 1. Passing the parsed `wireval` by address instead of by value ended up avoiding an expensive and useless stack copy (this is on Clang, which was used for all measurements). 2. When field numbers are densely packed, we can find a field by number with a single indexed lookup instead of linear search. 
At codegen time we can compute the maximum field number that will allow such an indexed lookup. 3. For fields that do require linear search, we can start the linear search at the location where we found the previous field, taking advantage of the fact that field numbers are generally increasing. 4. When the hasbit index is less than 32 (the common case) we can use a less expensive code sequence to set it. 5. We check for the hasbit case before the oneof case, as optional fields are more common than oneof fields. Benchmark results indicate a 20% improvement in parse speed with a small code size increase: ``` name old time/op new time/op delta ArenaOneAlloc 21.3ns ± 0% 21.5ns ± 0% +0.96% (p=0.000 n=12+12) ArenaInitialBlockOneAlloc 6.32ns ± 0% 6.32ns ± 0% +0.03% (p=0.000 n=12+10) LoadDescriptor_Upb 53.5µs ± 1% 51.5µs ± 2% -3.70% (p=0.000 n=12+12) LoadAdsDescriptor_Upb 2.78ms ± 2% 2.68ms ± 0% -3.57% (p=0.000 n=12+12) LoadDescriptor_Proto2 240µs ± 0% 240µs ± 0% +0.12% (p=0.001 n=12+12) LoadAdsDescriptor_Proto2 12.8ms ± 0% 12.7ms ± 0% -1.15% (p=0.000 n=12+10) Parse_Upb_FileDesc<UseArena,Copy> 13.2µs ± 2% 10.7µs ± 0% -18.49% (p=0.000 n=10+12) Parse_Upb_FileDesc<UseArena,Alias> 11.3µs ± 0% 9.6µs ± 0% -15.11% (p=0.000 n=12+11) Parse_Upb_FileDesc<InitBlock,Copy> 12.7µs ± 0% 10.3µs ± 0% -19.00% (p=0.000 n=10+12) Parse_Upb_FileDesc<InitBlock,Alias> 10.9µs ± 0% 9.2µs ± 0% -15.82% (p=0.000 n=12+12) Parse_Proto2<FileDesc,NoArena,Copy> 29.4µs ± 0% 29.5µs ± 0% +0.61% (p=0.000 n=12+12) Parse_Proto2<FileDesc,UseArena,Copy> 20.7µs ± 2% 20.6µs ± 2% ~ (p=0.260 n=12+11) Parse_Proto2<FileDesc,InitBlock,Copy> 16.7µs ± 1% 16.7µs ± 0% -0.25% (p=0.036 n=12+10) Parse_Proto2<FileDescSV,InitBlock,Alias> 16.5µs ± 0% 16.5µs ± 0% +0.20% (p=0.016 n=12+11) SerializeDescriptor_Proto2 5.30µs ± 1% 5.36µs ± 1% +1.09% (p=0.000 n=12+11) SerializeDescriptor_Upb 12.9µs ± 0% 13.0µs ± 0% +0.90% (p=0.000 n=12+11) FILE SIZE VM SIZE -------------- -------------- +1.5% +176 +1.6% +176 upb/decode.c +1.8% +176 +1.9% 
+176 decode_msg +0.4% +64 +0.4% +64 upb/def.c +1.4% +64 +1.4% +64 _upb_symtab_addfile +1.2% +48 +1.4% +48 upb/reflection.c +15% +32 +18% +32 upb_msg_set +2.9% +16 +3.1% +16 upb_msg_mutable -9.3% -288 [ = ] 0 [Unmapped] [ = ] 0 +0.2% +288 TOTAL ```	4 years ago
Joshua Haberman	9482957425	Enforce that filenames are unique when loaded into symtab. This brings upb into line with C++. PHP already checks this internally, so this should not be an issue there. Ruby on the other hand does not currently check this, so this change will cause our Ruby implementation to reject some programs that would otherwise have been accepted.	4 years ago
Joshua Haberman	823eb09694	Update all 2011 dates to 2021.	4 years ago
Joshua Haberman	e59d2c8fa7	Added license headers to all files.	4 years ago
Joshua Haberman	83c0edbd2a	A few minor cleanups.	4 years ago
Joshua Haberman	c358829c76	Now that handlers are gone, cleaned up table to use arenas exclusively. Also cleaned up some cruft from table.	4 years ago
Joshua Haberman	ec9ba3f893	Fixed error message buffer overflow.	4 years ago
Joshua Haberman	f41c0ec261	Added an internal API to get arena from symtab, for Ruby's use.	4 years ago
Joshua Haberman	f5d2d55007	Deleted the legacy "Handlers" APIs. upb can finally be deserving of its name. This is possible now that all users have been migrated to the new upb_msg APIs.	4 years ago
Joshua Haberman	c7787cbaa1	Fixed a bunch of Clang warnings. Unfortunately a few of the Clang warnings did not have easy fixes: ../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘fastdecode_err’: ../../../../ext/google/protobuf_c/ruby-upb.c:353:13: warning: function might be candidate for attribute ‘noreturn’ [-Wsuggest-attribute=noreturn] 353 | const char *fastdecode_err(upb_decstate *d) { | ^~~~~~~~~~~~~~ ../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘_upb_decode’: ../../../../ext/google/protobuf_c/ruby-upb.c:867:30: warning: argument ‘buf’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered] 867 | bool _upb_decode(const char *buf, size_t size, void *msg, I even tried to suppress the first error, but it still shows up.	4 years ago
Joshua Haberman	5e550e88f8	Added API for getting fielddef default as a upb_msgval.	4 years ago
Joshua Haberman	ee49a8d7df	Added an accessor to get the symtab from a filedef. This matches an API already present in proto2 (const DescriptorPool* FileDescriptor::pool()). However there is a slightly subtle implication here. In proto2, the relationship between Descriptor and MessageFactory is 1:many. You can create as many DynamicMessageFactory instances as you want, and each one will have its own independent DynamicMessage prototype and computed layout for the same underlying Descriptor. In practice the layouts will all be the same, but one thing that could be distinct is that each can have its own extension pool, which is a DescriptorPool that will be searched for extensions when parsing. In contrast, upb does not have a separate "message factory" abstraction. That means that each upb_msgdef has a single distinct layout, in other words a 1:1 correspondence between descriptor and layout. This means that there is no way to create multiple message types for the same descriptor that have distinct extension pools. If you want a different set of extensions, you must create a separate upb_symtab with a distinct set of descriptors. This change further entrenches that upb_filedef:upb_symtab is a 1:1 relationship. A single upb_filedef cannot be a member of multiple symbol tables. In practice this was already true (there is no way to add a single filedef to multiple symbol tables) but this change codifies this 1:1 relationship.	4 years ago
Joshua Haberman	bc200451ce	Use a macro instead of an inline function for setjmp/longjmp.	4 years ago
Joshua Haberman	fbc0639b07	Use _setjmp on mac to avoid saving/restoring the signal mask.	4 years ago
Joshua Haberman	65d166a6ba	Added API for copy vs. alias and added benchmarks to test both. Benchmark output: $ bazel-bin/benchmarks/benchmark '--benchmark_filter=BM_Parse' 2020-11-11 15:39:04 Running bazel-bin/benchmarks/benchmark Run on (72 X 3700 MHz CPU s) CPU Caches: L1 Data 32K (x36) L1 Instruction 32K (x36) L2 Unified 1024K (x36) L3 Unified 25344K (x2) ------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------- BM_Parse_Upb_FileDesc<UseArena, Copy> 4134 ns 4134 ns 168714 1.69152GB/s BM_Parse_Upb_FileDesc<UseArena, Alias> 3487 ns 3487 ns 199509 2.00526GB/s BM_Parse_Upb_FileDesc<InitBlock, Copy> 3727 ns 3726 ns 187581 1.87643GB/s BM_Parse_Upb_FileDesc<InitBlock, Alias> 3110 ns 3110 ns 224970 2.24866GB/s BM_Parse_Proto2<FileDesc, NoArena, Copy> 31132 ns 31132 ns 22437 229.995MB/s BM_Parse_Proto2<FileDesc, UseArena, Copy> 21011 ns 21009 ns 33922 340.812MB/s BM_Parse_Proto2<FileDesc, InitBlock, Copy> 17976 ns 17975 ns 38808 398.337MB/s BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 17357 ns 17356 ns 40244 412.539MB/s	4 years ago
Joshua Haberman	5b1f0d86a1	For Kokoro, only build/test -m32 on Linux. Also fixed a bunch of bugs found by gcc's -fanalyzer.	4 years ago
Joshua Haberman	c9f9668234	symtab: use longjmp() for errors and avoid intermediate table. We used to use a separate "add table" during the upb_symtab_addfile() operation to make it easier to back out the file if it contained errors. But this created unnecessary work of re-adding the same symbols to the main symtab once everything was validated. Instead we directly add symbols to the main symbols table. If there is an error in validation, we remove precisely the set of symbols that were already added. This also requires using a separate arena for each file. We can fuse it with the symtab's main arena if the operation is successful. LoadDescriptor_Upb 61.2µs ± 4% 53.5µs ± 1% -12.50% (p=0.000 n=12+12) LoadAdsDescriptor_Upb 4.43ms ± 1% 3.06ms ± 0% -31.00% (p=0.000 n=12+12) LoadDescriptor_Proto2 257µs ± 0% 259µs ± 0% +1.00% (p=0.000 n=12+12) LoadAdsDescriptor_Proto2 13.9ms ± 1% 13.9ms ± 1% ~ (p=0.128 n=12+12)	4 years ago
Joshua Haberman	c3b5637646	Added benchmark for loading ads descriptor. Generally this seems to track the speed of loading descriptor.proto. ---------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------------------------------------------------------- BM_LoadDescriptor_Upb 59091 ns 59086 ns 11747 121.182MB/s BM_LoadAdsDescriptor_Upb 4218587 ns 4218582 ns 166 120.544MB/s BM_LoadDescriptor_Proto2 241083 ns 241049 ns 2903 29.7043MB/s BM_LoadAdsDescriptor_Proto2 13442631 ns 13442099 ns 52 34.8975MB/s	4 years ago
Joshua Haberman	acd72c6d3f	WIP.	4 years ago
gerben-s	9e68ec033f	Add repeated varints and fixed parsers	4 years ago
Joshua Haberman	2a574d3d01	Added a bunch of comments for readability.	4 years ago
Joshua Haberman	6f59f1256e	Optimizations to descriptor loading.	4 years ago
Joshua Haberman	75edd3e59c	Changed to use table pairs, seems to ever-so-slightly regress.	4 years ago
Joshua Haberman	a6dc88556d	Tables are compressed, but perf goes down to 2.44GB/s.	4 years ago
Joshua Haberman	5aa5b77b41	Added simple offset-based accessors for defs, and deprecated old iterators.	4 years ago
Joshua Haberman	438ecaeb5a	Give all field parsers a generic table entry.	4 years ago
Joshua Haberman	8284321780	Fixed upb_fielddef_packed() to have the correct default.	4 years ago
Joshua Haberman	2c666bc8f6	Use C-style comment instead of C++.	4 years ago
Joshua Haberman	a77ea639d5	Verify UTF-8 when parsing proto3 string fields.	4 years ago
Joshua Haberman	e179dda212	Added initialization of all members to satisfy compiler warnings.	5 years ago
Joshua Haberman	81c2aa753e	Fixes for the PHP C Extension.	5 years ago
Joshua Haberman	b717575cef	Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors. (#289 ) * Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors. * Disable -Wshorten-32-to-64 since Kokoro is missing Clang. * Fixed -Wextra warnings for gcc. * Reordered UPB_UNUSED() to come after declarations. * Added another -pedantic fix and log CC version. * Fix compile error and conditionally run use_bazel.sh. * Moved set -e after use_bazel.sh. * Fixed typo in conditional.	5 years ago
Joshua Haberman	6b808a4072	Fixed all UBSan issues and added UBSan CI checks.	5 years ago
Joshua Haberman	543a0ce8f2	Fixes for PHP. (#286 ) - A new PHP-specific upb amalgamation. It contains everything related to upb_msg, but leaves out all of the old handlers-related interfaces and encoders/decoders. # Schema/Defs Changes - Changed `upb_fielddef_msgsubdef()` and `upb_fielddef_enumsubdef()` to return `NULL` instead of assert-failing if the field is not a message or enum. - Added `upb_msgdef_iswrapper()`, to test whether this is a wrapper well-known type. # Decoder - Decoder bugfix: when we parse a submessage inside a oneof, we need to clear out any previous data, so we don't misinterpret it as a pointer to an existing submessage. # JSON Decoder - Allowed well-known types at the top level to have their special processing. - Fixed a bug that could occur when parsing nested empty lists/objects, eg `[[]]`. - Made the "ignore unknown" option also be permissive about unknown enumerators by setting them to 0. # JSON Encoder - Allowed well-known types at the top level to have their special processing. - Removed all spaces after `:` and `,` characters, to match the old encoder and pass goldenfile tests. # Message / Reflection - Changed `upb_msg_hasoneof()` -> `upb_msg_whichoneof()`. The new function returns the `upb_fielddef*` of whichever oneof is set. - Implemented `upb_msg_clearfield()` and added/implemented `upb_msg_clear()`. - Added `upb_msg_discardunknown()`. Part of me thinks this should go in a util library instead of core reflection since it is a recursive algorithm. # Compiler - Always emit descriptors as an array instead of as a string, to avoid exceeding maximum string lengths. If this becomes a speed issue later we can go back to two separate paths.	5 years ago
Charlie Savage	93e2a40881	MSVC 2019 Fixes (#285 ) * resolvename is declared to return a bool value, but instead can return NULL. MSVC 2019 does not like that and throws a compile error. Fixed by returning false instead of NULL. * When compiling with MSVC 2019, the UPB_ASSUME macro expands out to: do {} if (false && (ok)) That isn't valid C code. Fixed by adding an elif for MSVC that uses __assume(0), which is similar to gcc's __builtin_unreachable according to http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0627r0.pdf.	5 years ago
Paul Yang	55f5bcd62c	Add upb_symtab_lookupfile2 (#281 ) * Add upb_symtab_lookupfile2 Similar to upb_symtab_lookupfile but doesn't assume file name ends with '\0' * Fix	5 years ago
Joshua Haberman	2b1e7dc1cc	Arena refactor: moves cleanup list into regular blocks (#277 ) * WIP. * WIP. * Tests are passing. * Recover some perf: LIKELY doesn't propagate through functions. :( * Added some more benchmarks. * Simplify & optimize upb_arena_realloc(). * Only add owned blocks to the freelist. * More optimization/simplification. * Re-fixed the bug. * Revert unintentional changes to parser.rl. * Revert Lua changes for now. * Revert the arena fuse changes for now. * Added last_size to the arena representation. * Fixed compile errors. * Fixed compile error and changed benchmarks to do one allocation.	5 years ago


227 Commits (34ee951c19ec1a578fbfbd86c9b6cf5b7359dd1a)
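
Commit 65d7b8ab0c describes two field-lookup optimizations: a single indexed lookup when field numbers are densely packed, and a linear search that resumes where the previous field was found. The following is a minimal sketch of that idea, not upb's actual code. The names `sketch_field`, `sketch_layout`, and `sketch_find_field` are hypothetical, and the meaning given to `dense_below` here (fields numbered 1 through `dense_below` sit at index `number - 1`) is an assumption inferred from the commit messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified field record; not upb's real upb_msglayout_field. */
typedef struct {
  uint32_t number;
} sketch_field;

/* Hypothetical layout. Assumption: for every n in 1..dense_below,
 * fields[n - 1].number == n, so those fields can be found by index. */
typedef struct {
  const sketch_field *fields;
  size_t field_count;
  uint8_t dense_below;
} sketch_layout;

/* Finds a field by number, or returns NULL if the message has no such field.
 * *last holds the index of the previous hit; the fallback linear search
 * starts there and wraps around, since field numbers usually arrive in
 * increasing order. */
static const sketch_field *sketch_find_field(const sketch_layout *l,
                                             uint32_t number, size_t *last) {
  if (number >= 1 && number <= l->dense_below) {
    /* Dense case: one indexed lookup, no search. */
    size_t idx = number - 1;
    assert(l->fields[idx].number == number);
    *last = idx;
    return &l->fields[idx];
  }
  /* Sparse case: linear search beginning at the previous hit. */
  for (size_t i = 0; i < l->field_count; i++) {
    size_t idx = (*last + i) % l->field_count;
    if (l->fields[idx].number == number) {
      *last = idx;
      return &l->fields[idx];
    }
  }
  return NULL;
}
```

A caller would keep one `size_t last = 0;` per message being parsed and pass its address on every lookup. When fields appear on the wire in ascending order, the sparse path then usually hits on its first or second probe.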