protobuf

Commit Graph

Author	SHA1	Message	Date
Matt Kulukundis	00a88b2426	clang-format	3 years ago
Matt Kulukundis	17f3a0d236	move to internal	3 years ago
Joshua Haberman	606308c639	Added back missing underscore.	3 years ago
Joshua Haberman	75b6291e40	Renamed upb_FieldType_* -> kUpb_FieldType_*	3 years ago
Joshua Haberman	499c2cc8b1	upb_extreg, upb_msg	3 years ago
Joshua Haberman	1c955f37ce	Mass API rename and clang-reformat (#485)	3 years ago

* Wave 1: upb_fielddef.
* upb_fielddef itself.
* upb_oneofdef.
* upb_msgdef.
* ExtensionRange.
* upb_enumdef
* upb_enumvaldef
* upb_filedef
* upb_methoddef
* upb_servicedef
* upb_symtab
* upb_defpool_init
* upb_wellknown and upb_syntax_t
* Some constants.
* upb_status
* upb_strview
* upb_arena
* upb.h constants
* reflection
* encode
* JSON decode.
* json encode.
* msg_internal.
* Formatted with clang-format.
* Some naming fixups and comment reformatting.
* More refinements.
* A few more stragglers.
* Fixed PyObject_HEAD with semicolon. Removed TODO entries.

Joshua Haberman	2df18f0a3e	Addressed PR comments.	3 years ago
Joshua Haberman	39365f16a7	Addressed PR comments.	3 years ago
Joshua Haberman	a0374b3b08	Added required field checking into the encoder.	3 years ago
Joshua Haberman	7c83eb93be	Removed extra size from message.	3 years ago
Joshua Haberman	3d437bbcab	Some pre-PR fixes.	3 years ago
Joshua Haberman	c755099a89	WIP.	3 years ago
Joshua Haberman	8c916941b0	MSET -> MSGSET	3 years ago
Joshua Haberman	6f89034249	Implemented support for MessageSet.	3 years ago
Joshua Haberman	b1bbbdd4e7	Addressed PR comments.	3 years ago
Joshua Haberman	ce012b7b55	Added support for extensions.	3 years ago
Joshua Haberman	cdd6434a31	Introduced upb_extreg and plumbed it into decoder.	4 years ago
Joshua Haberman	3f8aa6ef20	Define the extension representation in messages and mini-tables. Nothing reads or writes this data yet, but we do implement the memory management that allows both unknown field data and extensions to grow within the same pseudo-arena in a message. By making both arrays grow towards each other, we avoid the need to reallocate them separately.	4 years ago
Joshua Haberman	58e158c6fa	Changed mini-table to use a custom "mode" instead of descriptor's "label."	4 years ago
Joshua Haberman	6394894b6e	Addressed PR comments.	4 years ago
Joshua Haberman	65d7b8ab0c	Optimized decoder and paved the way for parsing extensions.	4 years ago

The primary motivation for this change is to avoid referring to the `upb_msglayout` object when we are trying to fetch the `upb_msglayout` object for a sub-message. This will help pave the way for parsing extensions.

We also implement several optimizations so that we can make this change without regressing performance.

Normally we compute the layout for a sub-message field like so:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *layout,
    const upb_msglayout_field *field) {
  return layout->submsgs[field->submsg_index];
}
```

The reason for this indirection is to avoid storing a pointer directly in `upb_msglayout_field`, as this would double its size (from 12 to 24 bytes on 64-bit architectures) which is wasteful as this pointer is only needed for message typed fields.

However `get_submsg_layout` as written above does not work for extensions, as they will not have entries in the message's `layout->submsgs` array by nature, and we want to avoid creating an entire fake `upb_msglayout` for each such extension since that would also be wasteful.

This change removes the dependency on `upb_msglayout` by passing down the `submsgs` array instead:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *const *submsgs,
    const upb_msglayout_field *field) {
  return submsgs[field->submsg_index];
}
```

This will pave the way for parsing extensions, as we can more easily create an alternative `submsgs` array for extension fields without extra overhead or waste.

Along the way several optimizations presented themselves that allow a nice increase in performance:

1. Passing the parsed `wireval` by address instead of by value ended up avoiding an expensive and useless stack copy (this is on Clang, which was used for all measurements).
2. When field numbers are densely packed, we can find a field by number with a single indexed lookup instead of linear search. At codegen time we can compute the maximum field number that will allow such an indexed lookup.
3. For fields that do require linear search, we can start the linear search at the location where we found the previous field, taking advantage of the fact that field numbers are generally increasing.
4. When the hasbit index is less than 32 (the common case) we can use a less expensive code sequence to set it.
5. We check for the hasbit case before the oneof case, as optional fields are more common than oneof fields.

Benchmark results indicate a 20% improvement in parse speed with a small code size increase:

```
name                                      old time/op  new time/op  delta
ArenaOneAlloc                             21.3ns ± 0%  21.5ns ± 0%   +0.96%  (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc                 6.32ns ± 0%  6.32ns ± 0%   +0.03%  (p=0.000 n=12+10)
LoadDescriptor_Upb                        53.5µs ± 1%  51.5µs ± 2%   -3.70%  (p=0.000 n=12+12)
LoadAdsDescriptor_Upb                     2.78ms ± 2%  2.68ms ± 0%   -3.57%  (p=0.000 n=12+12)
LoadDescriptor_Proto2                      240µs ± 0%   240µs ± 0%   +0.12%  (p=0.001 n=12+12)
LoadAdsDescriptor_Proto2                  12.8ms ± 0%  12.7ms ± 0%   -1.15%  (p=0.000 n=12+10)
Parse_Upb_FileDesc<UseArena,Copy>         13.2µs ± 2%  10.7µs ± 0%  -18.49%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<UseArena,Alias>        11.3µs ± 0%   9.6µs ± 0%  -15.11%  (p=0.000 n=12+11)
Parse_Upb_FileDesc<InitBlock,Copy>        12.7µs ± 0%  10.3µs ± 0%  -19.00%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<InitBlock,Alias>       10.9µs ± 0%   9.2µs ± 0%  -15.82%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,NoArena,Copy>       29.4µs ± 0%  29.5µs ± 0%   +0.61%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,UseArena,Copy>      20.7µs ± 2%  20.6µs ± 2%     ~     (p=0.260 n=12+11)
Parse_Proto2<FileDesc,InitBlock,Copy>     16.7µs ± 1%  16.7µs ± 0%   -0.25%  (p=0.036 n=12+10)
Parse_Proto2<FileDescSV,InitBlock,Alias>  16.5µs ± 0%  16.5µs ± 0%   +0.20%  (p=0.016 n=12+11)
SerializeDescriptor_Proto2                5.30µs ± 1%  5.36µs ± 1%   +1.09%  (p=0.000 n=12+11)
SerializeDescriptor_Upb                   12.9µs ± 0%  13.0µs ± 0%   +0.90%  (p=0.000 n=12+11)

    FILE SIZE        VM SIZE
 --------------  --------------
  +1.5%    +176  +1.6%    +176    upb/decode.c
    +1.8%    +176  +1.9%    +176    decode_msg
  +0.4%     +64  +0.4%     +64    upb/def.c
    +1.4%     +64  +1.4%     +64    _upb_symtab_addfile
  +1.2%     +48  +1.4%     +48    upb/reflection.c
     +15%     +32   +18%     +32    upb_msg_set
    +2.9%     +16  +3.1%     +16    upb_msg_mutable
  -9.3%    -288  [ = ]       0    [Unmapped]
  [ = ]       0  +0.2%    +288    TOTAL
```

Joshua Haberman	3881393907	Renamed .int.h to _internal.h, for greater clarity.	4 years ago
Joshua Haberman	1674f28dd7	Put public message interface into msg.h and moved internal functions to msg.int.h.	4 years ago
Joshua Haberman	c358829c76	Now that handlers are gone, cleaned up table to use arenas exclusively. Also cleaned up some cruft from table.	4 years ago
Joshua Haberman	a04627abc8	Added map sorting to binary and text encoders.	4 years ago

For the binary encoder, sorting is off by default. For the text encoder, sorting is on by default. Both defaults can be explicitly overridden.

This grows code size a bit. I think we could potentially shave this (and other map-related code size) by having the generated code inject a function pointer to the map-related parsing/serialization code if maps are present.

        FILE SIZE        VM SIZE
     --------------  --------------
       +86% +1.07Ki   +71%    +768    upb/msg.c
        [NEW]    +391  [NEW]    +344    _upb_mapsorter_pushmap
        [NEW]    +158  [NEW]    +112    _upb_mapsorter_cmpstr
        [NEW]    +111  [NEW]     +64    _upb_mapsorter_cmpbool
        [NEW]    +110  [NEW]     +64    _upb_mapsorter_cmpi32
        [NEW]    +110  [NEW]     +64    _upb_mapsorter_cmpi64
        [NEW]    +110  [NEW]     +64    _upb_mapsorter_cmpu32
        [NEW]    +110  [NEW]     +64    _upb_mapsorter_cmpu64
        -3.6%      -8  -4.3%      -8    _upb_map_new
      +9.5%    +464  +9.2%    +424    upb/text_encode.c
        [NEW]    +656  [NEW]    +616    txtenc_mapentry
         +15%     +32   +20%     +32    upb_text_encode
       -20.1%    -224 -20.7%    -224    txtenc_msg
      +5.7%    +342  +5.3%    +296    upb/encode.c
        [NEW]    +344  [NEW]    +304    encode_mapentry
        [NEW]    +246  [NEW]    +208    upb_encode_ex
        [NEW]     +41  [NEW]     +16    upb_encode_ex.ch
        +0.7%      +8  +0.7%      +8    encode_scalar
        -1.0%     -32  -1.0%     -32    encode_message
        [DEL]     -38  [DEL]     -16    upb_encode.ch
        [DEL]    -227  [DEL]    -192    upb_encode
      +2.0%    +152  +2.2%    +152    upb/decode.c
         +44%    +128   +44%    +128    [section .rodata]
        +3.4%     +24  +3.4%     +24    _GLOBAL_OFFSET_TABLE_
      +0.6%    +107  +0.3%     +48    upb/def.c
        [NEW]    +100  [NEW]     +48    upb_fielddef_descriptortype
        +7.1%      +7  [ = ]       0    upb_fielddef_defaultint32
      +2.9%     +24  +2.9%     +24    [section .dynsym]
      +1.2%     +24  [ = ]       0    [section .symtab]
      +3.2%     +16  +3.2%     +16    [section .plt]
        [NEW]     +16  [NEW]     +16    memcmp@plt
      +0.5%     +16  +0.6%     +16    tests/conformance_upb.c
        +1.5%     +16  +1.6%     +16    DoTestIo
      +0.1%     +16  +0.1%     +16    upb/json_decode.c
        +0.4%     +16  +0.4%     +16    jsondec_wellknown
      +3.0%      +8  +3.0%      +8    [section .got.plt]
        +3.0%      +8  +3.0%      +8    _GLOBAL_OFFSET_TABLE_
      +1.6%      +7  +1.6%      +7    [section .dynstr]
      +1.8%      +4  +1.8%      +4    [section .hash]
      +0.5%      +3  +0.5%      +3    [LOAD #2 [RX]]
      +2.8%      +2  +2.8%      +2    [section .gnu.version]
     -60.0% -1.74Ki  [ = ]       0    [Unmapped]
      +0.3%    +496  +1.4% +1.74Ki    TOTAL

Joshua Haberman	5b1f0d86a1	For Kokoro, only build/test -m32 on Linux. Also fixed a bunch of bugs found by gcc's -fanalyzer.	4 years ago
Joshua Haberman	0497f8deed	Fixed a critical bug on 32-bit builds, and added much more Kokoro testing. There was a bug in our arena code where we assumed that sizeof(upb_array) would be a multiple of 8. On i386 it was not, and this was causing memory corruption on 32-bit builds.	4 years ago
Joshua Haberman	b928696942	A few more fixes, and test fastdecode under Kokoro.	4 years ago
Joshua Haberman	55f3569cd2	A few minor fixes and more assertions.	4 years ago
Joshua Haberman	e3e797b680	Added fasttable support for oneofs.	4 years ago
Joshua Haberman	3238821315	Gave fast table entry a nicer name.	4 years ago
Joshua Haberman	c10b24ffb2	Simplified switch().	4 years ago
Joshua Haberman	ded2e657a7	Added compatibility with old generated code. Until everyone can regenerate their code, we need to provide compatible semantics with the old generated code. Also fixed a bug where enums were allocated 8 bytes instead of 4.	4 years ago
Joshua Haberman	75edd3e59c	Changed to use table pairs, seems to ever-so-slightly regress.	4 years ago
Joshua Haberman	bca7edac8c	Cleaned up table compression a bit.	4 years ago
Joshua Haberman	8ed6b2fe85	Stored mask in the table pointer.	4 years ago
Joshua Haberman	a6dc88556d	Tables are compressed, but perf goes down to 2.44GB/s.	4 years ago
Joshua Haberman	71749b7caf	Implemented inline array allocation, and moved type->lg2 map to reflection.	4 years ago
Joshua Haberman	9557b97acc	Implemented inline array allocation, and moved type->lg2 map to reflection.	4 years ago
Joshua Haberman	b58d2a0ee6	Shrink overhead of message representation.	4 years ago
Joshua Haberman	0bf063a2ca	Shrink overhead of message representation.	4 years ago
Joshua Haberman	7ec2c52346	Donate/steal from arena to accelerate decoding.	4 years ago
Joshua Haberman	438ecaeb5a	Give all field parsers a generic table entry.	4 years ago
Joshua Haberman	efefbffc80	Fixed binary encoding and decoding for big-endian machines.	4 years ago
Joshua Haberman	363e39c171	Fix for extra compiler warnings. (#290)	5 years ago
Joshua Haberman	b717575cef	Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors. (#289)	5 years ago

* Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors.
* Disable -Wshorten-32-to-64 since Kokoro is missing Clang.
* Fixed -Wextra warnings for gcc.
* Reordered UPB_UNUSED() to come after declarations.
* Added another -pedantic fix and log CC version.
* Fix compile error and conditionally run use_bazel.sh.
* Moved set -e after use_bazel.sh.
* Fixed typo in conditional.

Joshua Haberman	634d37515c	Bugfix for oneofs and added line/col info to JSON.	5 years ago
Joshua Haberman	543a0ce8f2	Fixes for PHP. (#286)	5 years ago

- A new PHP-specific upb amalgamation. It contains everything related to upb_msg, but leaves out all of the old handlers-related interfaces and encoders/decoders.

# Schema/Defs Changes
- Changed `upb_fielddef_msgsubdef()` and `upb_fielddef_enumsubdef()` to return `NULL` instead of assert-failing if the field is not a message or enum.
- Added `upb_msgdef_iswrapper()`, to test whether this is a wrapper well-known type.

# Decoder
- Decoder bugfix: when we parse a submessage inside a oneof, we need to clear out any previous data, so we don't misinterpret it as a pointer to an existing submessage.

# JSON Decoder
- Allowed well-known types at the top level to have their special processing.
- Fixed a bug that could occur when parsing nested empty lists/objects, eg `[[]]`.
- Made the "ignore unknown" option also be permissive about unknown enumerators by setting them to 0.

# JSON Encoder
- Allowed well-known types at the top level to have their special processing.
- Removed all spaces after `:` and `,` characters, to match the old encoder and pass goldenfile tests.

# Message / Reflection
- Changed `upb_msg_hasoneof()` -> `upb_msg_whichoneof()`. The new function returns the `upb_fielddef*` of whichever oneof is set.
- Implemented `upb_msg_clearfield()` and added/implemented `upb_msg_clear()`.
- Added `upb_msg_discardunknown()`. Part of me thinks this should go in a util library instead of core reflection since it is a recursive algorithm.

# Compiler
- Always emit descriptors as an array instead of as a string, to avoid exceeding maximum string lengths. If this becomes a speed issue later we can go back to two separate paths.

Joshua Haberman	16facab490	Created an amalgamation without handlers, and fixed some bugs. (#283)	5 years ago

* Created amalgamation with upb_msg but no handlers.
* Bugfix for upb_array_resize().
* Renamed "lite" amalgamation to "core", to avoid confusion. Traditionally "lite" has meant "without reflection", but here we mean it as "without handlers-based code."
* Build fixes from CI tests.
* Removed some more C++-style comments.
* Fix for out-of-order statements.

Joshua Haberman	38a1045975	Added a has_foo() generated method for proto3 submessage fields. (#266) This is better than checking against NULL, because in the future unset fields will (probably) return a default instance instead of NULL.	5 years ago
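
Commit 65d7b8ab0c above describes two field-lookup optimizations in prose: a single indexed lookup when field numbers are densely packed, and a linear search that resumes where the previous field was found. The following is a minimal C sketch of that idea under stated assumptions. The names `sketch_field`, `sketch_layout`, `dense_below`, and `sketch_find_field` are hypothetical and are not upb's actual structures or API; the sketch also assumes fields are stored sorted by number and that the dense prefix starts at field number 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT upb's real types. */
typedef struct {
  uint32_t number; /* protobuf field number */
} sketch_field;

typedef struct {
  const sketch_field *fields; /* assumed sorted by increasing number */
  size_t field_count;
  /* Assumed to be computed at codegen time: for every i < dense_below,
   * fields[i].number == i + 1. Must not exceed field_count. */
  uint32_t dense_below;
} sketch_layout;

/* Looks up a field by number. *last holds the index where the previous
 * lookup succeeded and is updated on every successful lookup. */
const sketch_field *sketch_find_field(const sketch_layout *l, uint32_t number,
                                      size_t *last) {
  assert(l->dense_below <= l->field_count);

  /* Dense prefix: one indexed lookup, no search. */
  if (number >= 1 && number <= l->dense_below) {
    size_t idx = (size_t)number - 1;
    *last = idx;
    return &l->fields[idx];
  }

  /* Sparse fields: linear search starting at the previous hit and
   * wrapping around, since field numbers usually arrive in order. */
  size_t n = l->field_count;
  size_t start = *last < n ? *last : 0;
  for (size_t k = 0; k < n; k++) {
    size_t i = start + k;
    if (i >= n) i -= n;
    if (l->fields[i].number == number) {
      *last = i;
      return &l->fields[i];
    }
  }
  return NULL; /* not a known field */
}
```

With fields numbered 1, 2, 3, 10, 20 and `dense_below` set to 3, looking up field 2 takes the indexed path, while looking up 10 and then 20 takes the linear path and finds each one on the first or second probe because the search resumes at the previous hit.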

22 Commits (85072ce04ef8929d2dd88abc017c4ca3f3ba3647)
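
Commit 3f8aa6ef20 above mentions letting unknown-field data and extensions grow towards each other inside one region so they never need separate reallocation. A rough C sketch of that layout follows, purely as an illustration. The type `two_ended_buf` and the `teb_*` functions are hypothetical names, and the sketch assumes `realloc` for growth, whereas the commit message says upb manages this memory within the message's arena.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch type; NOT upb's actual message internals. */
typedef struct {
  char *buf;
  size_t size;       /* total bytes in buf */
  size_t front_end;  /* front data (e.g. unknown fields): [0, front_end) */
  size_t back_begin; /* back data (e.g. extensions): [back_begin, size) */
} two_ended_buf;

/* Ensures at least `need` free bytes in the gap between the two regions.
 * Returns 1 on success, 0 on allocation failure. */
int teb_reserve(two_ended_buf *b, size_t need) {
  if (b->back_begin - b->front_end >= need) return 1;
  size_t back_len = b->size - b->back_begin;
  size_t used = b->front_end + back_len;
  size_t new_size = b->size ? b->size : 16;
  while (new_size - used < need) new_size *= 2;
  char *nb = realloc(b->buf, new_size);
  if (!nb) return 0;
  /* Slide the back region to the end of the enlarged buffer. */
  memmove(nb + new_size - back_len, nb + b->back_begin, back_len);
  b->buf = nb;
  b->size = new_size;
  b->back_begin = new_size - back_len;
  return 1;
}

/* Appends to the front region, which grows upward. */
int teb_push_front(two_ended_buf *b, const void *data, size_t len) {
  if (!teb_reserve(b, len)) return 0;
  memcpy(b->buf + b->front_end, data, len);
  b->front_end += len;
  return 1;
}

/* Prepends to the back region, which grows downward. */
int teb_push_back(two_ended_buf *b, const void *data, size_t len) {
  if (!teb_reserve(b, len)) return 0;
  b->back_begin -= len;
  memcpy(b->buf + b->back_begin, data, len);
  return 1;
}
```

Both regions share a single allocation and a single free gap, so one growth step serves whichever side needs room next; the cost is a `memmove` of the back region whenever the buffer is enlarged.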