protobuf

Commit Graph

Author	SHA1	Message	Date
Protobuf Team Bot	ce32d9d68f	Fix code generation for infinity default value on float/double fields. PiperOrigin-RevId: 469843901	2 years ago
Protobuf Team Bot	8c44f04697	create and lock down upb/internal/array.h Internal array functions are now implemented in upb/internal/array.c and declared in upb/internal/array.h, which only has local visibility. PiperOrigin-RevId: 458260144	2 years ago
Protobuf Team Bot	0c78048723	clean the fences for the headers: some headers were not including port_def.inc some headers were not declaring extern "C" some headers were backing out of the above in the wrong order PiperOrigin-RevId: 457391878	2 years ago
Protobuf Team Bot	7975945e61	clean up the dependency graph some more PiperOrigin-RevId: 456890270	2 years ago
Protobuf Team Bot	1695cb2788	rename the upb_Array 'len' field as 'size' Now that 'size' has been renamed as 'capacity' we are free to rename 'len' as 'size', so upb_Array_Size() is actually returning the 'size' field. PiperOrigin-RevId: 456865972	2 years ago
Protobuf Team Bot	83f4988561	rename the upb_Array 'size' field as 'capacity' The current field/function names for upb_Array are quite confusing. We will fix them in two steps, this being the first step. PiperOrigin-RevId: 456687224	2 years ago
Protobuf Team Bot	e4635f223e	match file names to type names Lots of changes but it's all just moving things around. Backward-compatible stub #include's have been provided for now. upb_Arena/upb_Status have been split out from upb/upb.? upb_Array/upb_Map/upb_MessageValue have been split out from upb/collections.? upb_ExtensionRegistry has been split out from upb/msg.? upb/decode_internal.h is now upb/internal/decode.h upb/mini_table_accessors_internal.h is now upb/internal/mini_table_accessors.h upb/table_internal.h is now upb/internal/table.h upb/upb_internal.h is now upb/internal/upb.h PiperOrigin-RevId: 456297617	2 years ago
Joshua Haberman	1cf8214e4d	Changed upb's arena alignment from 16 to 8. upb has traditionally returned 16-byte-aligned pointers from arena allocation. This was out of an abundance of caution, since users could theoretically be using upb arenas to allocate memory that is then used for SSE/AVX values (e.g. [`__m128`](https://docs.microsoft.com/en-us/cpp/cpp/m128?view=msvc-170)), which require 16-byte alignment. In practice, the protobuf C++ arena has used 8-byte alignment for 8 years with no significant problems I know of arising from SSE etc. Reducing the alignment requirement to 8 will save memory. It will also help with compatibility on 32-bit architectures where `malloc()` only returns 8-byte aligned memory. The immediate motivation is to fix the win32 build for Python protobuf. PiperOrigin-RevId: 448331777	3 years ago
Joshua Haberman	d72d98495d	Fixed a Python test by adding a new map insert function that distinguishes between insert and update. PiperOrigin-RevId: 447214618	3 years ago
Joshua Haberman	3e0890c055	Added support for UnknownFieldSet. PiperOrigin-RevId: 443143448	3 years ago
Joshua Haberman	76c7ca9327	Updated API for accessing extensions. PiperOrigin-RevId: 442947856	3 years ago
Joshua Haberman	fa8b605f78	Implemented MiniDescriptors for proto2 enums. An enum MiniDescriptor simply encodes a set of valid `int32_t` values, so that the protobuf parser can test whether a given enum value is known or not. The format implemented here is novel and needs to be documented. In short, the format is: 1. base92 values 0-31: 5-bit mask indicating presence or absence of the next five enum values. 2. base92 values 60-91: varint indicating skip over a region of enum values. Negative enum values are encoded as their `uint32_t` equivalent. PiperOrigin-RevId: 442892799	3 years ago
Protobuf Team	e1e7435e70	Internal change PiperOrigin-RevId: 440796832	3 years ago
Joshua Haberman	9cc02bb60d	Rewrote the MessageSet parsing code in the upb decoder to properly handle several edge cases. PiperOrigin-RevId: 440788402	3 years ago
Protobuf Team	bef53686ec	Add support for clear field in upbc. Add support for setting extension field value. PiperOrigin-RevId: 439365359	3 years ago
Joshua Haberman	cb55c4d781	Addressed PR comments.	3 years ago
Joshua Haberman	8d148f023e	Clang-format and fixed missing dep.	3 years ago
Joshua Haberman	20e7802fca	Clang-format.	3 years ago
Joshua Haberman	5b711f286b	WIP.	3 years ago
Joshua Haberman	76a81e2177	WIP.	3 years ago
Joshua Haberman	7c4d12e856	Addressed PR comments.	3 years ago
Joshua Haberman	8405436044	Addressed PR comments.	3 years ago
Joshua Haberman	8ede0d552d	Tests are passing.	3 years ago
Joshua Haberman	44363393f3	Backed out a functional refactoring.	3 years ago
Joshua Haberman	532dc1f0f0	Renamed a few more constants to the new style. These are not in the public API and so were not prioritized before. No functional change here, just renames.	3 years ago
Matt Kulukundis	00a88b2426	clang-format	3 years ago
Matt Kulukundis	17f3a0d236	move to internal	3 years ago
Joshua Haberman	13434560e0	WIP.	3 years ago
Joshua Haberman	5dfbc684dd	WIP.	3 years ago
Joshua Haberman	606308c639	Added back missing underscore.	3 years ago
Joshua Haberman	75b6291e40	Renamed upb_FieldType_* -> kUpb_FieldType_*	3 years ago
Joshua Haberman	499c2cc8b1	upb_extreg, upb_msg	3 years ago
Joshua Haberman	1c955f37ce	Mass API rename and clang-reformat (#485) * Wave 1: upb_fielddef. * upb_fielddef itself. * upb_oneofdef. * upb_msgdef. * ExtensionRange. * upb_enumdef * upb_enumvaldef * upb_filedef * upb_methoddef * upb_servicedef * upb_symtab * upb_defpool_init * upb_wellknown and upb_syntax_t * Some constants. * upb_status * upb_strview * upb_arena * upb.h constants * reflection * encode * JSON decode. * json encode. * msg_internal. * Formatted with clang-format. * Some naming fixups and comment reformatting. * More refinements. * A few more stragglers. * Fixed PyObject_HEAD with semicolon. Removed TODO entries.	3 years ago
Joshua Haberman	2df18f0a3e	Addressed PR comments.	3 years ago
Joshua Haberman	39365f16a7	Addressed PR comments.	3 years ago
Joshua Haberman	a0374b3b08	Added required field checking into the encoder.	3 years ago
Joshua Haberman	7c83eb93be	Removed extra size from message.	3 years ago
Joshua Haberman	3d437bbcab	Some pre-PR fixes.	3 years ago
Joshua Haberman	c755099a89	WIP.	3 years ago
Joshua Haberman	8c916941b0	MSET -> MSGSET	3 years ago
Joshua Haberman	6f89034249	Implemented support for MessageSet.	3 years ago
Joshua Haberman	b1bbbdd4e7	Addressed PR comments.	3 years ago
Joshua Haberman	ce012b7b55	Added support for extensions.	3 years ago
Joshua Haberman	cdd6434a31	Introduced upb_extreg and plumbed it into decoder.	4 years ago
Joshua Haberman	3f8aa6ef20	Define the extension representation in messages and mini-tables. Nothing reads or writes this data yet, but we do implement the memory management that allows both unknown field data and extensions to grow within the same pseudo-arena in a message. By making both arrays grow towards each other, we avoid the need to reallocate them separately.	4 years ago
Joshua Haberman	58e158c6fa	Changed mini-table to use a custom "mode" instead of descriptor's "label."	4 years ago
Joshua Haberman	6394894b6e	Addressed PR comments.	4 years ago
Joshua Haberman	65d7b8ab0c	Optimized decoder and paved the way for parsing extensions.	4 years ago

The primary motivation for this change is to avoid referring to the `upb_msglayout` object when we are trying to fetch the `upb_msglayout` object for a sub-message. This will help pave the way for parsing extensions. We also implement several optimizations so that we can make this change without regressing performance.

Normally we compute the layout for a sub-message field like so:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *layout,
    const upb_msglayout_field *field) {
  return layout->submsgs[field->submsg_index];
}
```

The reason for this indirection is to avoid storing a pointer directly in `upb_msglayout_field`, as this would double its size (from 12 to 24 bytes on 64-bit architectures), which is wasteful as this pointer is only needed for message-typed fields.

However, `get_submsg_layout` as written above does not work for extensions, as they will not have entries in the message's `layout->submsgs` array by nature, and we want to avoid creating an entire fake `upb_msglayout` for each such extension since that would also be wasteful.

This change removes the dependency on `upb_msglayout` by passing down the `submsgs` array instead:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *const *submsgs,
    const upb_msglayout_field *field) {
  return submsgs[field->submsg_index];
}
```

This will pave the way for parsing extensions, as we can more easily create an alternative `submsgs` array for extension fields without extra overhead or waste.

Along the way several optimizations presented themselves that allow a nice increase in performance:

1. Passing the parsed `wireval` by address instead of by value ended up avoiding an expensive and useless stack copy (this is on Clang, which was used for all measurements).
2. When field numbers are densely packed, we can find a field by number with a single indexed lookup instead of linear search. At codegen time we can compute the maximum field number that will allow such an indexed lookup.
3. For fields that do require linear search, we can start the linear search at the location where we found the previous field, taking advantage of the fact that field numbers are generally increasing.
4. When the hasbit index is less than 32 (the common case) we can use a less expensive code sequence to set it.
5. We check for the hasbit case before the oneof case, as optional fields are more common than oneof fields.

Benchmark results indicate a 20% improvement in parse speed with a small code size increase:

```
name                                      old time/op  new time/op  delta
ArenaOneAlloc                             21.3ns ± 0%  21.5ns ± 0%   +0.96%  (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc                 6.32ns ± 0%  6.32ns ± 0%   +0.03%  (p=0.000 n=12+10)
LoadDescriptor_Upb                        53.5µs ± 1%  51.5µs ± 2%   -3.70%  (p=0.000 n=12+12)
LoadAdsDescriptor_Upb                     2.78ms ± 2%  2.68ms ± 0%   -3.57%  (p=0.000 n=12+12)
LoadDescriptor_Proto2                      240µs ± 0%   240µs ± 0%   +0.12%  (p=0.001 n=12+12)
LoadAdsDescriptor_Proto2                  12.8ms ± 0%  12.7ms ± 0%   -1.15%  (p=0.000 n=12+10)
Parse_Upb_FileDesc<UseArena,Copy>         13.2µs ± 2%  10.7µs ± 0%  -18.49%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<UseArena,Alias>        11.3µs ± 0%   9.6µs ± 0%  -15.11%  (p=0.000 n=12+11)
Parse_Upb_FileDesc<InitBlock,Copy>        12.7µs ± 0%  10.3µs ± 0%  -19.00%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<InitBlock,Alias>       10.9µs ± 0%   9.2µs ± 0%  -15.82%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,NoArena,Copy>       29.4µs ± 0%  29.5µs ± 0%   +0.61%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,UseArena,Copy>      20.7µs ± 2%  20.6µs ± 2%     ~     (p=0.260 n=12+11)
Parse_Proto2<FileDesc,InitBlock,Copy>     16.7µs ± 1%  16.7µs ± 0%   -0.25%  (p=0.036 n=12+10)
Parse_Proto2<FileDescSV,InitBlock,Alias>  16.5µs ± 0%  16.5µs ± 0%   +0.20%  (p=0.016 n=12+11)
SerializeDescriptor_Proto2                5.30µs ± 1%  5.36µs ± 1%   +1.09%  (p=0.000 n=12+11)
SerializeDescriptor_Upb                   12.9µs ± 0%  13.0µs ± 0%   +0.90%  (p=0.000 n=12+11)

    FILE SIZE        VM SIZE
 --------------  --------------
  +1.5%    +176  +1.6%    +176    upb/decode.c
    +1.8%    +176  +1.9%    +176    decode_msg
  +0.4%     +64  +0.4%     +64    upb/def.c
    +1.4%     +64  +1.4%     +64    _upb_symtab_addfile
  +1.2%     +48  +1.4%     +48    upb/reflection.c
     +15%     +32   +18%     +32    upb_msg_set
    +2.9%     +16  +3.1%     +16    upb_msg_mutable
  -9.3%    -288  [ = ]       0    [Unmapped]
  [ = ]       0  +0.2%    +288    TOTAL
```
Joshua Haberman	3881393907	Renamed .int.h to _internal.h, for greater clarity.	4 years ago
Joshua Haberman	1674f28dd7	Put public message interface into msg.h and moved internal functions to msg.int.h.	4 years ago

47 Commits (4e979b8d13efa461e6399e08c4ee70bee38961aa)