protobuf

Commit Graph

Author	SHA1	Message	Date
Joshua Haberman	d22d6d71ed	Refactored message accessors to share a common set of functions instead of duplicating logic. Prior to this CL, there were several different code paths for reading/writing message data. Generated code, MiniTable accessors, and reflection all performed direct manipulation of the bits and bytes in a message, but they all had distinct implementations that did not share much of any code. This divergence meant that they could easily have different behavior, bugs could creep into one but not another, and we would need three different sets of tests to get full test coverage. This also made it very difficult to change the internal representation in any way, since it would require updating many places in the code. With this CL, the three different APIs for accessing message data now all share a common set of functions. The common functions all take a `upb_MiniTableField` as the canonical description of a field's type and layout. The lowest-level functions are very branchy, as they must test for every possible variation in the field type (field vs oneof, hasbit vs no-hasbit, different field sizes, whether a nonzero default value exists, extension vs. regular field), however these functions are declared inline and designed to be very optimizable when values are known at compile time. In generated accessors, for example, we can declare constant `upb_MiniTableField` instances so that all values can constant-propagate, and we can get fully specialized code even though we are calling a generic function. On the other hand, when we use the generic functions from reflection, we get runtime branches since values are not known at compile time. But even the function is written to still be as efficient as possible even when used from reflection. For example, we use memcpy() calls with constant length so that the compiler can optimize these into inline loads/stores without having to make an out-of-line call to memcpy(). In this way, this CL should be a benefit to both correctness and performance. It will also make it easier to change the message representation, for example to optimize the encoder by giving hasbits to all fields. Note that we have not completely consolidated all access in this CL: 1. Some functions outside of get/set such as clear and hazzers are not yet unified. 2. The encoder and decoder still touch the message without going through the common functions. The encoder and decoder require a bit more specialized code to get good performance when reading/writing fields en masse. PiperOrigin-RevId: 490016095	2 years ago
Eric Salo	27d70edfe2	clean up the :mini_table build target Remove circular dependencies that were bouncing back and forth between msg_internal.h and mini_table/, including: - splitting out each mini table subtype into its own header - moving the non-reflection message code into message/ - moving the accessors from mini_table/ to message/ PiperOrigin-RevId: 489121042	2 years ago
Eric Salo	75907f7af9	rename the upb_MiniTable subtypes to follow the upb style guide: upb_MiniTable_Enum -> upb_MiniTableEnum upb_MiniTable_Extension -> upb_MiniTableExtension upb_MiniTable_Field -> upb_MiniTableField upb_MiniTable_File -> upb_MiniTableFile upb_MiniTable_Sub -> upb_MiniTableSub PiperOrigin-RevId: 486712960	2 years ago
Eric Salo	f6307877d3	move portability stuff into upb/port/ Also delete redundant system #includes that are already pulled in by port/def.inc PiperOrigin-RevId: 486398989	2 years ago
Eric Salo	c033eff26f	split apart mini_table.c into a new subdir PiperOrigin-RevId: 484352293	2 years ago
Eric Salo	5c646803ef	implement mini descriptors for message sets PiperOrigin-RevId: 484058392	2 years ago
Eric Salo	20310e2f3a	implement mini descriptors for maps PiperOrigin-RevId: 483474044	2 years ago
Eric Salo	36ce2fa7d1	add version/tag chars to the start of all mini descriptors Verified during decoding. The specific values are just placeholders for now. PiperOrigin-RevId: 481009599	2 years ago
Eric Salo	40998462d6	add upb_MtDataEncoder_EncodeExtension() We need to sharpen the distinction between messages and extensions in the mini descriptor encoder, so split the code paths for each. PiperOrigin-RevId: 480675339	2 years ago
Joshua Haberman	b33fd88ed3	Added function for getting the type of a MiniTable field Prior to this CL, users were relying on `field->descriptortype` to get the field type. This almost works, as `field->descriptortype` is almost, but not quite, the field type of the field. In two special cases we deviate from the true field type, for ease of parsing and serialization: - For open enums, we use `kUpb_FieldType_Int32` instead of `kUpb_FieldType_Enum`, because from the perspective of the wire format, an open enum field is equivalent to int32. - For proto2 strings, we use `kUpb_FieldType_Bytes` instead of `kUpb_FieldType_String`, because proto2 strings do not perform UTF-8 validation, which makes them equivalent to bytes. In this CL we add a public API function: ``` // Returns the true field type for this field. upb_FieldType upb_MiniTableField_Type(const upb_MiniTable_Field* f); ``` This will provide the actual field type for this field. Note that this CL changes the MiniDescriptor format. Previously MiniDescriptors did not contain enough information to distinguish between Enum/Int32. To remedy this we added a new encoded field type, `kUpb_EncodedType_ClosedEnum`. PiperOrigin-RevId: 479387672	2 years ago
Joshua Haberman	d5bd55cde1	Treat unlinked sub-messages in the MiniTable as unknown This is an observable behavior change in the decoder. After submitting this CL, clients of the decoder can assume that any unlinked sub-messages will be treated as unknown, rather than crashing. Unlinked sub-messages must never have values present in the message. We can verify this with asserts. Since the values are never set, the encoder should never encounter data for any unlinked sub-message. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.3ns ± 9% 17.9ns ± 2% ~ (p=0.690 n=5+5) BM_ArenaInitialBlockOneAlloc 6.40ns ± 1% 6.68ns ±10% ~ (p=0.730 n=4+5) BM_LoadAdsDescriptor_Upb<NoLayout> 5.09ms ± 2% 5.03ms ± 3% ~ (p=0.222 n=5+5) BM_LoadAdsDescriptor_Upb<WithLayout> 5.45ms ± 3% 5.43ms ± 1% ~ (p=0.905 n=5+4) BM_LoadAdsDescriptor_Proto2<NoLayout> 10.9ms ± 1% 10.8ms ± 1% -1.09% (p=0.016 n=5+4) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.3ms ± 9% 11.1ms ± 3% ~ (p=0.841 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Copy> 11.2µs ± 3% 11.3µs ± 3% ~ (p=0.222 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Alias> 10.3µs ± 5% 10.5µs ± 5% ~ (p=0.310 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Copy> 11.4µs ±18% 11.0µs ± 2% ~ (p=1.000 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.5µs ±17% 10.6µs ±19% ~ (p=0.421 n=5+5) BM_Parse_Proto2<FileDesc, NoArena, Copy> 20.5µs ± 2% 20.2µs ± 2% ~ (p=0.222 n=5+5) BM_Parse_Proto2<FileDesc, UseArena, Copy> 10.8µs ± 2% 10.9µs ± 4% ~ (p=0.841 n=5+5) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 10.5µs ± 3% 10.6µs ± 3% ~ (p=0.690 n=5+5) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 9.22µs ± 2% 9.23µs ± 3% ~ (p=1.000 n=5+5) BM_SerializeDescriptor_Proto2 6.05µs ± 3% 5.90µs ± 3% ~ (p=0.222 n=5+5) BM_SerializeDescriptor_Upb 10.2µs ± 3% 10.6µs ±14% ~ (p=0.841 n=5+5) name old time/op new time/op delta BM_ArenaOneAlloc 18.3ns ± 9% 17.9ns ± 2% ~ (p=0.841 n=5+5) BM_ArenaInitialBlockOneAlloc 6.42ns ± 1% 6.69ns ±10% ~ (p=0.730 n=4+5) BM_LoadAdsDescriptor_Upb<NoLayout> 5.10ms ± 2% 5.05ms ± 3% ~ (p=0.222 n=5+5) BM_LoadAdsDescriptor_Upb<WithLayout> 5.47ms ± 3% 5.45ms ± 1% ~ (p=0.905 n=5+4) BM_LoadAdsDescriptor_Proto2<NoLayout> 10.9ms ± 1% 10.8ms ± 1% -1.11% (p=0.016 n=5+4) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.4ms ± 9% 11.1ms ± 3% ~ (p=0.841 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Copy> 11.2µs ± 3% 11.3µs ± 3% ~ (p=0.222 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Alias> 10.3µs ± 5% 10.5µs ± 5% ~ (p=0.151 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Copy> 11.5µs ±18% 11.0µs ± 2% ~ (p=1.000 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.5µs ±17% 10.7µs ±19% ~ (p=0.421 n=5+5) BM_Parse_Proto2<FileDesc, NoArena, Copy> 20.6µs ± 2% 20.3µs ± 2% ~ (p=0.222 n=5+5) BM_Parse_Proto2<FileDesc, UseArena, Copy> 10.9µs ± 2% 10.9µs ± 4% ~ (p=0.841 n=5+5) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 10.6µs ± 3% 10.6µs ± 3% ~ (p=0.690 n=5+5) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 9.24µs ± 2% 9.25µs ± 3% ~ (p=1.000 n=5+5) BM_SerializeDescriptor_Proto2 6.07µs ± 3% 5.91µs ± 3% ~ (p=0.222 n=5+5) BM_SerializeDescriptor_Upb 10.3µs ± 3% 10.6µs ±14% ~ (p=0.841 n=5+5) name old INSTRUCTIONS/op new INSTRUCTIONS/op delta BM_ArenaOneAlloc 201 ± 0% 201 ± 0% ~ (p=0.841 n=5+5) BM_ArenaInitialBlockOneAlloc 69.0 ± 0% 69.0 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 33.9M ± 0% 34.1M ± 0% +0.66% (p=0.008 n=5+5) BM_LoadAdsDescriptor_Upb<WithLayout> 35.6M ± 0% 35.8M ± 0% +0.64% (p=0.008 n=5+5) BM_LoadAdsDescriptor_Proto2<NoLayout> 70.8M ± 0% 70.8M ± 0% ~ (p=0.548 n=5+5) BM_LoadAdsDescriptor_Proto2<WithLayout> 71.6M ± 0% 71.6M ± 0% ~ (p=0.151 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Copy> 137k ± 0% 141k ± 0% +2.87% (p=0.008 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Alias> 125k ± 0% 128k ± 0% +2.83% (p=0.008 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Copy> 135k ± 0% 139k ± 0% +2.89% (p=0.008 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Alias> 124k ± 0% 127k ± 0% +2.85% (p=0.016 n=5+4) BM_Parse_Proto2<FileDesc, NoArena, Copy> 201k ± 0% 201k ± 0% ~ (p=0.222 n=5+5) BM_Parse_Proto2<FileDesc, UseArena, Copy> 107k ± 0% 107k ± 0% ~ (p=1.000 n=5+5) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 105k ± 0% 105k ± 0% ~ (p=0.286 n=5+4) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 86.5k ± 0% 86.5k ± 0% ~ (p=0.222 n=5+5) BM_SerializeDescriptor_Proto2 60.3k ± 0% 60.3k ± 0% ~ (p=0.071 n=5+5) BM_SerializeDescriptor_Upb 111k ± 0% 111k ± 0% ~ (p=0.841 n=5+5) name old CYCLES/op new CYCLES/op delta BM_ArenaOneAlloc 60.0 ± 7% 58.8 ± 0% -2.15% (p=0.016 n=5+5) BM_ArenaInitialBlockOneAlloc 21.0 ± 0% 21.0 ± 0% ~ (p=1.000 n=5+5) BM_LoadAdsDescriptor_Upb<NoLayout> 16.9M ± 0% 16.9M ± 0% ~ (p=0.056 n=5+5) BM_LoadAdsDescriptor_Upb<WithLayout> 17.9M ± 1% 18.0M ± 1% ~ (p=0.095 n=5+5) BM_LoadAdsDescriptor_Proto2<NoLayout> 35.9M ± 1% 35.8M ± 1% ~ (p=0.421 n=5+5) BM_LoadAdsDescriptor_Proto2<WithLayout> 36.5M ± 0% 36.5M ± 0% ~ (p=0.841 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Copy> 37.2k ± 0% 37.3k ± 0% ~ (p=0.222 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Alias> 34.1k ± 0% 34.7k ± 0% +1.66% (p=0.008 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Copy> 36.4k ± 0% 36.7k ± 0% +0.83% (p=0.008 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Alias> 33.3k ± 1% 34.1k ± 1% +2.39% (p=0.008 n=5+5) BM_Parse_Proto2<FileDesc, NoArena, Copy> 68.1k ± 1% 68.0k ± 1% ~ (p=0.421 n=5+5) BM_Parse_Proto2<FileDesc, UseArena, Copy> 36.0k ± 1% 36.1k ± 1% ~ (p=0.841 n=5+5) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 35.3k ± 1% 35.5k ± 1% ~ (p=0.151 n=5+5) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 30.7k ± 0% 30.9k ± 1% ~ (p=0.151 n=5+5) BM_SerializeDescriptor_Proto2 20.3k ± 2% 19.7k ± 3% ~ (p=0.151 n=5+5) BM_SerializeDescriptor_Upb 33.6k ± 0% 33.7k ± 2% ~ (p=1.000 n=5+5) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.19 ± 0% 1.19 ± 0% ~ (all samples are equal) BM_ArenaInitialBlockOneAlloc 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.00k ± 0% 6.00k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 5.99k ± 0% 5.99k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 77.8k ± 0% 77.8k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 79.0k ± 0% 79.0k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.19 ± 0% 7.19 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.19 ± 0% 7.19 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<InitBlock, Copy> 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<InitBlock, Alias> 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 10.2 ± 0% 10.2 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 1.19 ± 0% 1.19 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 1.19 ± 0% 1.19 ± 0% ~ (all samples are equal) BM_SerializeDescriptor_Proto2 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal) BM_SerializeDescriptor_Upb 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 344 ± 0% 344 ± 0% ~ (all samples are equal) BM_ArenaInitialBlockOneAlloc 112 ± 0% 112 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.64M ± 0% 9.64M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 9.70M ± 0% 9.70M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.38M ± 0% 6.38M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.44M ± 0% 6.44M ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<InitBlock, Copy> 112 ± 0% 112 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<InitBlock, Alias> 112 ± 0% 112 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 40.8k ± 0% 40.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 112 ± 0% 112 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 112 ± 0% 112 ± 0% ~ (all samples are equal) BM_SerializeDescriptor_Proto2 112 ± 0% 112 ± 0% ~ (all samples are equal) BM_SerializeDescriptor_Upb 112 ± 0% 112 ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 147MB/s ± 2% 148MB/s ± 3% ~ (p=0.222 n=5+5) BM_LoadAdsDescriptor_Upb<WithLayout> 137MB/s ± 3% 137MB/s ± 1% ~ (p=0.905 n=5+4) BM_LoadAdsDescriptor_Proto2<NoLayout> 68.6MB/s ± 1% 69.3MB/s ± 1% +1.10% (p=0.016 n=5+4) BM_LoadAdsDescriptor_Proto2<WithLayout> 66.0MB/s ± 9% 67.4MB/s ± 3% ~ (p=0.841 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Copy> 675MB/s ± 3% 667MB/s ± 3% ~ (p=0.222 n=5+5) BM_Parse_Upb_FileDesc<UseArena, Alias> 730MB/s ± 5% 718MB/s ± 5% ~ (p=0.310 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Copy> 663MB/s ±16% 685MB/s ± 2% ~ (p=1.000 n=5+5) BM_Parse_Upb_FileDesc<InitBlock, Alias> 723MB/s ±15% 712MB/s ±16% ~ (p=0.421 n=5+5) BM_Parse_Proto2<FileDesc, NoArena, Copy> 367MB/s ± 2% 372MB/s ± 2% ~ (p=0.222 n=5+5) BM_Parse_Proto2<FileDesc, UseArena, Copy> 694MB/s ± 2% 691MB/s ± 4% ~ (p=0.841 n=5+5) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 714MB/s ± 3% 709MB/s ± 3% ~ (p=0.690 n=5+5) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 816MB/s ± 2% 816MB/s ± 3% ~ (p=1.000 n=5+5) BM_SerializeDescriptor_Proto2 1.24GB/s ± 3% 1.28GB/s ± 3% ~ (p=0.222 n=5+5) BM_SerializeDescriptor_Upb 734MB/s ± 3% 713MB/s ±13% ~ (p=0.841 n=5+5) ``` PiperOrigin-RevId: 477770562	2 years ago
Joshua Haberman	896e74c141	Optimizes `upb_MiniTable_Enum` for large but dense enums. Optimizes `upb_MiniTable_Enum` for enums with many values (>64) but with relatively dense packing in numeric space. This CL optimizes both the size and speed of such enums: - size: 30x code size reduction - speed: moved from linear search to a constant-time bit test Negative enum values are still expensive, as they are never put into the bitfield. PiperOrigin-RevId: 473259819	2 years ago
Joshua Haberman	5a7644b2d0	Fixed fuzz bug in upb. Extending a MessageSet with a non-message extension was causing crashes that would manifest in various ways. PiperOrigin-RevId: 472496259	2 years ago
Protobuf Team Bot	0c6531378d	Merge GetEnum into GetInt32. Rename SetEnum to SetEnumProto2 to be clear that upb only treats Proto2 enum as enum. Proto3 enums should use SetInt32. PiperOrigin-RevId: 467000685	2 years ago
Protobuf Team Bot	e09d6fcb6d	Update mini table API comment PiperOrigin-RevId: 463868386	2 years ago
Joshua Haberman	125db89ff5	Added fuzz tests for mini table building and binary format parsing/serialization. PiperOrigin-RevId: 458240180	2 years ago
Protobuf Team Bot	49876f4633	Update sample of using upb_MtDataEncoder PiperOrigin-RevId: 457501667	2 years ago
Protobuf Team	c7620a4690	Mini table accessors Part2 (repeated fields). Introduces upb_FieldValue for array accessor api. PiperOrigin-RevId: 445491436	3 years ago
Joshua Haberman	fa8b605f78	Implemented MiniDescriptors for proto2 enums. An enum MiniDescriptor simply encodes a set of valid `int32_t` values, so that the protobuf parser can test whether a given enum value is known or not. The format implemented here is novel and needs to be documented. In short, the format is: 1. base92 values 0-31: 5-bit mask indicating presence or absence of the next five enum values. 2. base92 values 60-91: varint indicating skip over a region of enum values. Negative enum values are encoded as their `uint32_t` equivalent. PiperOrigin-RevId: 442892799	3 years ago
Joshua Haberman	911a25e738	Passes nearly all tests!	3 years ago
Joshua Haberman	e0aaad386f	Passes all conformance tests!	3 years ago
Joshua Haberman	76a81e2177	WIP.	3 years ago
Joshua Haberman	970c645140	Fixes for google3 (layering check and formatting).	3 years ago
Joshua Haberman	8405436044	Addressed PR comments.	3 years ago
Joshua Haberman	f5246b70fd	clang-format	3 years ago
Joshua Haberman	de2c129362	First draft of mini-table building API.	3 years ago
Joshua Haberman	7647b79403	WIP.	3 years ago
Joshua Haberman	32038c1c59	New API for building extensions appears to work.	3 years ago
Joshua Haberman	13434560e0	WIP.	3 years ago
Joshua Haberman	eac23bc389	WIP.	3 years ago
Joshua Haberman	fb3b783353	WIP.	3 years ago
Joshua Haberman	5dfbc684dd	WIP.	3 years ago

5 Commits (2272970a94874b5efb967bc44588868fe1881620)