protobuf

Commit Graph

Author	SHA1	Message	Date
Joshua Haberman	7a24340a26	Fixed some more tests.	3 years ago
Joshua Haberman	fc725be5bc	Implemented proper unescaping of bytes defaults.	3 years ago
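The commit message does not show the code. It helps to know that descriptor.proto stores a bytes field's `default_value` as a C-escaped string. Under that assumption, unescaping could be sketched as below. `unescape_bytes` and its helpers are hypothetical names, not upb's API, and the exact set of accepted escapes is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: digit value, or -1 if `c` is not a digit of that base. */
static int octal_digit(char c) { return (c >= '0' && c <= '7') ? c - '0' : -1; }

static int hex_digit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Unescapes the C-escaped string in[0, len) into `out`, which needs room for
 * `len` bytes (unescaping never grows the data). Stores the decoded length in
 * *out_len. Returns false on a malformed escape sequence. */
static bool unescape_bytes(const char *in, size_t len, char *out, size_t *out_len) {
  size_t o = 0;
  for (size_t i = 0; i < len; i++) {
    if (in[i] != '\\') { out[o++] = in[i]; continue; }
    if (++i == len) return false;  /* trailing lone backslash */
    char c = in[i];
    switch (c) {
      case 'n': out[o++] = '\n'; break;
      case 'r': out[o++] = '\r'; break;
      case 't': out[o++] = '\t'; break;
      case '\\': case '\'': case '"': case '?': out[o++] = c; break;
      case 'x': {
        /* \xH or \xHH: one or two hex digits. */
        int v = 0, n = 0;
        while (n < 2 && i + 1 < len && hex_digit(in[i + 1]) >= 0) {
          v = v * 16 + hex_digit(in[++i]);
          n++;
        }
        if (n == 0) return false;
        out[o++] = (char)v;
        break;
      }
      default: {
        /* \N, \NN or \NNN: one to three octal digits, at most \377. */
        int v = octal_digit(c);
        if (v < 0) return false;
        for (int n = 1; n < 3 && i + 1 < len && octal_digit(in[i + 1]) >= 0; n++) {
          v = v * 8 + octal_digit(in[++i]);
        }
        if (v > 0xFF) return false;
        out[o++] = (char)v;
        break;
      }
    }
  }
  *out_len = o;
  return true;
}
```

For example, the escaped text `a\001\x7f\n` decodes to the four bytes `'a'`, `0x01`, `0x7f` and `'\n'`.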
Joshua Haberman	aee30144cc	Fixed a couple bugs.	3 years ago
Joshua Haberman	54b605026d	Fixed a bug in ListFields().	3 years ago
Joshua Haberman	d2283ed219	Verify extension ranges, and addressed PR comments.	3 years ago
Joshua Haberman	df77ca5dbb	Check extension field numbers against extension ranges. This makes extension checking stricter in most cases. However, it also fixes a bug with MessageSet, where we were being too strict: MessageSet allows larger extension numbers than normal extensions do.	3 years ago
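Here is a minimal sketch of the range check described above. It rests on three assumptions:

- extension ranges are half-open `[start, end)`, as in descriptor.proto's `ExtensionRange`;
- ordinary field numbers are limited to 536870911 (2^29 - 1);
- MessageSet extendees may use numbers up to `INT32_MAX`, which is based on protobuf's C++ behavior and not stated in the commit.

`ext_range` and `extension_number_ok` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open [start, end) range, as in descriptor.proto's ExtensionRange. */
typedef struct {
  int32_t start;
  int32_t end;
} ext_range;

/* Largest field number allowed for ordinary fields: 2^29 - 1. */
#define MAX_FIELD_NUMBER 536870911

/* Returns true if `number` falls inside one of the extendee's extension
 * ranges. Assumption: MessageSet extendees allow numbers up to INT32_MAX,
 * all other extendees are capped at MAX_FIELD_NUMBER. */
static bool extension_number_ok(const ext_range *ranges, size_t n,
                                int32_t number, bool is_msgset) {
  int32_t max = is_msgset ? INT32_MAX : MAX_FIELD_NUMBER;
  if (number < 1 || number > max) return false;
  for (size_t i = 0; i < n; i++) {
    if (number >= ranges[i].start && number < ranges[i].end) return true;
  }
  return false;
}
```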
Joshua Haberman	7576a3bfc1	Avoid NULL + 0 when adding a list of 0 extensions.	3 years ago
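For context, in C two operations are undefined behavior:

- adding 0 to a null pointer;
- passing a null pointer to `memcpy()`, even with a length of 0.

A hypothetical append helper (not upb code) can sidestep both by returning early for the empty case:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends `count` ints from `src` to `dst`, which already
 * holds `dst_len` ints and has room for `count` more. Returns the new length.
 * When count == 0, `src` (and possibly `dst`) may be NULL, so we must not
 * compute `dst + dst_len` or call memcpy() at all. */
static size_t append_ints(int *dst, size_t dst_len, const int *src, size_t count) {
  if (count == 0) return dst_len;
  memcpy(dst + dst_len, src, count * sizeof(*src));
  return dst_len + count;
}
```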
Joshua Haberman	1845997498	Added comments.	3 years ago
Stan Hu	53250c8504	Fix encoding/decoding for def-to-proto on big-endian systems	3 years ago
In a big-endian system, the 64-bit value of 1 is represented as:
```
0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1
```
However, when `d.int32_val` is used, this truncates the value and takes the first four bytes:
```
0x0 0x0 0x0 0x0
```
As a result, the truncation loses the 1 and the value becomes 0. This doesn't happen on a little-endian system because the 1 is stored in the byte at the lowest memory address, so truncating the value to 32 bits doesn't change anything.

Previously the DefToProto test was failing on big-endian systems because this truncation caused the key to be incorrectly set to 0. We now use the type-specific functions (e.g. `upb_fielddef_defaultint32`) to do this conversion.

Closes https://github.com/protocolbuffers/upb/issues/442
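The pitfall can be reproduced with a simplified model of a default-value union. The `default_val` type and the helper names are illustrative, not upb's definitions. Reading the 32-bit member reinterprets bytes, so the result depends on byte order, while converting the 64-bit value does not:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative union; the member names mirror the commit's `d.int32_val`. */
typedef union {
  int64_t int64_val;
  int32_t int32_val;
} default_val;

/* Buggy pattern: reinterpret the first 4 bytes of the stored int64.
 * Little-endian: those bytes are the low half, so 1 stays 1.
 * Big-endian: those bytes are the high half, so 1 becomes 0. */
static int32_t read_via_union(int64_t v) {
  default_val d;
  d.int64_val = v;
  return d.int32_val;
}

/* Endian-independent: convert the value instead of reinterpreting bytes. */
static int32_t read_via_cast(int64_t v) {
  return (int32_t)v;
}

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void) {
  uint16_t x = 1;
  unsigned char first;
  memcpy(&first, &x, 1);
  return first == 1;
}
```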
Joshua Haberman	d0795a29d9	Test for def_to_proto is working.	3 years ago
Joshua Haberman	f7980b7ed1	Restructured for simplicity and fixed fasttable parser.	3 years ago
Joshua Haberman	3d437bbcab	Some pre-PR fixes.	3 years ago
Joshua Haberman	7771a0515b	Addressed PR comments.	3 years ago
Joshua Haberman	16f763e4d6	Addressed PR comments.	3 years ago
Joshua Haberman	9d26c706e0	Removed dependency on popcount() intrinsic.	3 years ago
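The commit doesn't say what replaced the intrinsic. One common portable substitute is the SWAR ("SIMD within a register") bit count, which needs only shifts, masks and one multiply. It is shown here as an illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count (number of set bits) with no compiler
 * intrinsics, using the classic SWAR reduction. */
static int popcount32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                 /* counts per 2-bit field */
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* counts per 4-bit field */
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* counts per byte */
  return (int)((x * 0x01010101u) >> 24);            /* sum of the four bytes */
}
```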
Joshua Haberman	7907ed913b	Expanded the test to cover packed fields also.	3 years ago
Joshua Haberman	401e1747b5	Addressed PR feedback.	3 years ago
Joshua Haberman	cc03669a17	Several changes to defs. Biggest/key changes: 1. Defs are now nested per the .proto file syntax. 2. Options are parsed and vended.	3 years ago
Joshua Haberman	2484d12c1c	Addressed PR comments.	3 years ago
Joshua Haberman	8c916941b0	MSET -> MSGSET	3 years ago
Joshua Haberman	6f89034249	Implemented support for MessageSet.	3 years ago
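For reference, each MessageSet entry is encoded as a group with field number 1 that contains two fields:

- `type_id` (field 2): the extension's field number;
- `message` (field 3): the serialized extension.

The encoder below is a hand-written sketch of that wire layout with hypothetical names, not upb's implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes `v` as a base-128 varint; returns the byte count (at most 5). */
static size_t put_varint32(uint8_t *out, uint32_t v) {
  size_t n = 0;
  while (v >= 0x80) {
    out[n++] = (uint8_t)(v | 0x80);
    v >>= 7;
  }
  out[n++] = (uint8_t)v;
  return n;
}

/* Encodes one MessageSet item. `out` must have room for len + 14 bytes. */
static size_t encode_msgset_item(uint8_t *out, uint32_t type_id,
                                 const uint8_t *msg, uint32_t len) {
  size_t n = 0;
  out[n++] = 0x0B;                     /* field 1, wire type 3 (START_GROUP) */
  out[n++] = 0x10;                     /* field 2, wire type 0 (VARINT) */
  n += put_varint32(out + n, type_id);
  out[n++] = 0x1A;                     /* field 3, wire type 2 (LEN) */
  n += put_varint32(out + n, len);
  if (len > 0) memcpy(out + n, msg, len);
  n += len;
  out[n++] = 0x0C;                     /* field 1, wire type 4 (END_GROUP) */
  return n;
}
```

For example, an extension numbered 150 whose payload is `08 01` encodes as `0B 10 96 01 1A 02 08 01 0C`.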
Joshua Haberman	b1bbbdd4e7	Addressed PR comments.	3 years ago
Joshua Haberman	ce012b7b55	Added support for extensions.	3 years ago
Joshua Haberman	3366d02f04	Addressed PR comments.	3 years ago
Joshua Haberman	5c28ab6b2c	Implemented upb_enumvaldef, for storing information about enumvals.	3 years ago
Joshua Haberman	53fba823de	Added missing upb_symtab_lookupext() function.	3 years ago
Joshua Haberman	cdd6434a31	Introduced upb_extreg and plumbed it into decoder.	4 years ago
Joshua Haberman	58e158c6fa	Changed mini-table to use a custom "mode" instead of descriptor's "label."	4 years ago
Joshua Haberman	807e7fe9e2	Fixed dense_below logic to be order-independent and consistent between def.c and codegen.	4 years ago
Joshua Haberman	2e8a122fc0	Changed dense_below calculation to use UINT8_MAX as the constant.	4 years ago
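The optimized-decoder commit further down this log explains that `dense_below` bounds the field numbers that can be found by direct indexing. Assume it means the largest n such that fields 1..n are all present. An order-independent computation capped at `UINT8_MAX` might then look like this. The names are hypothetical, and this is not the code from def.c or codegen.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b) {
  uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
  return (x > y) - (x < y);
}

/* Returns the largest n such that field numbers 1..n are all present, capped
 * at UINT8_MAX so it fits in a uint8_t. Sorting first makes the result
 * independent of declaration order. Sorts `nums` in place. */
static uint8_t compute_dense_below(uint32_t *nums, size_t count) {
  if (count == 0) return 0;
  qsort(nums, count, sizeof(*nums), cmp_u32);
  size_t n = 0;
  while (n < count && n < UINT8_MAX && nums[n] == n + 1) n++;
  return (uint8_t)n;
}
```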
Joshua Haberman	6394894b6e	Addressed PR comments.	4 years ago
Joshua Haberman	65d7b8ab0c	Optimized decoder and paved the way for parsing extensions.	4 years ago
The primary motivation for this change is to avoid referring to the `upb_msglayout` object when we are trying to fetch the `upb_msglayout` object for a sub-message. This will help pave the way for parsing extensions.

We also implement several optimizations so that we can make this change without regressing performance.

Normally we compute the layout for a sub-message field like so:
```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *layout,
    const upb_msglayout_field *field) {
  return layout->submsgs[field->submsg_index];
}
```
The reason for this indirection is to avoid storing a pointer directly in `upb_msglayout_field`. That would double its size (from 12 to 24 bytes on 64-bit architectures), which is wasteful because the pointer is only needed for message-typed fields.

However, `get_submsg_layout` as written above does not work for extensions. By nature they have no entries in the message's `layout->submsgs` array, and we want to avoid creating an entire fake `upb_msglayout` for each such extension, since that would also be wasteful.

This change removes the dependency on `upb_msglayout` by passing down the `submsgs` array instead:
```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *const *submsgs,
    const upb_msglayout_field *field) {
  return submsgs[field->submsg_index];
}
```
This will pave the way for parsing extensions, as we can more easily create an alternative `submsgs` array for extension fields without extra overhead or waste.

Along the way several optimizations presented themselves that allow a nice increase in performance:

1. Passing the parsed `wireval` by address instead of by value avoided an expensive and useless stack copy (this is on Clang, which was used for all measurements).
2. When field numbers are densely packed, we can find a field by number with a single indexed lookup instead of a linear search. At codegen time we can compute the maximum field number that allows such an indexed lookup.
3. For fields that do require a linear search, we can start the search at the location where we found the previous field, taking advantage of the fact that field numbers are generally increasing.
4. When the hasbit index is less than 32 (the common case), we can use a less expensive code sequence to set it.
5. We check for the hasbit case before the oneof case, as optional fields are more common than oneof fields.

Benchmark results indicate a roughly 15-19% improvement in upb parse speed with a small code size increase:
```
name                                      old time/op  new time/op    delta
ArenaOneAlloc                             21.3ns ± 0%  21.5ns ± 0%   +0.96%  (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc                 6.32ns ± 0%  6.32ns ± 0%   +0.03%  (p=0.000 n=12+10)
LoadDescriptor_Upb                        53.5µs ± 1%  51.5µs ± 2%   -3.70%  (p=0.000 n=12+12)
LoadAdsDescriptor_Upb                     2.78ms ± 2%  2.68ms ± 0%   -3.57%  (p=0.000 n=12+12)
LoadDescriptor_Proto2                      240µs ± 0%   240µs ± 0%   +0.12%  (p=0.001 n=12+12)
LoadAdsDescriptor_Proto2                  12.8ms ± 0%  12.7ms ± 0%   -1.15%  (p=0.000 n=12+10)
Parse_Upb_FileDesc<UseArena,Copy>         13.2µs ± 2%  10.7µs ± 0%  -18.49%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<UseArena,Alias>        11.3µs ± 0%   9.6µs ± 0%  -15.11%  (p=0.000 n=12+11)
Parse_Upb_FileDesc<InitBlock,Copy>        12.7µs ± 0%  10.3µs ± 0%  -19.00%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<InitBlock,Alias>       10.9µs ± 0%   9.2µs ± 0%  -15.82%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,NoArena,Copy>       29.4µs ± 0%  29.5µs ± 0%   +0.61%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,UseArena,Copy>      20.7µs ± 2%  20.6µs ± 2%        ~  (p=0.260 n=12+11)
Parse_Proto2<FileDesc,InitBlock,Copy>     16.7µs ± 1%  16.7µs ± 0%   -0.25%  (p=0.036 n=12+10)
Parse_Proto2<FileDescSV,InitBlock,Alias>  16.5µs ± 0%  16.5µs ± 0%   +0.20%  (p=0.016 n=12+11)
SerializeDescriptor_Proto2                5.30µs ± 1%  5.36µs ± 1%   +1.09%  (p=0.000 n=12+11)
SerializeDescriptor_Upb                   12.9µs ± 0%  13.0µs ± 0%   +0.90%  (p=0.000 n=12+11)

    FILE SIZE        VM SIZE
 --------------  --------------
  +1.5%    +176  +1.6%    +176    upb/decode.c
  +1.8%    +176  +1.9%    +176      decode_msg
  +0.4%     +64  +0.4%     +64    upb/def.c
  +1.4%     +64  +1.4%     +64      _upb_symtab_addfile
  +1.2%     +48  +1.4%     +48    upb/reflection.c
   +15%     +32   +18%     +32      upb_msg_set
  +2.9%     +16  +3.1%     +16      upb_msg_mutable
  -9.3%    -288  [ = ]       0    [Unmapped]
  [ = ]       0  +0.2%    +288    TOTAL
```
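Items 2 and 3 of the optimized-decoder commit can be sketched together. This is a simplified model with hypothetical types (`field_entry`, `layout`), not `upb_msglayout`:

- fields are sorted by number;
- numbers `1..dense_below` sit at index `number - 1`;
- the linear search resumes where the previous lookup succeeded, wrapping around once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative field entry; not upb's upb_msglayout_field. */
typedef struct {
  uint32_t number;
} field_entry;

/* Hypothetical layout: fields sorted by number, and fields 1..dense_below
 * stored at index number - 1. */
typedef struct {
  const field_entry *fields;
  size_t field_count;
  uint8_t dense_below;
} layout;

/* Finds the field with `number`, or returns NULL. `*last_idx` records where
 * the previous lookup succeeded, so the linear search for the next (usually
 * larger) number starts there and wraps around at most once. */
static const field_entry *find_field(const layout *l, uint32_t number, size_t *last_idx) {
  if (number >= 1 && number <= l->dense_below) {
    *last_idx = number - 1;  /* dense prefix: one indexed lookup */
    return &l->fields[number - 1];
  }
  size_t n = l->field_count;
  if (n == 0) return NULL;
  size_t start = *last_idx < n ? *last_idx : 0;
  for (size_t i = 0; i < n; i++) {
    size_t idx = (start + i) % n;
    if (l->fields[idx].number == number) {
      *last_idx = idx;
      return &l->fields[idx];
    }
  }
  return NULL;
}
```

Because encoders usually emit fields in increasing number order, the resumed search typically finds the next field within a step or two.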
Joshua Haberman	9482957425	Enforce that filenames are unique when loaded into symtab. This brings upb into line with C++. PHP already checks this internally, so this should not be an issue there. Ruby, on the other hand, does not currently check this, so this change will cause our Ruby implementation to reject some programs that would otherwise have been accepted.	4 years ago
Joshua Haberman	823eb09694	Update all 2011 dates to 2021.	4 years ago
Joshua Haberman	e59d2c8fa7	Added license headers to all files.	4 years ago
Joshua Haberman	83c0edbd2a	A few minor cleanups.	4 years ago
Joshua Haberman	c358829c76	Now that handlers are gone, cleaned up table to use arenas exclusively. Also cleaned up some cruft from table.	4 years ago
Joshua Haberman	ec9ba3f893	Fixed error message buffer overflow.	4 years ago
Joshua Haberman	f41c0ec261	Added an internal API to get arena from symtab, for Ruby's use.	4 years ago
Joshua Haberman	f5d2d55007	Deleted the legacy "Handlers" APIs. upb can finally be deserving of its name. This is possible now that all users have been migrated to the new upb_msg APIs.	4 years ago
Joshua Haberman	c7787cbaa1	Fixed a bunch of Clang warnings.	4 years ago
Unfortunately, a few of the Clang warnings did not have easy fixes:
```
../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘fastdecode_err’:
../../../../ext/google/protobuf_c/ruby-upb.c:353:13: warning: function might be candidate for attribute ‘noreturn’ [-Wsuggest-attribute=noreturn]
  353 | const char *fastdecode_err(upb_decstate *d) {
      |             ^~~~~~~~~~~~~~
../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘_upb_decode’:
../../../../ext/google/protobuf_c/ruby-upb.c:867:30: warning: argument ‘buf’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
  867 | bool _upb_decode(const char *buf, size_t size, void *msg,
```
I even tried to suppress the first warning, but it still shows up.
Joshua Haberman	5e550e88f8	Added API for getting fielddef default as a upb_msgval.	4 years ago
Joshua Haberman	ee49a8d7df	Added an accessor to get the symtab from a filedef.	4 years ago
This matches an API already present in proto2 (`const DescriptorPool* FileDescriptor::pool()`). However, there is a slightly subtle implication here.

In proto2, the relationship between `Descriptor` and `MessageFactory` is 1:many. You can create as many `DynamicMessageFactory` instances as you want, and each one will have its own independent `DynamicMessage` prototype and computed layout for the same underlying `Descriptor`. In practice the layouts will all be the same, but one thing that could differ is that each can have its own extension pool, which is a `DescriptorPool` that will be searched for extensions when parsing.

In contrast, upb does not have a separate "message factory" abstraction. That means that each `upb_msgdef` has a single distinct layout, in other words a 1:1 correspondence between descriptor and layout. This means that there is no way to create multiple message types for the same descriptor that have distinct extension pools. If you want a different set of extensions, you must create a separate `upb_symtab` with a distinct set of descriptors.

This change further entrenches that `upb_filedef`:`upb_symtab` is a 1:1 relationship: a single `upb_filedef` cannot be a member of multiple symbol tables. In practice this was already true (there is no way to add a single filedef to multiple symbol tables), but this change codifies this 1:1 relationship.
Joshua Haberman	bc200451ce	Use a macro instead of an inline function for setjmp/longjmp.	4 years ago
Joshua Haberman	fbc0639b07	Use _setjmp on mac to avoid saving/restoring the signal mask.	4 years ago
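Here is a sketch of the pattern these two commits describe, with hypothetical macro names. `setjmp()` has to be invoked through a macro rather than a wrapper function, because calling `longjmp()` into a function that has already returned is undefined behavior. On macOS, `_setjmp()`/`_longjmp()` skip saving and restoring the signal mask.

```c
#include <assert.h>
#include <setjmp.h>

/* On macOS, setjmp()/longjmp() also save/restore the signal mask, which can
 * require a system call; _setjmp()/_longjmp() skip that work. These are
 * macros so that setjmp() runs in the caller's own stack frame. */
#ifdef __APPLE__
#define SKETCH_SETJMP(buf) _setjmp(buf)
#define SKETCH_LONGJMP(buf, val) _longjmp(buf, val)
#else
#define SKETCH_SETJMP(buf) setjmp(buf)
#define SKETCH_LONGJMP(buf, val) longjmp(buf, val)
#endif

/* Hypothetical deep call chain that reports an error by jumping out. */
static void fail_deep(jmp_buf *err, int depth) {
  if (depth == 0) SKETCH_LONGJMP(*err, 1);
  fail_deep(err, depth - 1);
}

/* Returns -1 when fail_deep() jumps back here; fail_deep() always jumps
 * for depth >= 0, so the final return is never reached. */
static int try_parse(int depth) {
  jmp_buf err;
  if (SKETCH_SETJMP(err)) return -1;
  fail_deep(&err, depth);
  return 0;
}
```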
Joshua Haberman	65d166a6ba	Added API for copy vs. alias and added benchmarks to test both.	4 years ago
Benchmark output:
```
$ bazel-bin/benchmarks/benchmark '--benchmark_filter=BM_Parse'
2020-11-11 15:39:04
Running bazel-bin/benchmarks/benchmark
Run on (72 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x36)
  L1 Instruction 32K (x36)
  L2 Unified 1024K (x36)
  L3 Unified 25344K (x2)
-------------------------------------------------------------------------------------
Benchmark                                           Time            CPU   Iterations
-------------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc<UseArena, Copy>            4134 ns        4134 ns       168714 1.69152GB/s
BM_Parse_Upb_FileDesc<UseArena, Alias>           3487 ns        3487 ns       199509 2.00526GB/s
BM_Parse_Upb_FileDesc<InitBlock, Copy>           3727 ns        3726 ns       187581 1.87643GB/s
BM_Parse_Upb_FileDesc<InitBlock, Alias>          3110 ns        3110 ns       224970 2.24866GB/s
BM_Parse_Proto2<FileDesc, NoArena, Copy>        31132 ns       31132 ns        22437 229.995MB/s
BM_Parse_Proto2<FileDesc, UseArena, Copy>       21011 ns       21009 ns        33922 340.812MB/s
BM_Parse_Proto2<FileDesc, InitBlock, Copy>      17976 ns       17975 ns        38808 398.337MB/s
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>   17357 ns       17356 ns        40244 412.539MB/s
```
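The commit doesn't show the new API itself. Conceptually:

- aliasing lets parsed string and bytes fields point into the input buffer, so the input must outlive the message;
- copying duplicates those fields into the arena, so the input can be freed afterwards.

Here is a minimal sketch of that difference with a toy arena; `strview`, `toy_arena` and `read_string` are hypothetical names, not upb's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative view of a parsed string/bytes field. */
typedef struct {
  const char *data;
  size_t size;
} strview;

/* Toy bump arena standing in for a real arena allocator. */
typedef struct {
  char buf[256];
  size_t used;
} toy_arena;

static void *toy_arena_alloc(toy_arena *a, size_t n) {
  if (n > sizeof(a->buf) - a->used) return NULL;
  void *p = a->buf + a->used;
  a->used += n;
  return p;
}

/* Materializes the field stored at input[off, off + len).
 * alias == true:  point into `input` (no copy).
 * alias == false: copy the bytes into the arena. */
static bool read_string(const char *input, size_t off, size_t len, bool alias,
                        toy_arena *a, strview *out) {
  if (alias) {
    out->data = input + off;
  } else {
    char *p = toy_arena_alloc(a, len);
    if (p == NULL) return false;  /* arena exhausted */
    memcpy(p, input + off, len);
    out->data = p;
  }
  out->size = len;
  return true;
}
```

The alias path skips the copy, which is consistent with the Alias rows above running faster than the Copy rows.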
Joshua Haberman	5b1f0d86a1	For Kokoro, only build/test -m32 on Linux. Also fixed a bunch of bugs found by gcc's -fanalyzer.	4 years ago
Joshua Haberman	c9f9668234	symtab: use longjmp() for errors and avoid intermediate table.	4 years ago
We used to use a separate "add table" during the upb_symtab_addfile() operation to make it easier to back out the file if it contained errors. But this created unnecessary work: the same symbols had to be re-added to the main symtab once everything was validated.

Instead, we now add symbols directly to the main symbols table. If there is an error in validation, we remove precisely the set of symbols that were already added.

This also requires using a separate arena for each file. We can fuse it with the symtab's main arena if the operation is successful.
```
name                      old time/op  new time/op    delta
LoadDescriptor_Upb        61.2µs ± 4%  53.5µs ± 1%  -12.50%  (p=0.000 n=12+12)
LoadAdsDescriptor_Upb     4.43ms ± 1%  3.06ms ± 0%  -31.00%  (p=0.000 n=12+12)
LoadDescriptor_Proto2      257µs ± 0%   259µs ± 0%   +1.00%  (p=0.000 n=12+12)
LoadAdsDescriptor_Proto2  13.9ms ± 1%  13.9ms ± 1%        ~  (p=0.128 n=12+12)
```
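The rollback strategy in the symtab commit can be sketched with a toy array-backed table. A real symtab would use a hash table, and all names here are hypothetical.

The function records the table size before `setjmp()`, so on error it can drop exactly the symbols this call added. Keeping that size in a local that is never modified after `setjmp()` matters. Automatic variables changed between `setjmp()` and `longjmp()` have indeterminate values afterwards unless they are `volatile`.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

/* Toy symbol table: a fixed array of names. */
typedef struct {
  const char *names[MAX_SYMS];
  size_t count;
} symtab;

static bool symtab_has(const symtab *t, const char *name) {
  for (size_t i = 0; i < t->count; i++)
    if (strcmp(t->names[i], name) == 0) return true;
  return false;
}

/* Inserts straight into the main table; a duplicate or a full table aborts
 * the whole file via longjmp(). */
static void add_symbol(symtab *t, jmp_buf *err, const char *name) {
  if (t->count == MAX_SYMS || symtab_has(t, name)) longjmp(*err, 1);
  t->names[t->count++] = name;
}

/* Adds all of a file's symbols, or none of them. */
static bool add_file(symtab *t, const char *const *syms, size_t n) {
  jmp_buf err;
  size_t start = t->count;  /* never modified after setjmp(), so still valid */
  if (setjmp(err)) {
    t->count = start;  /* drop exactly the symbols this call appended */
    return false;
  }
  for (size_t i = 0; i < n; i++) add_symbol(t, &err, syms[i]);
  return true;
}
```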
Joshua Haberman	c3b5637646	Added benchmark for loading ads descriptor.	4 years ago
Generally this seems to track the speed of loading descriptor.proto.
```
----------------------------------------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
----------------------------------------------------------------------------------------------------
BM_LoadDescriptor_Upb           59091 ns        59086 ns        11747 121.182MB/s
BM_LoadAdsDescriptor_Upb      4218587 ns      4218582 ns          166 120.544MB/s
BM_LoadDescriptor_Proto2       241083 ns       241049 ns         2903 29.7043MB/s
BM_LoadAdsDescriptor_Proto2  13442631 ns     13442099 ns           52 34.8975MB/s
```
Joshua Haberman	acd72c6d3f	WIP.	4 years ago


196 Commits (6c2eeb1f41dbcb7891e9c1f58fe0761a57bb1841)