protobuf

Commit Graph

Author	SHA1	Message	Date
Joshua Haberman	bcb08bf9f0	Clang-format.	3 years ago
Joshua Haberman	6509f13568	Reverted extra debug assignment.	3 years ago
Joshua Haberman	1046d778a2	Removed debug print statements.	3 years ago
Joshua Haberman	7d5f4cd9b6	Implemented the functionality to make the test pass.	3 years ago
theodorerose	97273a3638	WIP	3 years ago
Joshua Haberman	532dc1f0f0	Renamed a few more constants to the new style. These are not in the public API and so were not prioritized before. No functional change here, just renames.	3 years ago
Joshua Haberman	826eca6742	Enabled ubsan tests and fixed ubsan failures.	3 years ago
Joshua Haberman	606308c639	Added back missing underscore.	3 years ago
Joshua Haberman	75b6291e40	Renamed upb_FieldType_* -> kUpb_FieldType_*	3 years ago
Joshua Haberman	72af9dc0cc	Switch to a single upb_Decode.	3 years ago
Joshua Haberman	499c2cc8b1	upb_extreg, upb_msg	3 years ago
Joshua Haberman	1c955f37ce	Mass API rename and clang-reformat (#485)	3 years ago

* Wave 1: upb_fielddef.
* upb_fielddef itself.
* upb_oneofdef.
* upb_msgdef.
* ExtensionRange.
* upb_enumdef
* upb_enumvaldef
* upb_filedef
* upb_methoddef
* upb_servicedef
* upb_symtab
* upb_defpool_init
* upb_wellknown and upb_syntax_t
* Some constants.
* upb_status
* upb_strview
* upb_arena
* upb.h constants
* reflection
* encode
* JSON decode.
* json encode.
* msg_internal.
* Formatted with clang-format.
* Some naming fixups and comment reformatting.
* More refinements.
* A few more stragglers.
* Fixed PyObject_HEAD with semicolon. Removed TODO entries.

Joshua Haberman	a0374b3b08	Added required field checking into the encoder.	3 years ago
Stan Hu	81eda8fade	Fix conformance test failures on big-endian systems. Previously 59 tests were failing in the conformance tests. These were failing in SINT32 and JSON enum handling. In both cases, we need to cast int64 values to int32 to avoid losing bytes in a big-endian system. Closes https://github.com/protocolbuffers/upb/issues/449	3 years ago
Joshua Haberman	50978256b9	Properly byte-swap fixed packed fields.	3 years ago
Stan Hu	ba83e135d2	Refactor decode_munge to call decode_munge_int32. This avoids an additional type check when decoding packed enums.	3 years ago
Stan Hu	ad4d4076e1	Fix big endian decoding of enum_packed	3 years ago
Stan Hu	c604ed9ae9	Fix big endian handling of enums. Previously, using an enum for a field on a big-endian processor such as the s390x would fail to decode properly. We need to munge it in the correct byte order before decoding it.	3 years ago
Joshua Haberman	2199be91bc	Fixed UBSAN error.	3 years ago
Joshua Haberman	58968d6a78	A bit of minor code tweaking that improves benchmarks by 10%.	3 years ago
Joshua Haberman	58c1dbc11f	Addressed PR comments.	3 years ago
Joshua Haberman	1618e1b9a6	Fixed case of parsing an unknown field.	3 years ago
Joshua Haberman	f7980b7ed1	Restructured for simplicity and fixed fasttable parser.	3 years ago
Joshua Haberman	dfa28861cc	Don't store field_start, derive it separately.	3 years ago
Joshua Haberman	7c83eb93be	Removed extra size from message.	3 years ago
Joshua Haberman	3d437bbcab	Some pre-PR fixes.	3 years ago
Joshua Haberman	4abe724dde	A few more fixes.	3 years ago
Joshua Haberman	16f763e4d6	Addressed PR comments.	3 years ago
Joshua Haberman	7907ed913b	Expanded the test to cover packed fields also.	3 years ago
Joshua Haberman	8c916941b0	MSET -> MSGSET	3 years ago
Joshua Haberman	b3c91c276b	Addressed PR comments.	3 years ago
Joshua Haberman	d80e682a9c	Moved find field function closer to where it is used.	3 years ago
Joshua Haberman	53ce4354cf	Minor formatting changes.	3 years ago
Joshua Haberman	37a577c0e7	Further factored the function.	3 years ago
Joshua Haberman	69bb5d1d94	Simplified main parsing function.	3 years ago
Joshua Haberman	6f89034249	Implemented support for MessageSet.	3 years ago
Joshua Haberman	b1bbbdd4e7	Addressed PR comments.	3 years ago
Joshua Haberman	ce012b7b55	Added support for extensions.	3 years ago
Joshua Haberman	cdd6434a31	Introduced upb_extreg and plumbed it into decoder.	4 years ago
Joshua Haberman	58e158c6fa	Changed mini-table to use a custom "mode" instead of descriptor's "label."	4 years ago
Joshua Haberman	6394894b6e	Addressed PR comments.	4 years ago
Joshua Haberman	65d7b8ab0c	Optimized decoder and paved the way for parsing extensions.	4 years ago

The primary motivation for this change is to avoid referring to the `upb_msglayout` object when we are trying to fetch the `upb_msglayout` object for a sub-message. This will help pave the way for parsing extensions. We also implement several optimizations so that we can make this change without regressing performance.

Normally we compute the layout for a sub-message field like so:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *layout,
    const upb_msglayout_field *field) {
  return layout->submsgs[field->submsg_index];
}
```

The reason for this indirection is to avoid storing a pointer directly in `upb_msglayout_field`, as this would double its size (from 12 to 24 bytes on 64-bit architectures) which is wasteful as this pointer is only needed for message typed fields.

However `get_submsg_layout` as written above does not work for extensions, as they will not have entries in the message's `layout->submsgs` array by nature, and we want to avoid creating an entire fake `upb_msglayout` for each such extension since that would also be wasteful.

This change removes the dependency on `upb_msglayout` by passing down the `submsgs` array instead:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *const *submsgs,
    const upb_msglayout_field *field) {
  return submsgs[field->submsg_index];
}
```

This will pave the way for parsing extensions, as we can more easily create an alternative `submsgs` array for extension fields without extra overhead or waste.

Along the way several optimizations presented themselves that allow a nice increase in performance:

1. Passing the parsed `wireval` by address instead of by value ended up avoiding an expensive and useless stack copy (this is on Clang, which was used for all measurements).
2. When field numbers are densely packed, we can find a field by number with a single indexed lookup instead of linear search. At codegen time we can compute the maximum field number that will allow such an indexed lookup.
3. For fields that do require linear search, we can start the linear search at the location where we found the previous field, taking advantage of the fact that field numbers are generally increasing.
4. When the hasbit index is less than 32 (the common case) we can use a less expensive code sequence to set it.
5. We check for the hasbit case before the oneof case, as optional fields are more common than oneof fields.

Benchmark results indicate a 20% improvement in parse speed with a small code size increase:

```
name                                      old time/op  new time/op  delta
ArenaOneAlloc                             21.3ns ± 0%  21.5ns ± 0%   +0.96%  (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc                 6.32ns ± 0%  6.32ns ± 0%   +0.03%  (p=0.000 n=12+10)
LoadDescriptor_Upb                        53.5µs ± 1%  51.5µs ± 2%   -3.70%  (p=0.000 n=12+12)
LoadAdsDescriptor_Upb                     2.78ms ± 2%  2.68ms ± 0%   -3.57%  (p=0.000 n=12+12)
LoadDescriptor_Proto2                      240µs ± 0%   240µs ± 0%   +0.12%  (p=0.001 n=12+12)
LoadAdsDescriptor_Proto2                  12.8ms ± 0%  12.7ms ± 0%   -1.15%  (p=0.000 n=12+10)
Parse_Upb_FileDesc<UseArena,Copy>         13.2µs ± 2%  10.7µs ± 0%  -18.49%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<UseArena,Alias>        11.3µs ± 0%   9.6µs ± 0%  -15.11%  (p=0.000 n=12+11)
Parse_Upb_FileDesc<InitBlock,Copy>        12.7µs ± 0%  10.3µs ± 0%  -19.00%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<InitBlock,Alias>       10.9µs ± 0%   9.2µs ± 0%  -15.82%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,NoArena,Copy>       29.4µs ± 0%  29.5µs ± 0%   +0.61%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,UseArena,Copy>      20.7µs ± 2%  20.6µs ± 2%     ~     (p=0.260 n=12+11)
Parse_Proto2<FileDesc,InitBlock,Copy>     16.7µs ± 1%  16.7µs ± 0%   -0.25%  (p=0.036 n=12+10)
Parse_Proto2<FileDescSV,InitBlock,Alias>  16.5µs ± 0%  16.5µs ± 0%   +0.20%  (p=0.016 n=12+11)
SerializeDescriptor_Proto2                5.30µs ± 1%  5.36µs ± 1%   +1.09%  (p=0.000 n=12+11)
SerializeDescriptor_Upb                   12.9µs ± 0%  13.0µs ± 0%   +0.90%  (p=0.000 n=12+11)

    FILE SIZE        VM SIZE
 --------------  --------------
  +1.5%    +176  +1.6%    +176    upb/decode.c
    +1.8%    +176  +1.9%    +176    decode_msg
  +0.4%     +64  +0.4%     +64    upb/def.c
    +1.4%     +64  +1.4%     +64    _upb_symtab_addfile
  +1.2%     +48  +1.4%     +48    upb/reflection.c
     +15%     +32   +18%     +32    upb_msg_set
    +2.9%     +16  +3.1%     +16    upb_msg_mutable
  -9.3%    -288  [ = ]       0    [Unmapped]
  [ = ]       0  +0.2%    +288    TOTAL
```

Joshua Haberman	3881393907	Renamed .int.h to _internal.h, for greater clarity.	4 years ago
Joshua Haberman	823eb09694	Update all 2011 dates to 2021.	4 years ago
Joshua Haberman	e59d2c8fa7	Added license headers to all files.	4 years ago
Matt Kulukundis	d9a0c58108	Allow arena fuse to fail. Track initial blocks to avoid having fuse operate on arenas that cannot be fused.	4 years ago
Joshua Haberman	c7787cbaa1	Fixed a bunch of Clang warnings.	4 years ago

Unfortunately a few of the Clang warnings did not have easy fixes:

    ../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘fastdecode_err’:
    ../../../../ext/google/protobuf_c/ruby-upb.c:353:13: warning: function might be candidate for attribute ‘noreturn’ [-Wsuggest-attribute=noreturn]
      353 | const char *fastdecode_err(upb_decstate *d) {
          |             ^~~~~~~~~~~~~~
    ../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘_upb_decode’:
    ../../../../ext/google/protobuf_c/ruby-upb.c:867:30: warning: argument ‘buf’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
      867 | bool _upb_decode(const char *buf, size_t size, void *msg,

I even tried to suppress the first error, but it still shows up.

Joshua Haberman	9175989431	Bugfix for arena cleanup list when passing to upb_decode().	4 years ago
Joshua Haberman	8d670d8aea	Renamed decode_varint32() to decode_tag().	4 years ago
Joshua Haberman	9abf8e043f	Clamp 32-bit varints to 5 bytes to fix a fuzz failure.	4 years ago
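
Commit 9abf8e043f above mentions clamping 32-bit varints to 5 bytes. The listing does not show the actual change, so the following is only a minimal sketch of one plausible reading: a 32-bit varint needs at most five 7-bit groups, so a reader can refuse to consume more than five bytes. The name `read_varint32` and its return convention are hypothetical and are not taken from upb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not upb's decode_tag()): reads a base-128 varint
 * that must fit in 32 bits, consuming at most 5 bytes.
 * Returns the number of bytes consumed, or 0 on error. */
static size_t read_varint32(const uint8_t *buf, size_t len, uint32_t *out) {
  assert(buf != NULL && out != NULL);
  uint64_t val = 0;
  for (size_t i = 0; i < len && i < 5; i++) {
    /* Each byte carries 7 payload bits, least significant group first. */
    val |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
    if ((buf[i] & 0x80) == 0) {          /* continuation bit clear: done */
      if (val > UINT32_MAX) return 0;    /* fifth byte overflowed 32 bits */
      *out = (uint32_t)val;
      return i + 1;
    }
  }
  return 0; /* input truncated, or the varint is longer than 5 bytes */
}
```

Bounding the loop at five bytes means malformed input with a long run of continuation bytes is rejected early instead of being scanned to the end of the buffer.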


170 Commits (e0aaad386fc41029133563e73a1a2a24d2ec7cf1)
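
The field lookup optimizations described in commit 65d7b8ab0c (a direct index for densely packed field numbers, and a linear search that resumes where the previous field was found) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from that commit: `field_t`, `layout_t`, `dense_below`, and `find_field` are hypothetical, simplified stand-ins for upb's real layout structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for upb's layout structures. */
typedef struct {
  uint32_t number; /* protobuf field number */
} field_t;

typedef struct {
  const field_t *fields; /* sorted by ascending field number */
  size_t field_count;
  size_t dense_below;    /* fields[i].number == i + 1 for all i < dense_below;
                            assumed to be computed at codegen time */
} layout_t;

/* Finds a field by number. *last_idx remembers the previous hit so that the
 * linear search can resume from there. Returns NULL if no field matches. */
static const field_t *find_field(const layout_t *l, uint32_t number,
                                 size_t *last_idx) {
  assert(l != NULL && last_idx != NULL);
  if (l->field_count == 0) return NULL;

  /* Fast path: within the dense prefix the index is simply number - 1.
   * A field number of 0 wraps to SIZE_MAX and fails the bounds check. */
  size_t idx = (size_t)number - 1;
  if (idx < l->dense_below) {
    *last_idx = idx;
    return &l->fields[idx];
  }

  /* Slow path: linear scan starting at the previous hit, wrapping around,
   * since field numbers on the wire are generally increasing. */
  size_t start = *last_idx < l->field_count ? *last_idx : 0;
  size_t i = start;
  do {
    if (l->fields[i].number == number) {
      *last_idx = i;
      return &l->fields[i];
    }
    if (++i == l->field_count) i = 0;
  } while (i != start);
  return NULL;
}
```

A caller would keep one `last_idx` per message being parsed, initialized to 0, and pass it to every lookup so that consecutive fields in ascending order are found after only a few comparisons.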