Joshua Haberman
518e19dc63
Fixed fuzz bug by further validating map entries.
...
Map entry key/value cannot be repeated or map. Value cannot be group.
PiperOrigin-RevId: 505586184
2 years ago
Joshua Haberman
df850c821e
Fixed fuzz bug when map entry containing unknown fields spans buffers.
...
To correctly handle this case we must add the serialized map entry to the unknown fields. Ideally we could merely preserve the map entry's serialized bytes from the input. However this is tricky to do if we are streaming and the previous buffer where the map entry began is no longer available.
This CL fixes this edge case by using the encoder to re-encode the map entry rather than using the input bytes directly.
While this fix is reasonably simple and reliable, it has two unfortunate properties. One is performance: we now must run the encoder to recreate bytes that we already saw in the input.
The other is dependencies: this fix has the unfortunate property of making the decoder depend on the encoder. In applications that only want the decoder but not the encoder, this will increase binary size. But the practical effects of this are probably minimal (the vast majority of applications that depend on the decoder will also use the encoder).
We can revisit this later and see if there is a better way of preserving the input bytes without re-encoding. For now this fix is simple and correct and fixes the fuzz bug.
PiperOrigin-RevId: 505381927
2 years ago
Joshua Haberman
7ad307eea8
Added assert that message-type map values must be linked.
...
PiperOrigin-RevId: 505225954
2 years ago
Joshua Haberman
a5e337f88c
Allow dynamic tree shaking for groups, and disallow groups for map entry sub-messages.
...
The initial motivation for this CL was to fix a bug found by fuzzing. But the fuzz bug pointed out a few edge cases that this CL corrects:
1. The core bug is that we were allowing a map entry sub-message to be linked to a group field. This is not allowed in protobuf schemas, but we did not check for this edge case in `upb_MiniTable_SetSubMessage()`, so we were de facto allowing it. This triggered some bad behavior in the parser whereby we pushed a limit without checking its validity first.
2. To defend against this, I added asserts in `upb_MiniTable_SetSubMessage()` to verify the type of the field we are linking, to ensure that a group field is not linked to a map entry sub-message. But this should probably be changed to return an error instead of relying on asserts for this.
3. I changed the fuzz util code that builds the MiniTable so that it will never violate this new invariant. The fuzz util code now can run into situations where a group field has no valid non-map-entry sub-message to select. In those cases it will simply not register any sub-message for that field.
4. Previously group did not support leaving sub-messages unregistered. Previously I added this feature for sub-messages but not for groups. There is no reason why dynamic tree shaking should not work for group fields, so I extended the feature to support groups also.
PiperOrigin-RevId: 504913630
2 years ago
Protobuf Team Bot
2ad7a308e9
Avoid automatic variables in functions using setjmp.
...
According to https://en.cppreference.com/w/c/program/setjmp automatic variables
modified in a function calling setjmp can have indeterminate values. Instead,
refactor all functions calling setjmp so that the function calling setjmp
doesn’t have any local variables.
Part VI: Wire encoder/decoder.
PiperOrigin-RevId: 504810940
2 years ago
Protobuf Team Bot
76daaa2319
Incremental step towards fixing nonstandard/nonportable uses of setjmp.
...
As https://en.cppreference.com/w/c/program/setjmp explains, setjmp can only be
used in very specific contexts to not trigger UB.
PiperOrigin-RevId: 499540912
2 years ago
Joshua Haberman
7cd8f6c940
Ported more cases of wire format parsing to upb_WireReader.
...
PiperOrigin-RevId: 498502557
2 years ago
Joshua Haberman
a48af3f824
Moved aliasing logic for string field parsing into EpsCopyInputStream.
...
Moving the logic down to EpsCopyInputStream makes it easier to test and reuse this functionality.
We also implement aliasing for the final bytes of the patch buffer, which has never been supported before. We used to always force a copy for any data parsed out of the patch buffer at the end of the stream.
Much of this logic is ported directly from the C++ EpsCopyInputStream class.
PiperOrigin-RevId: 498091644
2 years ago
Joshua Haberman
68d1d91475
Separated out buffering code into upb_EpsCopyInputStream.
...
This mirrors the structure of C++ protobuf, which has an EpsCopyInputStream class.
This will lay the foundation for making EpsCopyInputStream capable of true streaming, by reading its input from a ZeroCopyInputStream. It also lets us test EpsCopyInputStream separately from the decoder: see the new unit test that fuzzes upb_EpsCopyInputStream.
After this CL is submitted, the two decoders (the normal decoder and the fast decoder) should no longer be accessing the members of upb_EpsCopyInputStream.
PiperOrigin-RevId: 494400285
2 years ago
Joshua Haberman
9c6223c058
Unified all hazzers to use MiniTable accessors
...
This required some work to unify map entry messages with regular messages, with respect to presence. Before map entry fields could never have presence. Now they can have presence according to normal rules. Note that this only applies to times that the user constructs a map entry directly.
PiperOrigin-RevId: 490611656
2 years ago
Eric Salo
ff6439fba0
move the wire type definitions into upb/wire/ where they belong
...
PiperOrigin-RevId: 489500430
2 years ago
Eric Salo
27d70edfe2
clean up the :mini_table build target
...
Remove circular dependencies that were bouncing back and forth between
msg_internal.h and mini_table/, including:
- splitting out each mini table subtype into its own header
- moving the non-reflection message code into message/
- moving the accessors from mini_table/ to message/
PiperOrigin-RevId: 489121042
2 years ago
Eric Salo
ff8e1b40ba
create base/ subdir and expand :status build target to :base
...
upb.h is now just a temporary stub
PiperOrigin-RevId: 488255988
2 years ago
Eric Salo
75907f7af9
rename the upb_MiniTable subtypes to follow the upb style guide:
...
upb_MiniTable_Enum -> upb_MiniTableEnum
upb_MiniTable_Extension -> upb_MiniTableExtension
upb_MiniTable_Field -> upb_MiniTableField
upb_MiniTable_File -> upb_MiniTableFile
upb_MiniTable_Sub -> upb_MiniTableSub
PiperOrigin-RevId: 486712960
2 years ago
Eric Salo
f6307877d3
move portability stuff into upb/port/
...
Also delete redundant system #includes that are already pulled in by port/def.inc
PiperOrigin-RevId: 486398989
2 years ago
Eric Salo
46699b72ad
move message set enums into upb/wire/ (and use them)
...
PiperOrigin-RevId: 486366363
2 years ago
Eric Salo
fd040a8bff
create collections/map_internal.h and collections/map_gencode_util.h
...
Move the map-related functions from msg_internal.h that are only used in generated code into map_gencode_util.h. Then move the rest of the map-related functions from msg_internal.h into map_internal.h.
PiperOrigin-RevId: 486299140
2 years ago
Eric Salo
fd14316f38
create collections/ subdir for all array and map code
...
PiperOrigin-RevId: 486013554
2 years ago
Eric Salo
e137175092
create wire/ subdir for all wire-format code
...
PiperOrigin-RevId: 485898080
2 years ago
Eric Salo
df34b04658
make upb_ExtensionRegistry_AddArray() and upb_ExtensionRegistry_Lookup() public
...
Previously these were internal _upb functions but it turns out that they are also useful outside of upb.
PiperOrigin-RevId: 480997511
2 years ago
Eric Salo
41335a03be
normalize upb_Message_New()
...
We had _upb_Message_New(), which created a message from a mini table and was
being used outside of upb even though it is an internal-only function.
We also had upb_Message_New(), which created a message from a message def.
Now there is a single public function, upb_Message_New(), which creates a
message from a mini table and covers all use cases. (The internal version has the same definition and is used for inlining.)
PiperOrigin-RevId: 480169804
2 years ago
Protobuf Team Bot
50d36a62b0
Add asserts for unlinked mini tables.
...
In particular upb doesn't handle unlinked groups yet which requires checks to make sure callers did link group subs.
PiperOrigin-RevId: 479702475
2 years ago
Joshua Haberman
d5bd55cde1
Treat unlinked sub-messages in the MiniTable as unknown
...
This is an observable behavior change in the decoder. After submitting this CL, clients of the decoder can assume that any unlinked sub-messages will be treated as unknown, rather than crashing.
Unlinked sub-messages must never have values present in the message. We can verify this with asserts. Since the values are never set, the encoder should never encounter data for any unlinked sub-message.
```
name old cpu/op new cpu/op delta
BM_ArenaOneAlloc 18.3ns ± 9% 17.9ns ± 2% ~ (p=0.690 n=5+5)
BM_ArenaInitialBlockOneAlloc 6.40ns ± 1% 6.68ns ±10% ~ (p=0.730 n=4+5)
BM_LoadAdsDescriptor_Upb<NoLayout> 5.09ms ± 2% 5.03ms ± 3% ~ (p=0.222 n=5+5)
BM_LoadAdsDescriptor_Upb<WithLayout> 5.45ms ± 3% 5.43ms ± 1% ~ (p=0.905 n=5+4)
BM_LoadAdsDescriptor_Proto2<NoLayout> 10.9ms ± 1% 10.8ms ± 1% -1.09% (p=0.016 n=5+4)
BM_LoadAdsDescriptor_Proto2<WithLayout> 11.3ms ± 9% 11.1ms ± 3% ~ (p=0.841 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Copy> 11.2µs ± 3% 11.3µs ± 3% ~ (p=0.222 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Alias> 10.3µs ± 5% 10.5µs ± 5% ~ (p=0.310 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 11.4µs ±18% 11.0µs ± 2% ~ (p=1.000 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.5µs ±17% 10.6µs ±19% ~ (p=0.421 n=5+5)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 20.5µs ± 2% 20.2µs ± 2% ~ (p=0.222 n=5+5)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 10.8µs ± 2% 10.9µs ± 4% ~ (p=0.841 n=5+5)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 10.5µs ± 3% 10.6µs ± 3% ~ (p=0.690 n=5+5)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 9.22µs ± 2% 9.23µs ± 3% ~ (p=1.000 n=5+5)
BM_SerializeDescriptor_Proto2 6.05µs ± 3% 5.90µs ± 3% ~ (p=0.222 n=5+5)
BM_SerializeDescriptor_Upb 10.2µs ± 3% 10.6µs ±14% ~ (p=0.841 n=5+5)
name old time/op new time/op delta
BM_ArenaOneAlloc 18.3ns ± 9% 17.9ns ± 2% ~ (p=0.841 n=5+5)
BM_ArenaInitialBlockOneAlloc 6.42ns ± 1% 6.69ns ±10% ~ (p=0.730 n=4+5)
BM_LoadAdsDescriptor_Upb<NoLayout> 5.10ms ± 2% 5.05ms ± 3% ~ (p=0.222 n=5+5)
BM_LoadAdsDescriptor_Upb<WithLayout> 5.47ms ± 3% 5.45ms ± 1% ~ (p=0.905 n=5+4)
BM_LoadAdsDescriptor_Proto2<NoLayout> 10.9ms ± 1% 10.8ms ± 1% -1.11% (p=0.016 n=5+4)
BM_LoadAdsDescriptor_Proto2<WithLayout> 11.4ms ± 9% 11.1ms ± 3% ~ (p=0.841 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Copy> 11.2µs ± 3% 11.3µs ± 3% ~ (p=0.222 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Alias> 10.3µs ± 5% 10.5µs ± 5% ~ (p=0.151 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 11.5µs ±18% 11.0µs ± 2% ~ (p=1.000 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.5µs ±17% 10.7µs ±19% ~ (p=0.421 n=5+5)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 20.6µs ± 2% 20.3µs ± 2% ~ (p=0.222 n=5+5)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 10.9µs ± 2% 10.9µs ± 4% ~ (p=0.841 n=5+5)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 10.6µs ± 3% 10.6µs ± 3% ~ (p=0.690 n=5+5)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 9.24µs ± 2% 9.25µs ± 3% ~ (p=1.000 n=5+5)
BM_SerializeDescriptor_Proto2 6.07µs ± 3% 5.91µs ± 3% ~ (p=0.222 n=5+5)
BM_SerializeDescriptor_Upb 10.3µs ± 3% 10.6µs ±14% ~ (p=0.841 n=5+5)
name old INSTRUCTIONS/op new INSTRUCTIONS/op delta
BM_ArenaOneAlloc 201 ± 0% 201 ± 0% ~ (p=0.841 n=5+5)
BM_ArenaInitialBlockOneAlloc 69.0 ± 0% 69.0 ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout> 33.9M ± 0% 34.1M ± 0% +0.66% (p=0.008 n=5+5)
BM_LoadAdsDescriptor_Upb<WithLayout> 35.6M ± 0% 35.8M ± 0% +0.64% (p=0.008 n=5+5)
BM_LoadAdsDescriptor_Proto2<NoLayout> 70.8M ± 0% 70.8M ± 0% ~ (p=0.548 n=5+5)
BM_LoadAdsDescriptor_Proto2<WithLayout> 71.6M ± 0% 71.6M ± 0% ~ (p=0.151 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Copy> 137k ± 0% 141k ± 0% +2.87% (p=0.008 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Alias> 125k ± 0% 128k ± 0% +2.83% (p=0.008 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 135k ± 0% 139k ± 0% +2.89% (p=0.008 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 124k ± 0% 127k ± 0% +2.85% (p=0.016 n=5+4)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 201k ± 0% 201k ± 0% ~ (p=0.222 n=5+5)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 107k ± 0% 107k ± 0% ~ (p=1.000 n=5+5)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 105k ± 0% 105k ± 0% ~ (p=0.286 n=5+4)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 86.5k ± 0% 86.5k ± 0% ~ (p=0.222 n=5+5)
BM_SerializeDescriptor_Proto2 60.3k ± 0% 60.3k ± 0% ~ (p=0.071 n=5+5)
BM_SerializeDescriptor_Upb 111k ± 0% 111k ± 0% ~ (p=0.841 n=5+5)
name old CYCLES/op new CYCLES/op delta
BM_ArenaOneAlloc 60.0 ± 7% 58.8 ± 0% -2.15% (p=0.016 n=5+5)
BM_ArenaInitialBlockOneAlloc 21.0 ± 0% 21.0 ± 0% ~ (p=1.000 n=5+5)
BM_LoadAdsDescriptor_Upb<NoLayout> 16.9M ± 0% 16.9M ± 0% ~ (p=0.056 n=5+5)
BM_LoadAdsDescriptor_Upb<WithLayout> 17.9M ± 1% 18.0M ± 1% ~ (p=0.095 n=5+5)
BM_LoadAdsDescriptor_Proto2<NoLayout> 35.9M ± 1% 35.8M ± 1% ~ (p=0.421 n=5+5)
BM_LoadAdsDescriptor_Proto2<WithLayout> 36.5M ± 0% 36.5M ± 0% ~ (p=0.841 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Copy> 37.2k ± 0% 37.3k ± 0% ~ (p=0.222 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Alias> 34.1k ± 0% 34.7k ± 0% +1.66% (p=0.008 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 36.4k ± 0% 36.7k ± 0% +0.83% (p=0.008 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 33.3k ± 1% 34.1k ± 1% +2.39% (p=0.008 n=5+5)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 68.1k ± 1% 68.0k ± 1% ~ (p=0.421 n=5+5)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 36.0k ± 1% 36.1k ± 1% ~ (p=0.841 n=5+5)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 35.3k ± 1% 35.5k ± 1% ~ (p=0.151 n=5+5)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 30.7k ± 0% 30.9k ± 1% ~ (p=0.151 n=5+5)
BM_SerializeDescriptor_Proto2 20.3k ± 2% 19.7k ± 3% ~ (p=0.151 n=5+5)
BM_SerializeDescriptor_Upb 33.6k ± 0% 33.7k ± 2% ~ (p=1.000 n=5+5)
name old allocs/op new allocs/op delta
BM_ArenaOneAlloc 1.19 ± 0% 1.19 ± 0% ~ (all samples are equal)
BM_ArenaInitialBlockOneAlloc 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout> 6.00k ± 0% 6.00k ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<WithLayout> 5.99k ± 0% 5.99k ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Proto2<NoLayout> 77.8k ± 0% 77.8k ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Proto2<WithLayout> 79.0k ± 0% 79.0k ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Copy> 7.19 ± 0% 7.19 ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Alias> 7.19 ± 0% 7.19 ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 10.2 ± 0% 10.2 ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 1.19 ± 0% 1.19 ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 1.19 ± 0% 1.19 ± 0% ~ (all samples are equal)
BM_SerializeDescriptor_Proto2 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal)
BM_SerializeDescriptor_Upb 0.19 ± 0% 0.19 ± 0% ~ (all samples are equal)
name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta
BM_ArenaOneAlloc 344 ± 0% 344 ± 0% ~ (all samples are equal)
BM_ArenaInitialBlockOneAlloc 112 ± 0% 112 ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout> 9.64M ± 0% 9.64M ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<WithLayout> 9.70M ± 0% 9.70M ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Proto2<NoLayout> 6.38M ± 0% 6.38M ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Proto2<WithLayout> 6.44M ± 0% 6.44M ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 112 ± 0% 112 ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 112 ± 0% 112 ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 40.8k ± 0% 40.8k ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 112 ± 0% 112 ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 112 ± 0% 112 ± 0% ~ (all samples are equal)
BM_SerializeDescriptor_Proto2 112 ± 0% 112 ± 0% ~ (all samples are equal)
BM_SerializeDescriptor_Upb 112 ± 0% 112 ± 0% ~ (all samples are equal)
name old speed new speed delta
BM_LoadAdsDescriptor_Upb<NoLayout> 147MB/s ± 2% 148MB/s ± 3% ~ (p=0.222 n=5+5)
BM_LoadAdsDescriptor_Upb<WithLayout> 137MB/s ± 3% 137MB/s ± 1% ~ (p=0.905 n=5+4)
BM_LoadAdsDescriptor_Proto2<NoLayout> 68.6MB/s ± 1% 69.3MB/s ± 1% +1.10% (p=0.016 n=5+4)
BM_LoadAdsDescriptor_Proto2<WithLayout> 66.0MB/s ± 9% 67.4MB/s ± 3% ~ (p=0.841 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Copy> 675MB/s ± 3% 667MB/s ± 3% ~ (p=0.222 n=5+5)
BM_Parse_Upb_FileDesc<UseArena, Alias> 730MB/s ± 5% 718MB/s ± 5% ~ (p=0.310 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 663MB/s ±16% 685MB/s ± 2% ~ (p=1.000 n=5+5)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 723MB/s ±15% 712MB/s ±16% ~ (p=0.421 n=5+5)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 367MB/s ± 2% 372MB/s ± 2% ~ (p=0.222 n=5+5)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 694MB/s ± 2% 691MB/s ± 4% ~ (p=0.841 n=5+5)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 714MB/s ± 3% 709MB/s ± 3% ~ (p=0.690 n=5+5)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 816MB/s ± 2% 816MB/s ± 3% ~ (p=1.000 n=5+5)
BM_SerializeDescriptor_Proto2 1.24GB/s ± 3% 1.28GB/s ± 3% ~ (p=0.222 n=5+5)
BM_SerializeDescriptor_Upb 734MB/s ± 3% 713MB/s ±13% ~ (p=0.841 n=5+5)
```
PiperOrigin-RevId: 477770562
2 years ago
Joshua Haberman
f44653bc52
Readability improvements for decode.c:
...
Several data arrays that are used in only one function have been moved from file scope to the function that uses them.
Updated to use C99 array designators so we can specify values by array index. This is more readable and less error-prone than using comments to label each value.
There is no functional change for the vast majority of this CL. The only exception is that we now test for OOM when creating a map.
PiperOrigin-RevId: 475191072
2 years ago
Joshua Haberman
e55bfa2851
Minor decoder refactor
...
Created a common function for saving/restoring the depth and checking end_group. This will be even more helpful if we decide to add any more state that is saved/resolved when we recurse into a sub-message.
This appears to be perf-neutral.
PiperOrigin-RevId: 475140169
2 years ago
Joshua Haberman
896e74c141
Optimizes `upb_MiniTable_Enum` for large but dense enums.
...
Optimizes `upb_MiniTable_Enum` for enums with many values (>64) but with relatively dense packing in numeric space.
This CL optimizes both the size and speed of such enums:
- size: 30x code size reduction
- speed: moved from linear search to a constant-time bit test
Negative enum values are still expensive, as they are never put into the bitfield.
PiperOrigin-RevId: 473259819
2 years ago
Joshua Haberman
ba7603b7c1
Updated decoder internal symbols to new naming scheme.
...
This is a naming change only, with no functional change. We did replace one inline function that had only one caller, but the net effect should be a no-op.
I did remove several `return` statements for calls to certain `noreturn` functions. These had no effect but were intended to improve readability. However the unused "return" caused ClangTidy to throw warnings, so these have been removed.
PiperOrigin-RevId: 472619191
2 years ago
Joshua Haberman
ececc21624
Fixed bug when parsing an unknown value in a proto2 enum extension. #fuzzing
...
Proto2 enum parsing is the only case where we have to look at the wire value (not merely the tag) to decide whether the field is known or unknown. If the value is unknown, we need to put the value in the Unknown Fields, but for an extension we no longer have easy access to the message, because for extensions we replace the `msg` pointer with a pointer to the extension. The bug occurred when we were treating the fake `upb_Message*` (which was actually a pointer to an extension) as a real `upb_Message*` that can have unknown fields.
This CL fixes the problem by preserving the true message pointer in `d->unknown_msg` when we are parsing an extension.
This also required fixing a bug in MiniTable building when fasttables are enabled. We need to set the table_mask to `-1` to disable fasttable parsing, not `0`.
For unknown reasons, this CL appears to speed up parsing somewhat significantly. Ideally we should be tracking parsing performance better over time, as it is possible this is merely regaining performance that was lost at a different time:
```
benchy --reference=srcfs third_party/upb/benchmarks:benchmark
10 / 10 [=================================================================================================================] 100.00% 2m32s
(Generated by http://go/benchy . Settings: --runs 5 --reference "srcfs")
name old cpu/op new cpu/op delta
BM_ArenaOneAlloc 23.9ns ± 6% 23.7ns ± 4% ~ (p=0.180 n=53+51)
BM_ArenaInitialBlockOneAlloc 7.62ns ± 4% 7.70ns ± 5% +0.99% (p=0.024 n=59+60)
BM_LoadAdsDescriptor_Upb<NoLayout> 6.60ms ±10% 6.57ms ± 8% ~ (p=0.607 n=47+54)
BM_LoadAdsDescriptor_Upb<WithLayout> 6.92ms ± 5% 6.88ms ± 8% ~ (p=0.257 n=54+54)
BM_LoadAdsDescriptor_Proto2<NoLayout> 14.2ms ± 8% 14.0ms ± 7% -1.38% (p=0.025 n=58+59)
BM_LoadAdsDescriptor_Proto2<WithLayout> 14.3ms ± 8% 14.2ms ± 8% -1.16% (p=0.031 n=58+57)
BM_Parse_Upb_FileDesc<UseArena, Copy> 15.9µs ± 4% 14.6µs ± 4% -7.85% (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias> 14.5µs ± 4% 13.3µs ± 5% -8.50% (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 15.7µs ± 4% 14.4µs ± 5% -7.99% (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 14.2µs ± 5% 13.0µs ± 4% -8.56% (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 26.3µs ± 4% 26.2µs ± 4% ~ (p=0.195 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 13.3µs ± 5% 13.2µs ± 4% ~ (p=0.085 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.9µs ± 4% 12.8µs ± 3% -0.66% (p=0.023 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 10.9µs ± 6% 10.9µs ± 4% ~ (p=0.063 n=59+58)
BM_SerializeDescriptor_Proto2 7.57µs ± 6% 7.62µs ± 6% ~ (p=0.147 n=57+58)
BM_SerializeDescriptor_Upb 12.8µs ± 4% 12.8µs ± 4% ~ (p=0.163 n=59+56)
name old time/op new time/op delta
BM_ArenaOneAlloc 23.9ns ± 5% 23.7ns ± 4% ~ (p=0.172 n=53+51)
BM_ArenaInitialBlockOneAlloc 7.62ns ± 4% 7.70ns ± 5% +1.02% (p=0.017 n=59+60)
BM_LoadAdsDescriptor_Upb<NoLayout> 6.60ms ±10% 6.58ms ± 8% ~ (p=0.727 n=47+55)
BM_LoadAdsDescriptor_Upb<WithLayout> 6.92ms ± 5% 6.88ms ± 8% ~ (p=0.260 n=54+54)
BM_LoadAdsDescriptor_Proto2<NoLayout> 14.2ms ± 7% 14.0ms ± 7% -1.40% (p=0.019 n=58+59)
BM_LoadAdsDescriptor_Proto2<WithLayout> 14.3ms ± 8% 14.2ms ± 8% -1.13% (p=0.037 n=58+57)
BM_Parse_Upb_FileDesc<UseArena, Copy> 15.9µs ± 4% 14.6µs ± 3% -7.88% (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias> 14.5µs ± 4% 13.3µs ± 5% -8.46% (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 15.7µs ± 4% 14.4µs ± 5% -7.99% (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 14.2µs ± 5% 13.0µs ± 4% -8.56% (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 26.3µs ± 4% 26.2µs ± 4% ~ (p=0.224 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 13.3µs ± 5% 13.2µs ± 4% ~ (p=0.098 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.9µs ± 4% 12.8µs ± 3% -0.68% (p=0.015 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 10.9µs ± 6% 10.9µs ± 4% ~ (p=0.052 n=59+58)
BM_SerializeDescriptor_Proto2 7.56µs ± 6% 7.62µs ± 6% ~ (p=0.111 n=58+58)
BM_SerializeDescriptor_Upb 12.8µs ± 4% 12.8µs ± 4% ~ (p=0.241 n=56+56)
name old allocs/op new allocs/op delta
BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ArenaInitialBlockOneAlloc 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout> 5.98k ± 0% 5.98k ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<WithLayout> 5.98k ± 0% 5.98k ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Proto2<NoLayout> 80.9k ± 0% 80.9k ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Proto2<WithLayout> 82.1k ± 0% 82.1k ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 9.00 ± 0% 9.00 ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_SerializeDescriptor_Proto2 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_SerializeDescriptor_Upb 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta
BM_ArenaOneAlloc 344 ± 0% 344 ± 0% ~ (all samples are equal)
BM_ArenaInitialBlockOneAlloc 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout> 9.60M ± 0% 9.60M ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Upb<WithLayout> 9.68M ± 0% 9.68M ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Proto2<NoLayout> 6.41M ± 0% 6.41M ± 0% ~ (all samples are equal)
BM_LoadAdsDescriptor_Proto2<WithLayout> 6.44M ± 0% 6.44M ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 40.7k ± 0% 40.7k ± 0% ~ (all samples are equal)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_SerializeDescriptor_Proto2 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_SerializeDescriptor_Upb 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
name old speed new speed delta
BM_LoadAdsDescriptor_Upb<NoLayout> 113MB/s ± 9% 113MB/s ± 8% ~ (p=0.712 n=47+55)
BM_LoadAdsDescriptor_Upb<WithLayout> 107MB/s ± 8% 108MB/s ± 8% ~ (p=0.200 n=55+54)
BM_LoadAdsDescriptor_Proto2<NoLayout> 52.5MB/s ± 8% 53.3MB/s ± 7% +1.51% (p=0.018 n=59+59)
BM_LoadAdsDescriptor_Proto2<WithLayout> 51.9MB/s ± 7% 52.4MB/s ± 8% +1.01% (p=0.050 n=58+58)
BM_Parse_Upb_FileDesc<UseArena, Copy> 473MB/s ± 4% 514MB/s ± 4% +8.52% (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias> 518MB/s ± 4% 566MB/s ± 5% +9.30% (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy> 480MB/s ± 4% 521MB/s ± 5% +8.69% (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias> 528MB/s ± 4% 578MB/s ± 4% +9.36% (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy> 286MB/s ± 4% 287MB/s ± 4% ~ (p=0.195 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy> 566MB/s ± 5% 570MB/s ± 4% ~ (p=0.085 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 583MB/s ± 5% 587MB/s ± 3% +0.64% (p=0.023 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 688MB/s ± 6% 693MB/s ± 4% ~ (p=0.063 n=59+58)
BM_SerializeDescriptor_Proto2 995MB/s ± 6% 988MB/s ± 5% ~ (p=0.147 n=57+58)
BM_SerializeDescriptor_Upb 586MB/s ± 4% 589MB/s ± 4% ~ (p=0.163 n=59+56)
```
PiperOrigin-RevId: 462022073
2 years ago
Protobuf Team Bot
8c44f04697
create and lock down upb/internal/array.h
...
Internal array functions are now implemented in upb/internal/array.c and declared in
upb/internal/array.h, which only has local visibility.
PiperOrigin-RevId: 458260144
2 years ago
Protobuf Team Bot
033859ff5d
rename internal/upb.h as internal/encode.h
...
add build target for upb/internal/encode.h and lock down its visibility
PiperOrigin-RevId: 457087638
2 years ago
Protobuf Team Bot
1695cb2788
rename the upb_Array 'len' field as 'size'
...
Now that 'size' has been renamed as 'capacity' we are free to rename 'len' as
'size', so upb_Array_Size() is actually returning the 'size' field.
PiperOrigin-RevId: 456865972
2 years ago
Protobuf Team Bot
83f4988561
rename the upb_Array 'size' field as 'capacity'
...
The current field/function names for upb_Array are quite confusing.
We will fix them in two steps, this being the first step.
PiperOrigin-RevId: 456687224
2 years ago
Protobuf Team Bot
e4635f223e
match file names to type names
...
Lots of changes but it's all just moving things around.
Backward-compatible stub #include's have been provided for now.
upb_Arena/upb_Status have been split out from upb/upb.?
upb_Array/upb_Map/upb_MessageValue have been split out from upb/collections.?
upb_ExtensionRegistry has been split out from upb/msg.?
upb/decode_internal.h is now upb/internal/decode.h
upb/mini_table_accessors_internal.h is now upb/internal/mini_table_accessors.h
upb/table_internal.h is now upb/internal/table.h
upb/upb_internal.h is now upb/internal/upb.h
PiperOrigin-RevId: 456297617
2 years ago
Joshua Haberman
d72d98495d
Fixed a Python test by adding a new map insert function that distinguishes between insert and update.
...
PiperOrigin-RevId: 447214618
3 years ago
Joshua Haberman
4978040db4
Rewrote the MessageSet parsing code in the upb decoder to properly handle several edge cases.
...
PiperOrigin-RevId: 442979459
3 years ago
Joshua Haberman
76c7ca9327
Updated API for accessing extensions.
...
PiperOrigin-RevId: 442947856
3 years ago
Protobuf Team
e1e7435e70
Internal change
...
PiperOrigin-RevId: 440796832
3 years ago
Joshua Haberman
9cc02bb60d
Rewrote the MessageSet parsing code in the upb decoder to properly handle several edge cases.
...
PiperOrigin-RevId: 440788402
3 years ago
Joshua Haberman
bcb08bf9f0
Clang-format.
3 years ago
Joshua Haberman
6509f13568
Reverted extra debug assignment.
3 years ago
Joshua Haberman
1046d778a2
Removed debug print statements.
3 years ago
Joshua Haberman
7d5f4cd9b6
Implemented the functionality to make the test pass.
3 years ago
theodorerose
97273a3638
WIP
3 years ago
Joshua Haberman
532dc1f0f0
Renamed a few more constants to the new style.
...
These are not in the public API and so were not prioritized before.
No functional change here, just renames.
3 years ago
Joshua Haberman
826eca6742
Enabled ubsan tests and fixed ubsan failures.
3 years ago
Joshua Haberman
606308c639
Added back missing underscore.
3 years ago
Joshua Haberman
75b6291e40
Renamed upb_FieldType_* -> kUpb_FieldType_*
3 years ago
Joshua Haberman
72af9dc0cc
Switch to a single upb_Decode.
3 years ago
Joshua Haberman
499c2cc8b1
upb_extreg, upb_msg
3 years ago
Joshua Haberman
1c955f37ce
Mass API rename and clang-reformat ( #485 )
...
* Wave 1: upb_fielddef.
* upb_fielddef itself.
* upb_oneofdef.
* upb_msgdef.
* ExtensionRange.
* upb_enumdef
* upb_enumvaldef
* upb_filedef
* upb_methoddef
* upb_servicedef
* upb_symtab
* upb_defpool_init
* upb_wellknown and upb_syntax_t
* Some constants.
* upb_status
* upb_strview
* upb_arena
* upb.h constants
* reflection
* encode
* JSON decode.
* json encode.
* msg_internal.
* Formatted with clang-format.
* Some naming fixups and comment reformatting.
* More refinements.
* A few more stragglers.
* Fixed PyObject_HEAD with semicolon. Removed TODO entries.
3 years ago