This makes extension checking more strict in most cases.
However it also fixes a bug with MessageSet where we were being
too strict. MessageSet allows larger extension numbers than
normal extensions do.
In a big-endian system, the 64-bit value of 1 is represented as:
```
0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1
```
However, when `d.int32_val` is used, this truncates this and takes the
first four bytes:
```
0x0 0x0 0x0 0x0
```
As a result, we lose the value of 1 from this truncation and the value
beocmes 0. This doesn't happen in a little-endian system because the 1
is in the lowest memory address, so truncating the value to 32 bits
doesn't change anything.
Previously the DefToProto test was failing on a big-endian system
because this truncation caused the key to be incorrectly set to 0.
We now use the type-specific functions
(e.g. `upb_fielddef_defaultint32`) to do this conversion.
Closes https://github.com/protocolbuffers/upb/issues/442
The primary motivation for this change is to avoid referring to the
`upb_msglayout` object when we are trying to fetch the `upb_msglayout`
object for a sub-message. This will help pave the way for parsing
extensions. We also implement several optimizations so that we can
make this change without regressing performance.
Normally we compute the layout for a sub-message field like so:
```
const upb_msglayout *get_submsg_layout(
const upb_msglayout *layout,
const upb_msglayout_field *field) {
return layout->submsgs[field->submsg_index]
}
```
The reason for this indirection is to avoid storing a pointer directly
in `upb_msglayout_field`, as this would double its size (from 12 to 24
bytes on 64-bit architectures) which is wasteful as this pointer is
only needed for message typed fields.
However `get_submsg_layout` as written above does not work for
extensions, as they will not have entries in the message's
`layout->submsgs` array by nature, and we want to avoid creating
an entire fake `upb_msglayout` for each such extension since that
would also be wasteful.
This change removes the dependency on `upb_msglayout` by passing down
the `submsgs` array instead:
```
const upb_msglayout *get_submsg_layout(
const upb_msglayout *const *submsgs,
const upb_msglayout_field *field) {
return submsgs[field->submsg_index]
}
```
This will pave the way for parsing extensions, as we can more easily
create an alternative `submsgs` array for extension fields without
extra overhead or waste.
Along the way several optimizations presented themselves that allow
a nice increase in performance:
1. Passing the parsed `wireval` by address instead of by value ended
up avoiding an expensive and useless stack copy (this is on Clang,
which was used for all measurements).
2. When field numbers are densely packed, we can find a field by number
with a single indexed lookup instead of linear search. At codegen
time we can compute the maximum field number that will allow such
an indexed lookup.
3. For fields that do require linear search, we can start the linear
search at the location where we found the previous field, taking
advantage of the fact that field numbers are generally increasing.
4. When the hasbit index is less than 32 (the common case) we can use
a less expensive code sequence to set it.
5. We check for the hasbit case before the oneof case, as optional
fields are more common than oneof fields.
Benchmark results indicate a 20% improvement in parse speed with a
small code size increase:
```
name old time/op new time/op delta
ArenaOneAlloc 21.3ns ± 0% 21.5ns ± 0% +0.96% (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc 6.32ns ± 0% 6.32ns ± 0% +0.03% (p=0.000 n=12+10)
LoadDescriptor_Upb 53.5µs ± 1% 51.5µs ± 2% -3.70% (p=0.000 n=12+12)
LoadAdsDescriptor_Upb 2.78ms ± 2% 2.68ms ± 0% -3.57% (p=0.000 n=12+12)
LoadDescriptor_Proto2 240µs ± 0% 240µs ± 0% +0.12% (p=0.001 n=12+12)
LoadAdsDescriptor_Proto2 12.8ms ± 0% 12.7ms ± 0% -1.15% (p=0.000 n=12+10)
Parse_Upb_FileDesc<UseArena,Copy> 13.2µs ± 2% 10.7µs ± 0% -18.49% (p=0.000 n=10+12)
Parse_Upb_FileDesc<UseArena,Alias> 11.3µs ± 0% 9.6µs ± 0% -15.11% (p=0.000 n=12+11)
Parse_Upb_FileDesc<InitBlock,Copy> 12.7µs ± 0% 10.3µs ± 0% -19.00% (p=0.000 n=10+12)
Parse_Upb_FileDesc<InitBlock,Alias> 10.9µs ± 0% 9.2µs ± 0% -15.82% (p=0.000 n=12+12)
Parse_Proto2<FileDesc,NoArena,Copy> 29.4µs ± 0% 29.5µs ± 0% +0.61% (p=0.000 n=12+12)
Parse_Proto2<FileDesc,UseArena,Copy> 20.7µs ± 2% 20.6µs ± 2% ~ (p=0.260 n=12+11)
Parse_Proto2<FileDesc,InitBlock,Copy> 16.7µs ± 1% 16.7µs ± 0% -0.25% (p=0.036 n=12+10)
Parse_Proto2<FileDescSV,InitBlock,Alias> 16.5µs ± 0% 16.5µs ± 0% +0.20% (p=0.016 n=12+11)
SerializeDescriptor_Proto2 5.30µs ± 1% 5.36µs ± 1% +1.09% (p=0.000 n=12+11)
SerializeDescriptor_Upb 12.9µs ± 0% 13.0µs ± 0% +0.90% (p=0.000 n=12+11)
FILE SIZE VM SIZE
-------------- --------------
+1.5% +176 +1.6% +176 upb/decode.c
+1.8% +176 +1.9% +176 decode_msg
+0.4% +64 +0.4% +64 upb/def.c
+1.4% +64 +1.4% +64 _upb_symtab_addfile
+1.2% +48 +1.4% +48 upb/reflection.c
+15% +32 +18% +32 upb_msg_set
+2.9% +16 +3.1% +16 upb_msg_mutable
-9.3% -288 [ = ] 0 [Unmapped]
[ = ] 0 +0.2% +288 TOTAL
```
This brings upb into line with C++. PHP already checks this
internally, so this should not be an issue there. Ruby on the
other hand does not currently check this, so this change will
cause our Ruby implementation to reject some programs that
would otherwise have been accepted.
Unfortunately a few of the Clang warnings did not have easy fixes:
../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘fastdecode_err’:
../../../../ext/google/protobuf_c/ruby-upb.c:353:13: warning: function might be candidate for attribute ‘noreturn’ [-Wsuggest-attribute=noreturn]
353 | const char *fastdecode_err(upb_decstate *d) {
| ^~~~~~~~~~~~~~
../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘_upb_decode’:
../../../../ext/google/protobuf_c/ruby-upb.c:867:30: warning: argument ‘buf’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
867 | bool _upb_decode(const char *buf, size_t size, void *msg,
I even tried to suppress the first error, but it still shows up.
This matches an API already present in proto2
(const DescriptorPool* FileDescriptor::pool()).
However there is a slightly subtle implication here.
In proto2, the relationship between Descriptor and
MessageFactory is 1:many. You can create as many
DynamicMessageFactory instances as you want, and
each one will have its own independent DynamicMessage
prototype and computed layout for the same underlying
Descriptor. In practice the layouts will all be the same,
but one thing that could be distinct is that each can
have its own extension pool, which is a DescriptorPool
that will be searched for extensions when parsing.
In contrast, upb does not have a separate "message
factory" abstraction. That means that each upb_msgdef
has a single distinct layout, in other words a 1:1
correspondence between descriptor and layout. This means
that there is no way to create multiple message types
for the same descriptor that have distinct extension
pools. If you want a different set of extensions, you
must create a separate upb_symtab with a distinct set
of descriptors.
This change further entrenches that upb_filedef:upb_symtab
is a 1:1 relationship. A single upb_filedef cannot be a
member of multiple symbol tables. In practice this was
already true (there is no way to add a single filedef to
multiple symbol tables) but this change codifies this
1:1 relationship.
We used to use a separate "add table" during the upb_symtab_addfile()
operation to make it easier to back out the file if it contained
errors. But this created unnecessary work of re-adding the same symbols
to the main symtab once everything was validated.
Instead we directly add symbols to the main symbols table. If there is
an error in validation, we remove precisely the set of symbols that
were already added.
This also requires using a separate arena for each file. We can fuse
it with the symtab's main arena if the operation is successful.
LoadDescriptor_Upb 61.2µs ± 4% 53.5µs ± 1% -12.50% (p=0.000 n=12+12)
LoadAdsDescriptor_Upb 4.43ms ± 1% 3.06ms ± 0% -31.00% (p=0.000 n=12+12)
LoadDescriptor_Proto2 257µs ± 0% 259µs ± 0% +1.00% (p=0.000 n=12+12)
LoadAdsDescriptor_Proto2 13.9ms ± 1% 13.9ms ± 1% ~ (p=0.128 n=12+12)