"No such field:" is more clear than "Unknown field:",
because "unknown field" is a term of art within protocol
buffers that implies that we are preserving the field.
Also "No such field:" matches the pre-existing Ruby
error message.
Unfortunately a few of the Clang warnings did not have easy fixes:
../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘fastdecode_err’:
../../../../ext/google/protobuf_c/ruby-upb.c:353:13: warning: function might be candidate for attribute ‘noreturn’ [-Wsuggest-attribute=noreturn]
353 | const char *fastdecode_err(upb_decstate *d) {
| ^~~~~~~~~~~~~~
../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘_upb_decode’:
../../../../ext/google/protobuf_c/ruby-upb.c:867:30: warning: argument ‘buf’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
867 | bool _upb_decode(const char *buf, size_t size, void *msg,
I even tried to suppress the first error, but it still shows up.
This matches an API already present in proto2
(const DescriptorPool* FileDescriptor::pool()).
However there is a slightly subtle implication here.
In proto2, the relationship between Descriptor and
MessageFactory is 1:many. You can create as many
DynamicMessageFactory instances as you want, and
each one will have its own independent DynamicMessage
prototype and computed layout for the same underlying
Descriptor. In practice the layouts will all be the same,
but one thing that could be distinct is that each can
have its own extension pool, which is a DescriptorPool
that will be searched for extensions when parsing.
In contrast, upb does not have a separate "message
factory" abstraction. That means that each upb_msgdef
has a single distinct layout, in other words a 1:1
correspondence between descriptor and layout. This means
that there is no way to create multiple message types
for the same descriptor that have distinct extension
pools. If you want a different set of extensions, you
must create a separate upb_symtab with a distinct set
of descriptors.
This change further entrenches that upb_filedef:upb_symtab
is a 1:1 relationship. A single upb_filedef cannot be a
member of multiple symbol tables. In practice this was
already true (there is no way to add a single filedef to
multiple symbol tables) but this change codifies this
1:1 relationship.
If an initial block is provided, we should start our
block doubling at the size of the initial block, not 128.
This saves us from unnecessary overhead when we overflow
the initial block.
There was a bug in our arena code where we assumed that
sizeof(upb_array) would be a multiple of 8. On i386 it was
not, and this was causing memory corruption on 32-bit builds.
We used to use a separate "add table" during the upb_symtab_addfile()
operation to make it easier to back out the file if it contained
errors. But this created unnecessary work of re-adding the same symbols
to the main symtab once everything was validated.
Instead we directly add symbols to the main symbols table. If there is
an error in validation, we remove precisely the set of symbols that
were already added.
This also requires using a separate arena for each file. We can fuse
it with the symtab's main arena if the operation is successful.
LoadDescriptor_Upb 61.2µs ± 4% 53.5µs ± 1% -12.50% (p=0.000 n=12+12)
LoadAdsDescriptor_Upb 4.43ms ± 1% 3.06ms ± 0% -31.00% (p=0.000 n=12+12)
LoadDescriptor_Proto2 257µs ± 0% 259µs ± 0% +1.00% (p=0.000 n=12+12)
LoadAdsDescriptor_Proto2 13.9ms ± 1% 13.9ms ± 1% ~ (p=0.128 n=12+12)