The upb convention is that "_Build()" means to also allocate, which this function does not do, so rename it as "_Init()" to free up the name for a future function that does allocate.
PiperOrigin-RevId: 510282736
Prior to this CL we were allocating a MiniTable for each message and then overwriting it later. This could lead to an inconsistent state, and is unnecessary. This CL adds an extra phase to initialization so that the MiniTable is assigned only one time for each message.
PiperOrigin-RevId: 507617479
The initial motivation for this change was to fix a bug found by fuzzing. The old fuzz test (built on `cc_fuzz_target()`) detected an infinite loop that occurred when a bytes field default had an unterminated `\x` escape.
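To illustrate this class of bug, here is a minimal sketch of a `\x` escape parser that either consumes input or reports failure, so the caller can never spin at the same position. The names `hex_digit()` and `parse_hex_escape()` are hypothetical and are not upb's actual code; the limit of two hex digits per escape is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not upb's code): value of a hex digit, or -1. */
static int hex_digit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Parses the digits that follow "\x". On entry, *pos indexes the first
 * character after the 'x'. On success, stores the byte in *out, advances
 * *pos past the digits, and returns true. Returns false without advancing
 * if no hex digit follows (an unterminated escape), so the caller reports
 * an error instead of retrying at the same position. */
bool parse_hex_escape(const char* src, size_t len, size_t* pos,
                      unsigned char* out) {
  assert(src != NULL && pos != NULL && out != NULL);
  unsigned value = 0;
  size_t start = *pos;
  /* Assumption: at most two hex digits per escape. */
  while (*pos < len && *pos - start < 2) {
    int d = hex_digit(src[*pos]);
    if (d < 0) break;
    value = (value << 4) | (unsigned)d;
    (*pos)++;
  }
  if (*pos == start) return false; /* "\x" with nothing usable after it */
  *out = (unsigned char)value;
  return true;
}
```

The key property is that every call either advances `*pos` or returns false; a loop built on it cannot stall on truncated input.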
To fix this bug while expanding fuzz coverage, I created a fuzz test that verifies that we can do a lossless round trip from descriptor -> DefPool -> descriptor. We use C++ as the source of truth for whether a descriptor is valid or not, and what the canonical serialization back to protobuf form should be.
I wrote the new fuzz test using go/FuzzTest, which makes it easier and more readable to use an arbitrary `FileDescriptorSet` as input, while adding test cases for regressions.
The fuzz test highlighted a handful of errors that I subsequently fixed and added regression tests for:
1. The aforementioned unterminated `\x` bug.
2. We were not propagating the `edition` field.
3. We were missing the CheckIdent() check in a few places.
4. We were rejecting files with empty name, whereas C++ allows this.
5. There were a few bugs with escaping string defaults.
Since FuzzTest is Clang-only, I split the `FUZZ_TEST()` invocation from the regression tests, which are portable and should be run on all platforms. Only `FUZZ_TEST()` itself is in a google3/Clang-only file.
PiperOrigin-RevId: 506997362
Slight optimization that frees us from needing to backtrack up to the owning file def to extract the proto syntax bit. Costs zero additional storage since we already have available unused bits. Also makes the enum def the single source of truth for determining enum syntax - upb_FieldDef_IsClosedEnum() now just passes off to upb_EnumDef_IsClosed() instead of replicating that code.
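A minimal sketch of the delegation described above. The structs and function names are simplified, hypothetical stand-ins; the real upb def types carry many more fields, and how the closed bit is computed at build time is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified enum def. */
typedef struct {
  const char* full_name;
  bool is_closed; /* Assumed: set once when the enum def is built. */
} EnumDef;

/* Hypothetical, simplified field def. */
typedef struct {
  const char* name;
  const EnumDef* enum_subdef; /* NULL if the field is not an enum. */
} FieldDef;

/* Single source of truth: reads the bit stored on the enum def itself,
 * with no walk up to the owning file def. */
bool EnumDef_IsClosed(const EnumDef* e) {
  assert(e != NULL);
  return e->is_closed;
}

/* The field-level query just delegates to the enum def. */
bool FieldDef_IsClosedEnum(const FieldDef* f) {
  assert(f != NULL);
  return f->enum_subdef != NULL && EnumDef_IsClosed(f->enum_subdef);
}
```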
PiperOrigin-RevId: 505513429
We have previously been using Copybara to rewrite these names, but for bootstrapping we will want to be able to sometimes use OSS names inside google3.
PiperOrigin-RevId: 500294974
upb_MiniTable_BuildEnum() -> upb_MiniTableEnum_Build()
upb_MiniTable_BuildExtension() -> upb_MiniTableExtension_Build()
Also make the status pointer argument optional for the mini table builders.
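One way an optional status pointer can work, as a sketch only: `Status`, `set_error()`, and `build_from_descriptor()` are hypothetical stand-ins and do not reflect the real `upb_Status` type or the real builder signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a status object. */
typedef struct {
  bool ok;
  char msg[64];
} Status;

/* Records an error only if the caller supplied a status object. */
static void set_error(Status* status, const char* msg) {
  if (status == NULL) return; /* Caller opted out of diagnostics. */
  status->ok = false;
  strncpy(status->msg, msg, sizeof(status->msg) - 1);
  status->msg[sizeof(status->msg) - 1] = '\0';
}

/* Hypothetical builder: returns false on failure. The status argument
 * may be NULL when the caller only needs the success/failure result. */
bool build_from_descriptor(const char* data, size_t len, Status* status) {
  if (data == NULL || len == 0) {
    set_error(status, "empty mini descriptor");
    return false;
  }
  /* Real decoding of the descriptor would happen here. */
  return true;
}
```

Callers that want an error message pass a `Status`; callers that only check the return value pass `NULL`.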
PiperOrigin-RevId: 490992866
Prior to this CL, there were several different code paths for reading/writing message data. Generated code, MiniTable accessors, and reflection all performed direct manipulation of the bits and bytes in a message, but they all had distinct implementations that shared almost no code. This divergence meant that they could easily have different behavior, bugs could creep into one but not another, and we would need three different sets of tests to get full test coverage. This also made it very difficult to change the internal representation in any way, since it would require updating many places in the code.
With this CL, the three different APIs for accessing message data now all share a common set of functions. The common functions all take a `upb_MiniTableField` as the canonical description of a field's type and layout. The lowest-level functions are very branchy, as they must test for every possible variation in the field type (field vs oneof, hasbit vs no-hasbit, different field sizes, whether a nonzero default value exists, extension vs. regular field). However, these functions are declared inline and designed to be very optimizable when values are known at compile time.
In generated accessors, for example, we can declare constant `upb_MiniTableField` instances so that all values can constant-propagate, and we can get fully specialized code even though we are calling a generic function. On the other hand, when we use the generic functions from reflection, we get runtime branches since values are not known at compile time. Even so, the functions are written to be as efficient as possible when used from reflection. For example, we use memcpy() calls with constant length so that the compiler can optimize these into inline loads/stores without having to make an out-of-line call to memcpy().
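The following sketch illustrates the idea with a deliberately simplified field description. `FieldDesc`, `get_field()`, `Msg`, and `Msg_timestamp()` are hypothetical names; the real `upb_MiniTableField` layout and accessor functions differ and handle many more cases (hasbits, oneofs, defaults, extensions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified field description. */
typedef struct {
  uint16_t offset; /* Byte offset of the field within the message. */
  uint8_t size;    /* 4 or 8 in this sketch. */
} FieldDesc;

/* Generic getter shared by all callers. It branches on the field size,
 * but every memcpy() has a constant length, so the compiler can lower
 * each one to an inline load/store. */
static inline void get_field(const void* msg, const FieldDesc* f, void* out) {
  assert(f->size == 4 || f->size == 8);
  const char* src = (const char*)msg + f->offset;
  switch (f->size) {
    case 4: memcpy(out, src, 4); break;
    case 8: memcpy(out, src, 8); break;
  }
}

/* Hypothetical message layout. */
typedef struct {
  int32_t id;
  int64_t timestamp;
} Msg;

/* "Generated" accessor: the field description is a compile-time constant,
 * so after inlining the switch can fold away to a single 8-byte load. */
static inline int64_t Msg_timestamp(const Msg* m) {
  const FieldDesc field = {(uint16_t)offsetof(Msg, timestamp), 8};
  int64_t ret;
  get_field(m, &field, &ret);
  return ret;
}
```

A reflection-style caller would pass a `FieldDesc` that is only known at runtime to the same `get_field()`, paying for the branch but still avoiding an out-of-line `memcpy()` call.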
In this way, this CL should be a benefit to both correctness and performance. It will also make it easier to change the message representation, for example to optimize the encoder by giving hasbits to all fields.
Note that we have not completely consolidated all access in this CL:
1. Some functions outside of get/set such as clear and hazzers are not yet unified.
2. The encoder and decoder still touch the message without going through the common functions. The encoder and decoder require a bit more specialized code to get good performance when reading/writing fields en masse.
PiperOrigin-RevId: 490016095
We need to sharpen the distinction between messages and extensions in the mini
descriptor encoder, so split the code paths for each.
PiperOrigin-RevId: 480675339
- Each def type has its own .c file and its own .h file
- Functions that require a builder context are declared in def_builder.h
- The mini descriptor encoders have also been pulled into upb/reflection/
- upb/def.h, upb/def.hpp, upb/reflection.h, and upb/reflection.hpp are now deprecated stubs that point to the new headers
PiperOrigin-RevId: 474459500