The new API upb_MiniTable_Link() links all sub-messages and sub-enums at a single time, by accepting an array of sub-tables and sub-enums. The order of these sub-tables can be queried using a separate function `upb_MiniTable_GetSubList()`, and this information is added to `CodeGeneratorRequest` as part of the upb-specific info.
PiperOrigin-RevId: 513970874
This fixes an MSAN warning of the form:
```
WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7fc964f2a597 in arena_findroot third_party/upb/upb/mem/arena.c:64:3
#1 0x7fc964f2a597 in upb_Arena_Free third_party/upb/upb/mem/arena.c:211:7
#2 0x7fc9d2af0028 in std::__msan::unique_ptr<upb_Arena, void (*)(upb_Arena*)>::~unique_ptr() third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/unique_ptr.h:277:7
#3 0x7fc9d2aef7a2 in ~Arena third_party/upb/upb/upb.hpp:70:7
#4 0x7fc9d2aef7a2 in ~InlinedArena third_party/upb/upb/upb.hpp:97:7
#5 0x7fc9d2aef7a2 in Cpp_InlinedArena2_Test::TestBody() third_party/upb/upb/test/test_cpp.cc:187:1
#6 0x7fc97da78a57 in testing::Test::Run() third_party/googletest/googletest/src/gtest.cc:2695:5
#7 0x7fc97da7a3e8 in testing::TestInfo::Run() third_party/googletest/googletest/src/gtest.cc:2844:11
#8 0x7fc97da7b897 in testing::TestSuite::Run() third_party/googletest/googletest/src/gtest.cc:3003:30
#9 0x7fc97daa5136 in testing::internal::UnitTestImpl::RunAllTests() third_party/googletest/googletest/src/gtest.cc:5899:44
#10 0x7fc97daa455c in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> third_party/googletest/googletest/src/gtest.cc
#11 0x7fc97daa455c in testing::UnitTest::Run() third_party/googletest/googletest/src/gtest.cc:5464:10
#12 0x562a7fb876f0 in RUN_ALL_TESTS third_party/googletest/googletest/include/gtest/gtest.h:2329:73
#13 0x562a7fb876f0 in main testing/base/internal/gunit_main.cc:86:10
#14 0x7fc9ba9b7632 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x61632) (BuildId: 280088eab084c30a3992a9bce5c35b44)
#15 0x562a7fafdbe9 in _start /build/work/ab393f4ac612f9027aae6b1a7226027ba2a2/google3/blaze-out/k8-opt/bin/third_party/grte/v5_src/grte-scratch/BUILD/src/csu/../sysdeps/x86_64/start.S:120
Member fields were destroyed
#0 0x562a7fb0b13d in __sanitizer_dtor_callback_fields third_party/llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:960:5
#1 0x7fc9d2aef79a in ~InlinedArena third_party/upb/upb/upb.hpp:105:8
#2 0x7fc9d2aef79a in ~InlinedArena third_party/upb/upb/upb.hpp:97:7
#3 0x7fc9d2aef79a in Cpp_InlinedArena2_Test::TestBody() third_party/upb/upb/test/test_cpp.cc:187:1
#4 0x7fc97da78a57 in testing::Test::Run() third_party/googletest/googletest/src/gtest.cc:2695:5
#5 0x7fc97da7a3e8 in testing::TestInfo::Run() third_party/googletest/googletest/src/gtest.cc:2844:11
#6 0x7fc97da7b897 in testing::TestSuite::Run() third_party/googletest/googletest/src/gtest.cc:3003:30
#7 0x7fc97daa5136 in testing::internal::UnitTestImpl::RunAllTests() third_party/googletest/googletest/src/gtest.cc:5899:44
#8 0x7fc97daa455c in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> third_party/googletest/googletest/src/gtest.cc
#9 0x7fc97daa455c in testing::UnitTest::Run() third_party/googletest/googletest/src/gtest.cc:5464:10
#10 0x562a7fb876f0 in RUN_ALL_TESTS third_party/googletest/googletest/include/gtest/gtest.h:2329:73
#11 0x562a7fb876f0 in main testing/base/internal/gunit_main.cc:86:10
#12 0x7fc9ba9b7632 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x61632) (BuildId: 280088eab084c30a3992a9bce5c35b44)
#13 0x562a7fafdbe9 in _start /build/work/ab393f4ac612f9027aae6b1a7226027ba2a2/google3/blaze-out/k8-opt/bin/third_party/grte/v5_src/grte-scratch/BUILD/src/csu/../sysdeps/x86_64/start.S:120
```
PiperOrigin-RevId: 511849224
This was not a bug -- the previous behavior was correct. However this
change brings our implementation-specific details in line with C++.
PiperOrigin-RevId: 511273853
The upb convention is that "_Build()" means to also allocate, which this function does not do, so rename it as "_Init()" to free up the name for a future function that does allocate.
PiperOrigin-RevId: 510282736
According to https://en.cppreference.com/w/c/program/setjmp automatic variables
modified in a function calling setjmp can have indeterminate values. Instead,
refactor all functions calling setjmp so that the function calling setjmp
doesn’t have any local variables.
Part II: Mini table decoder.
PiperOrigin-RevId: 509644446
Prior to this CL we were allocating a MiniTable for each message and then overwriting it later. This could lead to an inconsistent state, and is unnecessary. This CL adds an extra phase to initialization so that the MiniTable is assigned only one time for each message.
PiperOrigin-RevId: 507617479
Previously we were calling the static method `Equals()` which did not take into effect the custom comparator or string for reporting errors.
PiperOrigin-RevId: 507344425
The initial motivation for this change was to fix a bug found by fuzzing. The old fuzz test (built on `cc_fuzz_target()`) detected an infinite loop if a bytes field default has an unterminated `\x` escape.
To fix this bug while expanding fuzz coverage, I created a fuzz test that verifies that we can do a lossless round trip from descriptor -> DefPool -> descriptor. We use C++ as the source of truth for whether a descriptor is valid or not, and what the canonical serialization back to protobuf form should be.
I wrote the new fuzz test using go/FuzzTest, which makes it easier and more readable to use an arbitrary `FileDescriptorSet` as input, while adding test cases for regressions.
The fuzz test highlighted a handful of errors that I subsequently fixed and added regression tests for:
1. The aforementioned unterminated `\x` bug.
2. We were not propagating the `edition` field.
3. We were missing the CheckIdent() check in a few places.
4. We were rejecting files with empty name, whereas C++ allows this.
5. There were a few bugs with escaping string defaults.
Since FuzzTest is Clang-only, I split the `FUZZ_TEST()` invocation from the regression tests, since the latter are portable and should be run on all platforms. Only `FUZZ_TEST()` itself is in a google3/Clang-only file.
PiperOrigin-RevId: 506997362
Slight optimization that frees us from needing to backtrack up to the owning file def to extract the proto syntax bit. Costs zero additional storage since we already have available unused bits. Also makes the enum def the single source of truth for determining enum syntax - upb_FieldDef_IsClosedEnum() now just passes off to upb_EnumDef_IsClosed() instead of replicating that code.
PiperOrigin-RevId: 505513429
To correctly handle this case we must add the serialized map entry to the unknown fields. Ideally we could merely preserve the map entry's serialized bytes from the input. However this is tricky to do if we are streaming and the previous buffer where the map entry began is no longer available.
This CL fixes this edge case by using the encoder to re-encode the map entry rather than using the input bytes directly.
While this fix is reasonably simple and reliable, it has two unfortunate properties. One is performance: we now must run the encoder to recreate bytes that we already saw in the input.
The other is dependencies: this fix has the unfortunate property of making the decoder depend on the encoder. In applications that only want the decoder but not the encoder, this will increase binary size. But the practical effects of this are probably minimal (the vast majority of applications that depend on the decoder will also use the encoder).
We can revisit this later and see if there is a better way of preserving the input bytes without re-encoding. For now this fix is simple and correct and fixes the fuzz bug.
PiperOrigin-RevId: 505381927
Currently these functions are hardwired to always return true, but the upstream
code now checks for failures (which will be implemented soon).
PiperOrigin-RevId: 504943663
The initial motivation for this CL was to fix a bug found by fuzzing. But the fuzz bug pointed out a few edge cases that this CL corrects:
1. The core bug is that we were allowing a map entry sub-message to be linked to a group field. This is not allowed in protobuf schemas, but we did not check for this edge case in `upb_MiniTable_SetSubMessage()`, so we were de facto allowing it. This triggered some bad behavior in the parser whereby we pushed a limit without checking its validity first.
2. To defend against this, I added asserts in `upb_MiniTable_SetSubMessage()` to verify the type of the field we are linking, to ensure that a group field is not linked to a map entry sub-message. But this should probably be changed to return an error instead of relying on asserts for this.
3. I changed the fuzz util code that builds the MiniTable so that it will never violate this new invariant. The fuzz util code now can run into situations where a group field has no valid non-map-entry sub-message to select. In those cases it will simply not register any sub-message for that field.
4. Previously group did not support leaving sub-messages unregistered. Previously I added this feature for sub-messages but not for groups. There is no reason why dynamic tree shaking should not work for group fields, so I extended the feature to support groups also.
PiperOrigin-RevId: 504913630
According to https://en.cppreference.com/w/c/program/setjmp automatic variables
modified in a function calling setjmp can have indeterminate values. Instead,
refactor all functions calling setjmp so that the function calling setjmp
doesn’t have any local variables.
Part V: Definition protocol buffer converters.
PiperOrigin-RevId: 504817971
According to https://en.cppreference.com/w/c/program/setjmp automatic variables
modified in a function calling setjmp can have indeterminate values. Instead,
refactor all functions calling setjmp so that the function calling setjmp
doesn’t have any local variables.
Part III: Definition pool builders.
PiperOrigin-RevId: 504817954
According to https://en.cppreference.com/w/c/program/setjmp automatic variables
modified in a function calling setjmp can have indeterminate values. Instead,
refactor all functions calling setjmp so that the function calling setjmp
doesn’t have any local variables.
Part VI: Wire encoder/decoder.
PiperOrigin-RevId: 504810940
According to https://en.cppreference.com/w/c/program/setjmp automatic variables
modified in a function calling setjmp can have indeterminate values. Instead,
refactor all functions calling setjmp so that the function calling setjmp
doesn’t have any local variables.
Part IV: Comparison utility.
PiperOrigin-RevId: 504563703
According to https://en.cppreference.com/w/c/program/setjmp automatic variables
modified in a function calling setjmp can have indeterminate values. Instead,
refactor all functions calling setjmp so that the function calling setjmp
doesn’t have any local variables.
Part I: JSON encoder/decoder. The code was previously in compliance, but a
convention of avoiding non-const local variables in functions calling setjmp
will make the compliance more obvious.
PiperOrigin-RevId: 502927863
The offsetof trick is not compatible with an incoming Clang change.
Clang has started to enforce that it is UB to declare types inside offsetof. See e327b52766.
```
third_party/upb/upb/mem/arena.c:166:25: error: 'struct (unnamed at third_party/upb/upb/mem/arena.c:166:7)' cannot be defined in '__builtin_offsetof'
n = UPB_ALIGN_DOWN(n, UPB_ALIGN_OF(upb_Arena));
^
./third_party/upb/upb/port/def.inc:113:38: note: expanded from macro 'UPB_ALIGN_OF'
#define UPB_ALIGN_OF(type) offsetof (struct { char c; type member; }, member)
^
third_party/upb/upb/mem/arena.c:166:25: error: 'struct (unnamed at third_party/upb/upb/mem/arena.c:166:7)' cannot be defined in '__builtin_offsetof'
./third_party/upb/upb/port/def.inc:113:38: note: expanded from macro 'UPB_ALIGN_OF'
#define UPB_ALIGN_OF(type) offsetof (struct { char c; type member; }, member)
```
PiperOrigin-RevId: 501872556