This will allow us to run the CMake test from within the protobuf repo's Bazel
workspace. The directory structure is a little bit different depending on which
workspace the test is invoked in.
PiperOrigin-RevId: 557275295
This simplifies the CMake code to ask for the minimum required version and also allow newer policies.
It also uses `target_include_directories()` to set the header search path in each library (and their downstream dependencies). Using `include_directories()` is not idiomatic in CMake >= 3.0. It sets the include path for all targets and one may need to have a few targets with a different search path.
PiperOrigin-RevId: 538450541
This CL changes the upb compiler to no longer depend on C++ protobuf libraries. upb now uses its own reflection libraries to implement its code generator.
# Key Benefits
1. upb can now use its own reflection libraries throughout the compiler. This makes upb more consistent and principled, and gives us more chances to dogfood our own C++ reflection API. This highlighted several parts of the C++ reflection API that were incomplete.
2. This CL removes code duplication that previously existed in the compiler. The upb reflection library has code to build MiniDescriptors and MiniTables out of descriptors, but prior to this CL the upb compiler could not use it. The upb compiler had a separate copy of this logic, and the compiler's copy of this logic was especially tricky and hard to maintain. This CL removes the separate copy of that logic.
3. This CL (mostly) removes upb's dependency on the C++ protobuf library. We still depend on `protoc` (the binary), but the runtime and compiler no longer link against C++'s libraries. This opens up the possibility of speeding up some builds significantly if we can use a prebuilt `protoc` binary.
# Bootstrap Stages
To bootstrap, we check in a copy of our generated code for `descriptor.proto` and `plugin.proto`. This allows the compiler to depend on the generated code for these two protos without creating a circular dependency. This code is checked in to the `stage0` directory.
The bootstrapping process is divided into a few stages. All `cc_library()`, `upb_proto_library()`, and `cc_binary()` targets that would otherwise be circular participate in this staging process. That currently includes:
* `//third_party/upb:descriptor_upb_proto`
* `//third_party/upb:plugin_upb_proto`
* `//third_party/upb:reflection`
* `//third_party/upb:reflection_internal`
* `//third_party/upbc:common`
* `//third_party/upbc:file_layout`
* `//third_party/upbc:plugin`
* `//third_party/upbc:protoc-gen-upb`
For each of these targets, we produce a rule for each stage (the logic for this is nicely encapsulated in Blaze/Bazel macros like `bootstrap_cc_library()` and `bootstrap_upb_proto_library()`, so the `BUILD` file remains readable). For example:
* `//third_party/upb:descriptor_upb_proto_stage0`
* `//third_party/upb:descriptor_upb_proto_stage1`
* `//third_party/upb:descriptor_upb_proto`
The stages are:
1. `stage0`: This uses the checked-in version of the generated code. The stage0 compiler is correct and outputs the same code as all other compilers, but it is unnecessarily slow because its protos were compiled in bootstrap mode. The stage0 compiler is used to generate protos for stage1.
2. `stage1`: The stage1 compiler is correct and fast, and therefore we use it in almost all cases (eg. `upb_proto_library()`). However its own protos were not generated using `upb_proto_library()`, so its `cc_library()` targets cannot be safely mixed with `upb_proto_library()`, as this would lead to duplicate symbols.
3. final (no stage): The final compiler is identical to the `stage1` compiler. The only difference is that its protos were built with `upb_proto_library()`. This doesn't matter very much for the compiler binary, but for the `cc_library()` targets like `//third_party/upb:reflection`, only the final targets can be safely linked in by other applications.
# "Bootstrap Mode" Protos
The checked-in generated code is generated in a special "bootstrap" mode that is a bit different than normal generated code. Bootstrap mode avoids depending on the internal representation of MiniTables or the messages, at the cost of slower runtime performance.
Bootstrap mode only interacts with MiniTables and messages using public APIs such as `upb_MiniTable_Build()`, `upb_Message_GetInt32()`, etc. This is very important as it allows us to change the internal representation without needing to regenerate our bootstrap protos. This will make it far easier to write CLs that change the internal representation, because it avoids the awkward dance of trying to regenerate the bootstrap protos when the compiler itself is broken due to bootstrap protos being out of date.
The bootstrap generated code does have two downsides:
1. The accessors are less efficient, because they look up MiniTable fields by number instead of hard-coding the MiniTableField into the generated code.
2. It requires runtime initialization of the MiniTables, which costs CPU cycles at startup, and also allocates memory which is never freed. Per google3 rules this is not really a leak, since this memory is still reachable via static variables, but it is undesirable in many contexts. We could fix this part by introducing the equivalent of `google::protobuf::ShutdownProtobufLibrary()`).
These downsides are fine for the bootstrapping process, but they are reason enough not to enable bootstrap mode in general for all protos.
# Bootstrapping Always Uses OSS Protos
To enable smooth syncing between Google3 and OSS, we always use an OSS version of the checked in generated code for `stage0`, even in google3.
This requires that the google3 code can be switched to reference the OSS proto names using a preprocessor define. We introduce the `UPB_DESC(xyz)` macro for this, which will expand into either `proto2_xyz` or `google_protobuf_xyz`. Any libraries used in `stage0` must use `UPB_DESC(xyz)` rather than refer to the symbol names directly.
PiperOrigin-RevId: 501458451
This simplifies the code generation by making output agnostic to whether fasttables will be used or not.
This grows the generated code in the common case, but when fasttables are not being used the preprocessor will strip away the unused tables.
PiperOrigin-RevId: 499340805
Addresses https://github.com/protocolbuffers/protobuf/issues/10936.
This requires updating to the newest version of rules_python to use the new py_wheel API that includes a parameter for extra distinfo files
PiperOrigin-RevId: 493060514
The current behavior will crash any Bazel command immediately, due to our declared pip dependencies in WORKSPACE, if python3 can't be found. The new behavior will mock out these workspace dependencies and allow any non-python targets to run. Python targets will be skipped by wildcard expressions if there's no system python3, and will fail when run directly, due to compatibility mismatch.
PiperOrigin-RevId: 492085254
This renaming is something we have been planning on doing, and I would
like to do it now because I'm getting ready to rely on this
staleness_test() macro from the main protobuf repo.
From now on, these files will live in the "generated" branch only, and a GitHub action will regenerate these files whenever there is a commit to the main branch.
PiperOrigin-RevId: 438879338
upb previously attempted to support C89 and pre-2015 versions
of Visual Studio. This was to support older compilers with
limited C99 support (particularly MSVC). But as of last August,
even gRPC has dropped support for MSVC prior to 2015
c87276d058
Therefore it seems safe for upb to no longer attempt C89 support
(we were already not truly C89 compliant, with our use of "bool").
We now explicitly require C99 or greater and MSVC 2015 or greater.
This cleaned up port_def.inc a fair bit. I took the chance to
also remove some obsolete macros.
- A new PHP-specific upb amalgamation. It contains everything related to upb_msg, but leaves out all of the old handlers-related interfaces and encoders/decoders.
# Schema/Defs Changes
- Changed `upb_fielddef_msgsubdef()` and `upb_fielddef_enumsubdef()` to return `NULL` instead of assert-failing if the field is not a message or enum.
- Added `upb_msgdef_iswrapper()`, to test whether this is a wrapper well-known type.
# Decoder
- Decoder bugfix: when we parse a submessage inside a oneof, we need to clear out any previous data, so we don't misinterpret it as a pointer to an existing submessage.
# JSON Decoder
- Allowed well-known types at the top level to have their special processing.
- Fixed a bug that could occur when parsing nested empty lists/objects, eg `[[]]`.
- Made the "ignore unknown" option also be permissive about unknown enumerators by setting them to 0.
# JSON Encoder
- Allowed well-known types at the top level to have their special processing.
- Removed all spaces after `:` and `,` characters, to match the old encoder and pass goldenfile tests.
# Message / Reflection
- Changed `upb_msg_hasoneof()` -> `upb_msg_whichoneof()`. The new function returns the `upb_fielddef*` of whichever oneof is set.
- Implemented `upb_msg_clearfield()` and added/implemented `upb_msg_clear()`.
- Added `upb_msg_discardunknown()`. Part of me thinks this should go in a util library instead of core reflection since it is a recursive algorithm.
# Compiler
- Always emit descriptors as an array instead of as a string, to avoid exceeding maximum string lengths. If this becomes a speed issue later we can go back to two separate paths.
* Created amalgamation with upb_msg but no handlers.
* Bugfix for upb_array_resize().
* Renamed "lite" amalgamation to "core", to avoid confusion.
Traditionally "lite" has meant "without reflection", but here we
mean it as "without handlers-based code."
* Build fixes from CI tests.
* Removed some more C++-style comments.
* Fix for out-of-order statements.