This CL changes the upb compiler to no longer depend on C++ protobuf libraries. upb now uses its own reflection libraries to implement its code generator.
# Key Benefits
1. upb can now use its own reflection libraries throughout the compiler. This makes upb more consistent and principled, and gives us more chances to dogfood our own C++ reflection API. This highlighted several parts of the C++ reflection API that were incomplete.
2. This CL removes code duplication that previously existed in the compiler. The upb reflection library has code to build MiniDescriptors and MiniTables out of descriptors, but prior to this CL the upb compiler could not use it. The upb compiler had a separate copy of this logic, and the compiler's copy of this logic was especially tricky and hard to maintain. This CL removes the separate copy of that logic.
3. This CL (mostly) removes upb's dependency on the C++ protobuf library. We still depend on `protoc` (the binary), but the runtime and compiler no longer link against C++'s libraries. This opens up the possibility of speeding up some builds significantly if we can use a prebuilt `protoc` binary.
# Bootstrap Stages
To bootstrap, we check in a copy of our generated code for `descriptor.proto` and `plugin.proto`. This allows the compiler to depend on the generated code for these two protos without creating a circular dependency. This code is checked in to the `stage0` directory.
The bootstrapping process is divided into a few stages. All `cc_library()`, `upb_proto_library()`, and `cc_binary()` targets that would otherwise be circular participate in this staging process. That currently includes:
* `//third_party/upb:descriptor_upb_proto`
* `//third_party/upb:plugin_upb_proto`
* `//third_party/upb:reflection`
* `//third_party/upb:reflection_internal`
* `//third_party/upbc:common`
* `//third_party/upbc:file_layout`
* `//third_party/upbc:plugin`
* `//third_party/upbc:protoc-gen-upb`
For each of these targets, we produce a rule for each stage (the logic for this is nicely encapsulated in Blaze/Bazel macros like `bootstrap_cc_library()` and `bootstrap_upb_proto_library()`, so the `BUILD` file remains readable). For example:
* `//third_party/upb:descriptor_upb_proto_stage0`
* `//third_party/upb:descriptor_upb_proto_stage1`
* `//third_party/upb:descriptor_upb_proto`
The stages are:
1. `stage0`: This uses the checked-in version of the generated code. The stage0 compiler is correct and outputs the same code as all other compilers, but it is unnecessarily slow because its protos were compiled in bootstrap mode. The stage0 compiler is used to generate protos for stage1.
2. `stage1`: The stage1 compiler is correct and fast, and therefore we use it in almost all cases (eg. `upb_proto_library()`). However its own protos were not generated using `upb_proto_library()`, so its `cc_library()` targets cannot be safely mixed with `upb_proto_library()`, as this would lead to duplicate symbols.
3. final (no stage): The final compiler is identical to the `stage1` compiler. The only difference is that its protos were built with `upb_proto_library()`. This doesn't matter very much for the compiler binary, but for the `cc_library()` targets like `//third_party/upb:reflection`, only the final targets can be safely linked in by other applications.
# "Bootstrap Mode" Protos
The checked-in generated code is generated in a special "bootstrap" mode that is a bit different than normal generated code. Bootstrap mode avoids depending on the internal representation of MiniTables or the messages, at the cost of slower runtime performance.
Bootstrap mode only interacts with MiniTables and messages using public APIs such as `upb_MiniTable_Build()`, `upb_Message_GetInt32()`, etc. This is very important as it allows us to change the internal representation without needing to regenerate our bootstrap protos. This will make it far easier to write CLs that change the internal representation, because it avoids the awkward dance of trying to regenerate the bootstrap protos when the compiler itself is broken due to bootstrap protos being out of date.
The bootstrap generated code does have two downsides:
1. The accessors are less efficient, because they look up MiniTable fields by number instead of hard-coding the MiniTableField into the generated code.
2. It requires runtime initialization of the MiniTables, which costs CPU cycles at startup, and also allocates memory which is never freed. Per google3 rules this is not really a leak, since this memory is still reachable via static variables, but it is undesirable in many contexts. We could fix this part by introducing the equivalent of `google::protobuf::ShutdownProtobufLibrary()`).
These downsides are fine for the bootstrapping process, but they are reason enough not to enable bootstrap mode in general for all protos.
# Bootstrapping Always Uses OSS Protos
To enable smooth syncing between Google3 and OSS, we always use an OSS version of the checked in generated code for `stage0`, even in google3.
This requires that the google3 code can be switched to reference the OSS proto names using a preprocessor define. We introduce the `UPB_DESC(xyz)` macro for this, which will expand into either `proto2_xyz` or `google_protobuf_xyz`. Any libraries used in `stage0` must use `UPB_DESC(xyz)` rather than refer to the symbol names directly.
PiperOrigin-RevId: 501458451
This leaves the decision of which C++ version to use up to our users. We still have a static_assert in port_def.inc that will prevent pre-C++14 usage.
PiperOrigin-RevId: 501351066
This simplifies the code generation by making output agnostic to whether fasttables will be used or not.
This grows the generated code in the common case, but when fasttables are not being used the preprocessor will strip away the unused tables.
PiperOrigin-RevId: 499340805
Addresses https://github.com/protocolbuffers/protobuf/issues/10936.
This requires updating to the newest version of rules_python to use the new py_wheel API that includes a parameter for extra distinfo files
PiperOrigin-RevId: 493060514
The current behavior will crash any Bazel command immediately, due to our declared pip dependencies in WORKSPACE, if python3 can't be found. The new behavior will mock out these workspace dependencies and allow any non-python targets to run. Python targets will be skipped by wildcard expressions if there's no system python3, and will fail when run directly, due to compatibility mismatch.
PiperOrigin-RevId: 492085254
I went ahead and deleted the update_file_list.sh script, because (a)
there was no good reason for it to be in a separate script and (b) we
now need to handle the well-known types in addition to file_lists.cmake.
With this change, we just invoke the staleness tests from the main
script to update everything.
While I was at it I made a couple small fixes:
- Don't skip the update step just because the previous commit was by
"Protobuf Team Bot". Copybara commits use this name and we still want
to do the auto-update step after them.
- Include the CL number in the description if the previous commit came
from a CL.
PiperOrigin-RevId: 487231324
This cl hit an issue during the shared library cmake build from ODR violations, leading to mismatched absl hash seeds. The problem was pre-existing but didn't manifest until now, and can be traced to the fact that in shared library builds we linked Abseil statically. All of the cmake changes here remove the underlying ODR violation.
PiperOrigin-RevId: 485787671
Once the GitHub action is fixed, these will be auto-updated whenever
necessary. I realized it would be a good idea to add them back first,
though, just to make sure nothing breaks before we enable the
auto-updating.
* Adding jsoncpp submodule
* Adding bazel dependency
* Hook up jsoncpp in Bazel builds
* Hook up jsoncpp dependency in CMake
* Fix conformance binary path
* Move jsoncpp import to the end of the file to avoid confusing add_test
This commit makes a couple changes to allow staleness_test() to be used
from outside the upb repo:
- Fully qualify references to upb targets and wrap them in a Label()
constructor. See here for details:
https://bazel.build/extending/macros#label-resolution
- Make the :staleness_test_lib target public.
This renaming is something we have been planning on doing, and I would
like to do it now because I'm getting ready to rely on this
staleness_test() macro from the main protobuf repo.
* Using glob to remove headers instead of cyclic file_lists
* Simplify CMake config and include missing files
* Don't remove generated proto headers
* Fix broken CMake proto dependencies instead of opting out pb.h files.
* Fixing cyclic dependency
This GitHub action will run after each pull request merge and will auto-update
the file lists in in src/file_lists.cmake. The action will run as our
bot account.
I realized that if a bug somehow made the file generation
non-idempotent, this could trigger an infinite loop of commits, so I put
in an extra safeguard against that. If the previous commit was by
"Protobuf Team Bot", the GitHub action will revert any local changes to
ensure that no new commit will be made.
* Use generated WKT code in Bazel builds
* Prefer src over external for genrule
* Prefer external over src for genrule
* Proper fix for windows proto path issues
descriptor.h includes abseil's mutex.h directly, thus abseil is not a
private dependency any more but must be included by every user of
libprotobuf.
Co-authored-by: Harald Fernengel <547273+haraldF@users.noreply.github.com>
* Sync from Piper @469587494
PROTOBUF_SYNC_PIPER
* Fixing github SOT protoc builds
* Fixing typos from google
* Remove leaked util/hash reference
* Fixing bad python merge
* Fixing python C++ library order
When using "FetchContent_Declare" with OVERRIDE_FIND_PACKAGE,
protobuf-config.cmake won't be used, thus the protobuf-generate macro
would be unavailable. By moving protobuf_generate to its own file, it
can be sourced and used even when using CMake's FetchContent.
Co-authored-by: Harald Fernengel <547273+haraldF@users.noreply.github.com>
* Fixing typos
* Revert new files that were deleted by sync script
* Fix CMake breakages
* bump upb version
* Sync from Piper @468772608
PROTOBUF_SYNC_PIPER
* Adding abseil to include path for python C++ extension
* Adding abseil linkage for python C++ extension
* Fixing linkage order
* Proof of concept for Abseil dependency
* Adding most common Abseil libraries
* Fixing shared library breakages
* Switching to quotes over angled brackets
* Disable install target by default
* Fixing abseil to LTS commit
* Upgrade to latest Abseil LTS
* Turning install back on by default, removing unnecessary export statements
* Add note to future self
* Fixing unsafe globals
* Update conformance test documentation
* Fix bug to allow custom argument forwarding to conformance tests
* Fix trailing sentence
* Add note about linux CMake support up top
* Add note about CMake being C++-only
* Bazelfying conformance tests
Adding infrastructure to "Bazelify" languages other than Java and C++
* Delete benchmarks for languages supported by other repositories
* Bazelfying benchmark tests
* Bazelfying python
Use upb's system python rule instead of branching tensorflow
* Bazelfying Ruby
* Bazelfying C#
* Bazelfying Objective-c
* Bazelfying Kokoro mac builds
* Bazelfying Kokoro linux builds
* Deleting all deprecated files from autotools cleanup
This boils down to Makefile.am and tests.sh and all of their remaining references
* Cleanup after PR reorganizing
- Enable 32 bit tests
- Move conformance tests back
- Use select statements to select alternate runtimes
- Add internal prefixes to proto library macros
* Updating READMEs to use bazel instead of autotools.
* Bazelfying Kokoro release builds
* First round of review fixes
* Second round of review fixes
* Third round of review fixes
* Filtering out conformance tests from Bazel on Windows (b/241484899)
* Add version metadata that was previously scraped from configure.ac
* fixing typo from previous fix
* Adding ruby version tests
* Bumping pinned upb version, and adding tests to python CI
* Initial implementation of cmake tests for linux
* Reverting accidental distcheck changes
* Updating gitignore file to include cmake generated files
* Updating existing cmake tests to use newer docker image
* Adding CMake build that uses Ninja as the generator
* Updating CMake documentation to cover Linux support
* Updating to latest cmake image
* Initial implementation of cmake tests for linux
* Reverting accidental distcheck changes
* Deleting extract_includes.bat now that it's been replaced with the generated file_lists.cmake
* Removing unnecessary endif conditions