[Bazel] Add a doc explaining our C++ build systems and distribution archives. (#10073)
# How Protobuf supports multiple C++ build systems

This document explains how the Protobuf project supports multiple C++ build
systems.

## Background

Protobuf primarily uses [Bazel](https://bazel.build) to build the Protobuf C++
runtime and Protobuf compiler[^historical_sot]. However, there are several
different build systems in common use for C++, each one of which requires
essentially a complete copy of the same build definitions.

[^historical_sot]:
    On a historical note, prior to its [release as Open Source
    Software](https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html),
    the Protobuf project was developed using Google's internal build system, which
    was the predecessor to Bazel (the vast majority of Google's contributions
    continue to be developed this way). The Open Source Protobuf project, however,
    historically used Autoconf to build the C++ implementation.
    Over time, other build systems (including Bazel) have been added, thanks in
    large part to substantial contributions from the Open Source community. Since
    the Protobuf project deals with multiple languages (all of which ultimately
    rely upon C++, for the Protobuf compiler), Bazel is a natural choice for a
    project-wide build system -- in fact, Bazel (and its predecessor, Blaze)
    was designed in large part to support exactly this type of rich,
    multi-language build.

Currently, C++ Protobuf can be built with Bazel, Autotools, and CMake. Each of
these build systems has different semantics and structure, but they all need
the same list of files to build the runtime and compiler.

## Design

### Extracting information from Bazel

Bazel's Starlark API provides [aspects](https://bazel.build/rules/aspects) to
traverse the build graph, inspect build rules, define additional actions, and
expose information through
[providers](https://bazel.build/rules/rules#providers). For example, the
`cc_proto_library` rule uses an aspect to traverse the dependency graph of
`proto_library` rules, and dynamically attaches actions that generate C++ code
with the Protobuf compiler and then compile it with the C++ compiler.

In order to support multiple build systems, the overall build structure is
defined once for each system, while frequently-changing metadata is exposed
from Bazel in a form that the other build definitions can include. Primarily,
this means exposing the lists of source files.

Two aspects are used to extract this information from the Bazel build
definitions:

* `cc_file_list_aspect` extracts `srcs`, `hdrs`, and `textual_hdrs` from build
  rules like `cc_library`. The sources are exposed through a provider named
  `CcFileList`.
* `proto_file_list_aspect` extracts the `srcs` from a `proto_library`, and
  also generates the expected filenames that would be generated by the
  Protobuf compiler. This information is exposed through a provider named
  `ProtoFileList`.

On their own, these aspects have limited utility. However, they can be
instantiated by custom rules, so that an ordinary `BUILD.bazel` target can
produce outputs based on the information gleaned from these aspects.
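
To make the division of labor concrete, here is a minimal model, in plain Python rather than Starlark and without any of Bazel's API, of what `cc_file_list_aspect` records for a single target. Only the three attribute names and the `CcFileList` name come from the text above; the `Rule` class and the function are hypothetical stand-ins. The model reads the target's own attributes and deliberately ignores `deps`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CcFileList:
    """Stand-in for the CcFileList provider: just the file lists, no Bazel types."""
    srcs: tuple = ()
    hdrs: tuple = ()
    textual_hdrs: tuple = ()


@dataclass
class Rule:
    """Hypothetical minimal view of a cc_library's attributes."""
    name: str
    srcs: list = field(default_factory=list)
    hdrs: list = field(default_factory=list)
    textual_hdrs: list = field(default_factory=list)
    deps: list = field(default_factory=list)


def cc_file_list_aspect(rule: Rule) -> CcFileList:
    """Extract srcs, hdrs and textual_hdrs from one rule.

    rule.deps is not visited: the result describes this target only,
    not anything it depends on.
    """
    return CcFileList(
        srcs=tuple(rule.srcs),
        hdrs=tuple(rule.hdrs),
        textual_hdrs=tuple(rule.textual_hdrs),
    )


# Usage: a library "b" with one source, one header, and a dependency on "c".
info = cc_file_list_aspect(Rule("b", srcs=["b.cc"], hdrs=["b.h"], deps=["c"]))
print(info)  # CcFileList(srcs=('b.cc',), hdrs=('b.h',), textual_hdrs=())
```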

### (Aside) Distribution libraries

Bazel's native `cc_library` rule is typically used on a "fine-grained" level, so
that, for example, lightweight unit tests can be written with narrow scope.
Although Bazel does build library artifacts (such as `.so` and `.a` files on
Linux), each artifact corresponds to a single `cc_library` rule.

Since the entire "Protobuf library" includes many constituent `cc_library`
rules, a special rule, `cc_dist_library`, combines several fine-grained
libraries into a single, monolithic library.

For the Protobuf project, these "distribution libraries" are intended to match
the granularity of the Autotools- and CMake-based builds. Since the rules that a
Bazel-built distribution library covers are exactly the ones whose source files
the other builds need, the `cc_dist_library` rule invokes the
`cc_file_list_aspect` on its input libraries. The result is that a
`cc_dist_library` rule not only produces composite library artifacts, but also
collects and provides the list of sources that were inputs.

For example:

```
$ cat cc_dist_library_example/BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_library")
load("//pkg:cc_dist_library.bzl", "cc_dist_library")

cc_library(
    name = "a",
    srcs = ["a.cc"],
)

cc_library(
    name = "b",
    srcs = ["b.cc"],
    deps = [":c"],
)

# N.B.: not part of the cc_dist_library, even though it is in the deps of 'b':
cc_library(
    name = "c",
    srcs = ["c.cc"],
)

cc_dist_library(
    name = "lib",
    deps = [
        ":a",
        ":b",
    ],
    visibility = ["//visibility:public"],
)

# Note: the output below has been formatted for clarity:
$ bazel cquery //cc_dist_library_example:lib \
    --output=starlark \
    --starlark:expr='providers(target)["//pkg:cc_dist_library.bzl%CcFileList"]'
struct(
    hdrs = depset([]),
    internal_hdrs = depset([]),
    srcs = depset([
        <source file cc_dist_library_example/a.cc>,
        <source file cc_dist_library_example/b.cc>,
    ]),
    textual_hdrs = depset([]),
)
```

The upshot is that the "coarse-grained" library can be defined by the Bazel
build, which then exports the list of source files needed to reproduce the
library in a different build system.
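
The aggregation that `cc_dist_library` performs can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: the `GRAPH` dictionary is a hypothetical encoding of the example `BUILD.bazel` file (target name mapped to its `srcs` and `deps`), and only `srcs` are handled. It reproduces the `srcs` in the `cquery` output above: `c.cc` is absent because `:c` is not listed directly in the `cc_dist_library`.

```python
# Hypothetical encoding of cc_dist_library_example/BUILD.bazel:
# target name -> (srcs, deps).
GRAPH = {
    "a": (["a.cc"], []),
    "b": (["b.cc"], ["c"]),
    "c": (["c.cc"], []),
}


def dist_library_srcs(deps, graph):
    """Collect the srcs of the listed deps only.

    Order is preserved and duplicates are dropped (dicts keep insertion
    order). The deps of each listed target are ignored, so "c" does not
    contribute even though "b" depends on it.
    """
    seen = {}
    for dep in deps:
        srcs, _ = graph[dep]
        for src in srcs:
            seen.setdefault(src, None)
    return list(seen)


print(dist_library_srcs(["a", "b"], GRAPH))  # ['a.cc', 'b.cc']
```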

One major difference from most Bazel rule types is that the file list aspects do
not propagate. In other words, they only expose the immediate dependencies'
sources, not transitive sources. This is for two reasons:

1. Immediate dependencies are conceptually simple, while transitivity requires
   substantially more thought. For example, if transitive dependencies were
   considered, then some way would be needed to exclude dependencies that
   should not be part of the final library (for example, a distribution library
   for `//:protobuf` could be defined not to include all of
   `//:protobuf_lite`). While dependency elision is an interesting design
   problem, the protobuf library is small enough that directly listing
   dependencies should not be problematic.
2. Dealing only with immediate dependencies gives finer-grained control over
   what goes into the composite library. For example, a Starlark `select()`
   could conditionally add fine-grained libraries to some builds, but not
   others.
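
The first reason can be illustrated with a hypothetical transitive walk, written in Python with made-up target and file names. This is not how the Protobuf rules work, since they deliberately avoid transitivity. Once dependencies are followed, an exclusion mechanism is needed, and even a simple one raises questions: here, excluding `protobuf_lite` drops its own sources, but `common.cc` still appears because `protobuf` also depends on `port` directly. With immediate dependencies, the distribution library simply lists the targets it wants.

```python
# Hypothetical dependency graph: target -> (srcs, deps).
GRAPH = {
    "protobuf": (["message.cc"], ["protobuf_lite", "port"]),
    "protobuf_lite": (["message_lite.cc"], ["port"]),
    "port": (["common.cc"], []),
}


def transitive_srcs(roots, graph, exclude=frozenset()):
    """Depth-first walk collecting srcs; excluded targets are never entered."""
    result, visited = [], set()

    def visit(target):
        if target in visited or target in exclude:
            return
        visited.add(target)
        srcs, deps = graph[target]
        result.extend(srcs)
        for dep in deps:
            visit(dep)

    for root in roots:
        visit(root)
    return result


def immediate_srcs(deps, graph):
    """What non-propagating aspects see: only the listed targets' srcs."""
    return [src for dep in deps for src in graph[dep][0]]


print(transitive_srcs(["protobuf"], GRAPH))
# ['message.cc', 'message_lite.cc', 'common.cc']
print(transitive_srcs(["protobuf"], GRAPH, exclude={"protobuf_lite"}))
# ['message.cc', 'common.cc']
print(immediate_srcs(["protobuf", "port"], GRAPH))
# ['message.cc', 'common.cc']
```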

Another subtlety for tests is due to Bazel internals. Internally, a slightly
different configuration is used when evaluating `cc_test` rules as compared to
`cc_dist_library`. If `cc_test` targets are included in a `cc_dist_library`
rule, and both are evaluated by Bazel, this can result in a build-time error:
the test configuration contains additional options, telling Bazel how to
execute the test, that the configuration used by `cc_file_list_aspect` lacks.
Bazel detects this as two conflicting actions generating the same outputs. (For
`cc_test` rules, the simplest workaround is to provide sources through a
`filegroup` or similar.)

### File list generation

Lists of input files are generated by Bazel in a format that can be imported
into other build systems. Currently, Automake- and CMake-style files can be
generated.

The lists of files are derived from Bazel build targets. The sources can be:

* `cc_dist_library` rules (as described above)
* `proto_library` rules
* individual files
* `filegroup` rules
* `pkg_files` or `pkg_filegroup` rules from
  https://github.com/bazelbuild/rules_pkg
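
Each kind of source contributes a different set of list variables. The naming below is inferred from the example output that follows (for instance, a `proto_library` mapped to the prefix `buff` yields `buff_proto_srcs`, `buff_srcs`, `buff_hdrs`, and `buff_files`). The Python classes are hypothetical stand-ins for the providers, not the Starlark implementation in `//pkg:build_systems.bzl`.

```python
from dataclasses import dataclass


@dataclass
class FileGroupInfo:
    """Hypothetical: plain files (filegroup, pkg_files, individual files)."""
    files: list


@dataclass
class ProtoListInfo:
    """Hypothetical stand-in for ProtoFileList."""
    proto_srcs: list  # the .proto sources
    srcs: list        # generated C++ sources
    hdrs: list        # generated C++ headers
    files: list       # other outputs, such as descriptor sets


@dataclass
class DistLibInfo:
    """Hypothetical stand-in for the CcFileList of a cc_dist_library."""
    srcs: list
    hdrs: list


def list_variables(prefix, info):
    """Return (variable name, files) pairs for one entry of src_libs."""
    if isinstance(info, FileGroupInfo):
        return [(f"{prefix}_files", info.files)]
    if isinstance(info, ProtoListInfo):
        return [
            (f"{prefix}_proto_srcs", info.proto_srcs),
            (f"{prefix}_srcs", info.srcs),
            (f"{prefix}_hdrs", info.hdrs),
            (f"{prefix}_files", info.files),
        ]
    if isinstance(info, DistLibInfo):
        return [(f"{prefix}_srcs", info.srcs), (f"{prefix}_hdrs", info.hdrs)]
    raise TypeError(f"unsupported source kind: {type(info).__name__}")


names = [name for name, _ in list_variables("distlib", DistLibInfo(["a.cc", "b.cc"], []))]
print(names)  # ['distlib_srcs', 'distlib_hdrs']
```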

For example:

```
$ cat gen_file_lists_example/BUILD.bazel
load("@rules_proto//proto:defs.bzl", "proto_library")
load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

filegroup(
    name = "doc_files",
    srcs = [
        "README.md",
        "english_paper.md",
    ],
)

proto_library(
    name = "message",
    srcs = ["message.proto"],
)

gen_cmake_file_lists(
    name = "source_lists",
    out = "source_lists.cmake",
    src_libs = {
        ":doc_files": "docs",
        ":message": "buff",
        "//cc_dist_library_example:lib": "distlib",
    },
)

$ bazel build gen_file_lists_example:source_lists
$ cat bazel-bin/gen_file_lists_example/source_lists.cmake
# Auto-generated by //gen_file_lists_example:source_lists
#
# This file contains lists of sources based on Bazel rules. It should
# be included from a hand-written CMake file that defines targets.
#
# Changes to this file will be overwritten based on Bazel definitions.

if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)
  include_guard()
endif()

# //gen_file_lists_example:doc_files
set(docs_files
  gen_file_lists_example/README.md
  gen_file_lists_example/english_paper.md
)

# //gen_file_lists_example:message
set(buff_proto_srcs
  gen_file_lists_example/message.proto
)

# //gen_file_lists_example:message
set(buff_srcs
  gen_file_lists_example/message.proto.pb.cc
)

# //gen_file_lists_example:message
set(buff_hdrs
  gen_file_lists_example/message.proto.pb.h
)

# //gen_file_lists_example:message
set(buff_files
  gen_file_lists_example/message-descriptor-set.proto.bin
)

# //cc_dist_library_example:lib
set(distlib_srcs
  cc_dist_library_example/a.cc
  cc_dist_library_example/b.cc
)

# //cc_dist_library_example:lib
set(distlib_hdrs

)
```

A hand-written CMake build rule could then use the generated file to define
libraries, such as:

```
include(source_lists.cmake)
add_library(distlib ${distlib_srcs} ${buff_srcs})
```

In addition to `gen_cmake_file_lists`, there is also a `gen_automake_file_lists`
rule. These rules actually share most of the same implementation, but define
different file headers and different Starlark "fragment generator" functions
which format the generated list variables.
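
The split between shared logic and per-system "fragment generators" can be sketched in Python. This illustrates the idea rather than porting the Starlark rules: `cmake_fragment` approximates the CMake layout in the example output above (for an empty list it prints nothing between the parentheses), while the layout produced by `automake_fragment` is an assumption, since no Automake output appears in this document. All function names are hypothetical.

```python
def cmake_fragment(var_name, label, files):
    """Format one list variable in the CMake style of the example output."""
    lines = [f"# {label}", f"set({var_name}"]
    lines += [f"  {path}" for path in files]
    lines.append(")")
    return "\n".join(lines) + "\n"


def automake_fragment(var_name, label, files):
    """Assumed Automake style: a backslash-continued make variable."""
    if not files:
        return f"# {label}\n{var_name} =\n"
    entries = [f"  {path} \\" for path in files[:-1]] + [f"  {files[-1]}"]
    return "\n".join([f"# {label}", f"{var_name} = \\"] + entries) + "\n"


def gen_file_lists(header, fragment_generator, variables):
    """Shared implementation: a header followed by one fragment per variable.

    `variables` is a sequence of (variable name, Bazel label, files) tuples;
    only `header` and `fragment_generator` differ between build systems.
    """
    parts = [header]
    for var_name, label, files in variables:
        parts.append(fragment_generator(var_name, label, files))
    return "\n".join(parts)


VARIABLES = [
    ("distlib_srcs", "//cc_dist_library_example:lib",
     ["cc_dist_library_example/a.cc", "cc_dist_library_example/b.cc"]),
]
print(gen_file_lists("# Auto-generated (CMake)\n", cmake_fragment, VARIABLES))
print(gen_file_lists("# Auto-generated (Automake)\n", automake_fragment, VARIABLES))
```

Supporting another build system in this model means writing one more fragment function and a header; the traversal in `gen_file_lists` stays untouched.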

### Protobuf usage

The main C++ runtimes (lite and full) and the Protobuf compiler use their
corresponding `cc_dist_library` rules to generate file lists. For
`proto_library` targets, the file list generation can extract the source files
directly. For other targets, notably `cc_test` targets, the file list generators
use `filegroup` rules.

In general, adding new targets to a non-Bazel build system in Protobuf (or
adding a new build system altogether) requires some one-time setup:

1. The overall structure of the new build system has to be defined. It should
   import lists of files and refer to them by variable, instead of listing
   files directly.
2. (Only if the build system is new) A new rule type has to be added to
   `//pkg:build_systems.bzl`. Most of the implementation is shared, but a
   "fragment generator" is needed to declare a file list variable, and the rule
   type itself has to be defined so that it calls the shared implementation.

When files are added or deleted, or when the Protobuf Bazel structure is
changed, these changes may need to be reflected in the file list logic. These
are some example scenarios:

* Files are added to (or removed from) the `srcs` of an existing `cc_library`:
  no changes needed. If the `cc_library` is already part of a
  `cc_dist_library`, then regenerating the source lists will reflect the
  change.
* A `cc_library` is added: the new target may need to be added to the Protobuf
  `cc_dist_library` targets, as appropriate.
* A `cc_library` is deleted: if a `cc_dist_library` depends upon the deleted
  target, then a build-time error will result. The library needs to be removed
  from the `cc_dist_library`.
* A `cc_test` is added or deleted: test sources are handled by `filegroup`
  rules defined in the same package as the `cc_test` rule. The `filegroup`s
  are usually given a name like `"test_srcs"`, and often use `glob()` to find
  sources. This means that adding or removing a test may not require any extra
  work, which can be confirmed by checking the `filegroup` in the test's package.
* Test-only proto files are added: the `proto_library` might need to be added
  to the file list map in `//pkg:BUILD.bazel`, and the files then added to the
  various build systems. However, most test-only protos are already exposed
  through libraries like `//src/google/protobuf:test_protos`.

If there are changes, then the regenerated file lists need to be copied back
into the repo. That way, the corresponding build systems can be used with a git
checkout, without needing to run Bazel first.
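
Copying the regenerated lists back can be scripted. The helper below is a hypothetical Python sketch (its name and the paths in the usage comment are assumptions, not part of the Protobuf tooling): it overwrites the checked-in copy only when the bytes differ and reports whether anything changed, so the same function could also back a check that flags stale file lists.

```python
from pathlib import Path


def sync_file_list(generated: Path, checked_in: Path) -> bool:
    """Copy a Bazel-generated file list over its checked-in counterpart.

    Returns True if the checked-in file was created or changed, and False if
    it already matched the generated file.
    """
    new_contents = generated.read_bytes()
    if checked_in.exists() and checked_in.read_bytes() == new_contents:
        return False
    checked_in.parent.mkdir(parents=True, exist_ok=True)
    # write_bytes copies only the contents, so the read-only permissions that
    # Bazel gives its outputs are not carried over to the checked-in file.
    checked_in.write_bytes(new_contents)
    return True


# Hypothetical usage, run from the workspace root after `bazel build`:
#   changed = sync_file_list(
#       Path("bazel-bin/gen_file_lists_example/source_lists.cmake"),
#       Path("gen_file_lists_example/source_lists.cmake"),
#   )
```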

### (Aside) Distribution archives

A very similar set of rules is defined in `//pkg` to build source distribution
archives for releases. In addition to the full sources, Protobuf releases also
include source archives sliced by language, so that, for example, a Ruby-based
project can get just the sources needed to build the Ruby runtime. (The
per-language slices also include sources needed to build the protobuf compiler,
so they all effectively include the C++ runtime.)
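
At the level of plain sets, the slicing described above amounts to a union. The Python below is only a conceptual model with made-up paths; the real archives are `rules_pkg` targets, and this document does not describe how they are assembled internally. Each language slice is that language's files plus the compiler's sources, and the full archive is the union of everything.

```python
def release_archives(per_language, compiler_sources):
    """Map archive name -> sorted file list.

    Every language slice also gets the compiler's sources (and hence the C++
    runtime); "all" is the full source distribution.
    """
    shared = set(compiler_sources)
    archives = {
        lang: sorted(set(files) | shared) for lang, files in per_language.items()
    }
    archives["all"] = sorted(shared.union(*per_language.values()))
    return archives


archives = release_archives(
    {"ruby": ["ruby/lib/protobuf.rb"], "python": ["python/message.py"]},
    ["src/compiler/main.cc"],
)
print(archives["ruby"])  # ['ruby/lib/protobuf.rb', 'src/compiler/main.cc']
```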

These archives are defined using rules from the
[rules_pkg](https://github.com/bazelbuild/rules_pkg) project. Although they are
similar to `cc_dist_library` and the file list generation rules, the goals are
different: the build system file lists described above only apply to C++, and
are organized according to what should or should not be included in different
parts of the build (e.g., no tests are included in the main library). On the
other hand, the distribution archives deal with languages other than C++, and
contain all the files that need to be distributed as part of a release (even for
C++, this is more than just the C++ sources).

While it might be possible to use information from the `CcFileList` and
`ProtoFileList` providers to define the distribution files, additional files
(such as the various `BUILD.bazel` files) are also needed in the distribution
archive. The lists of distribution files can usually be generated by `glob()`,
anyhow, so sharing logic with the file list aspects may not be beneficial.

Currently, all of the file lists are checked in. However, it would be possible
to build the file lists on-the-fly and include them in the distribution
archives, rather than checking them in.