Protocol Buffers - Google's data interchange format (grpc依赖)
https://developers.google.com/protocol-buffers/
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
333 lines
17 KiB
333 lines
17 KiB
1 year ago
|
# What are Protobuf Editions?
|
||
|
|
||
|
**Authors**: [@mcy](https://github.com/mcy), [@fowles](https://github.com/fowles)
|
||
|
|
||
|
## Summary
|
||
|
|
||
|
This document is an introduction to the Protobuf Editions project, an ambitious
|
||
|
re-imagining of how we migrate Protobuf users into the future.
|
||
|
|
||
|
## Goal
|
||
|
|
||
|
Enable incremental evolution of Protobuf across the entire ecosystem **without**
|
||
|
introducing permanent forks in the Protobuf language.
|
||
|
|
||
|
## TL;DR
|
||
|
|
||
|
1. We are replacing
|
||
|
[`syntax`](https://protobuf.dev/reference/protobuf/proto3-spec/#syntax) `=
|
||
|
...` with `edition = ...`.
|
||
|
* We plan to produce a new "edition" on a roughly yearly basis.
|
||
|
* We plan to regularly deprecate and remove old editions after a wide
|
||
|
horizon.
|
||
|
* This gradual churn is enabled by the
|
||
|
[Protobuf Breaking Changes policy](https://protobuf.dev/news/2022-07-06/#library-breaking-change-policy).
|
||
|
2. "Features" are a special kind of file/message/field/enum/etc option.
|
||
|
* Features control the individual codegen and runtime behavior of fields,
|
||
|
messages, enums, etc.
|
||
|
* Features cannot introduce changes that would directly break existing
|
||
|
binaries.
|
||
|
* We expect heavy churn of features in `.proto` files, so their design is
|
||
|
optimized to minimize diffs to `.proto` files while permitting
|
||
|
fine-grained control.
|
||
|
* Features are **usually** attached to the field/message/enum they apply
|
||
|
to.
|
||
|
* Features can be specified at a higher-level entity, such as a file,
|
||
|
to apply to all definitions inside of that entity. This is called
|
||
|
**feature inheritance**.
|
||
|
* Inheritance is intended to allow us to factor frequently-occurring
|
||
|
feature declarations, minimizing clutter during migrations.
|
||
|
3. Editions change only the defaults of features and do not otherwise introduce
|
||
|
new behavior.
|
||
|
* New behavior is fundamentally controlled by features (explicitly set or
|
||
|
implicit from an edition).
|
||
|
* Editions allow us to ratchet the ecosystem forward.
|
||
|
* Editions can be incremented on a per `.proto` file basis; projects
|
||
|
can upgrade incrementally.
|
||
|
4. Messages with any permutation of features are always interoperable (they can
|
||
|
import each other freely and use messages from each other).
|
||
|
* Editions do not split the ecosystem, and migration is largely automated.
|
||
|
* Directly inspired by
|
||
|
[Rust editions](https://doc.rust-lang.org/edition-guide/editions/index.html).
|
||
|
* Carbon has a similar philosophy
|
||
|
5. The `proto2`/`proto3` distinction is going away.
|
||
|
* Editions will support everything from both and allow mixed semantics
|
||
|
even within the same message or field.
|
||
|
* Undesirable features will be LSC'd away, using the same template as any
|
||
|
other feature/edition migration.
|
||
|
|
||
|
## Motivation
|
||
|
|
||
|
Arguably the biggest hard-earned lesson among Software Foundations is that
|
||
|
successful migrations are incremental. Most of our experience with these has
|
||
|
been for internal migrations. Externally, progress has often ossified because of
|
||
|
a lack of established evolution mechanisms. More recently large projects have
|
||
|
started planning incremental evolution into their structure. For example, Carbon
|
||
|
is heavily focused on evolution as a core precept, and Rust has built language
|
||
|
evolution via editions into its core design..
|
||
|
|
||
|
Protobuf is one of Google's oldest and most successful toolchain projects.
|
||
|
However, it was designed before we learned and internalized this lesson, making
|
||
|
modernization difficult and haphazard. We still have `required` and `group`,
|
||
|
`packed` is not everywhere, and string accessors in C++ still return `const
|
||
|
std::string&`. The last radical change to Protobuf (`syntax = "proto3";`) split
|
||
|
the ecosystem.
|
||
|
|
||
|
*Editions* and *features* are new language features that will allow us to
|
||
|
incrementally evolve Protobuf into the future. This will be done by introducing
|
||
|
a new `syntax`, hopefully the last syntax addition we will ever need.
|
||
|
|
||
|
This high-level document is intended as an introduction to Protobuf Editions for
|
||
|
engineers not familiar with the background and the set of tradeoffs that lead us
|
||
|
here. Low-level technical details are skipped in preference to describing the
|
||
|
kernel of our proposed design. This document reflects the approximate consensus
|
||
|
of protobuf-team members who have been developing Protobuf Editions, but please
|
||
|
beware: many open questions remain.
|
||
|
|
||
|
## What is a feature?
|
||
|
|
||
|
A *feature*, in the narrow context of Protobuf Editions, is an `option` on any
|
||
|
syntax entity of a `.proto` file that has the following properties:
|
||
|
|
||
|
* It is a field or extension of a top-level option named `features`, which is
|
||
|
present on every syntax entity (file, message, enum, field, etc). It can be
|
||
|
of any type, but `bool` and `enum` are the most common.
|
||
|
* If a syntax entity's lexical parent has a particular value for a feature,
|
||
|
then the child has the same value, unless the feature has a new value
|
||
|
specified on the child, explicitly. This is called **feature inheritance**,
|
||
|
and applies recursively. Features can specify a new value at any of the
|
||
|
points where a feature can be added.
|
||
|
* It explicitly specifies what syntax entities it can be set on, similar to
|
||
|
Java annotations (although this does not preclude inheritance to or through
|
||
|
an entity that it *cannot* be set on).
|
||
|
|
||
|
Features allow us to control the behavior of `protoc`, its backends, and the
|
||
|
Protobuf runtimes at arbitrary granularity. This is critical for large-scale
|
||
|
changes: if a message has few usages, features can be changed at a bigger scope,
|
||
|
minimizing diff churn, but if it has heavy usage and the CL to migrate a single
|
||
|
field is large, cleanups can happen at the field level, as necessary.
|
||
|
|
||
|
Features won't change a message’s serialization formats (binary, text, or json)
|
||
|
in incompatible ways except for extreme circumstances that will always be
|
||
|
managed directly by protobuf-team. It is critical for migrations that any
|
||
|
behavioral change coming from a feature is the result of a textual change to a
|
||
|
`.proto` file (either an edition bump or a feature change).
|
||
|
|
||
|
`ctype` is an existing field option that looks exactly like a feature: it
|
||
|
controls the behavior of the codegen backend, although it does not have the nice
|
||
|
ratcheting properties of editions.
|
||
|
|
||
|
Because features can be extensions, language backends can specify
|
||
|
**language-scoped** features. For example, `[ctype = CORD]` could instead be
|
||
|
phrased as `[features.(pb.cpp).string_type = CORD]`. Codegen backends own the
|
||
|
definitions of their features.
|
||
|
|
||
|
## What is an Edition?
|
||
|
|
||
|
An *edition* is a collection of defaults for features understood by `protoc` and
|
||
|
its backends. Editions are year-numbered, although we have defined a breakout in
|
||
|
case we need multiple editions in a particular year.
|
||
|
|
||
|
Instead of writing `syntax = "...";`, a Protobuf Editions-enabled `.proto` file
|
||
|
begins with `edition = "2022";` or similar. `edition` implies `syntax =
|
||
|
"editions";`, and the `syntax` keyword itself becomes deprecated. This is to
|
||
|
ensure that old tools not owned by protobuf-team, which only work for old
|
||
|
Protobuf syntaxes, crash or fail quickly and noticeably, instead of wandering
|
||
|
into a descriptor that they cannot understand (we will attempt to migrate what
|
||
|
we can, of course).
|
||
|
|
||
|
`protoc` specifies which editions it understands, and will reject `.proto` files
|
||
|
"from the future", since it cannot meaningfully parse them. `protoc` backends,
|
||
|
which can specify their own set of language-scoped features, must advertise the
|
||
|
defaults for a particular edition that they understand (and reject editions that
|
||
|
they don't). Runtimes must be able to handle descriptors "from the future"; this
|
||
|
only means that upon encountering a descriptor with an edition or feature it
|
||
|
does not understand, there must be a reasonable fallback for the runtime's
|
||
|
behavior.
|
||
|
|
||
|
### What is an Edition used for?
|
||
|
|
||
|
Editions provide the fundamental increments for the lifecycle of a feature. At
|
||
|
this point it is important to reiterate that most features will be specific to
|
||
|
particular code generators. What follows is an example life cycle for a
|
||
|
theoretical feature–`features.(pb.cpp).opaque_repeated_fields`.
|
||
|
|
||
|
1. Edition “2025” creates `features.(pb.cpp).opaque_repeated_fields` with a
|
||
|
default value of `false`. This value is equivalent to the behavior from
|
||
|
editions less than “2025”.
|
||
|
|
||
|
a. The migration to edition “2025” across google will move very fast as it
|
||
|
is a no-op.
|
||
|
|
||
|
2. Migration begins for `features.(pb.cpp).opaque_repeated_fields` (each change
|
||
|
in this migration will add `features.(pb.cpp).opaque_repeated_fields = true`
|
||
|
and be paired with code changes required to C++ code). It is not anticipated
|
||
|
that protos shared between repos will undergo field by field migrations like
|
||
|
this as that would cause a large stream of breaking changes, see
|
||
|
[Protobuf Editions for schema producers](protobuf-editions-for-schema-producers.md)
|
||
|
for more details.
|
||
|
|
||
|
3. Edition “2027” switches the default of
|
||
|
`features.(pb.cpp).opaque_repeated_fields` to `true`.
|
||
|
|
||
|
a. The migration to “2027” will remove explicit uses of
|
||
|
`features.(pb.cpp).opaque_repeated_fields = true` and add explicit uses of
|
||
|
`features.(pb.cpp).opaque_repeated_fields = false` where they were implicit
|
||
|
before. As above, this migration will be a no-op, so it will move very fast.
|
||
|
|
||
|
b. Externally, we will release tools and migration guides for OSS customers.
|
||
|
The tools will not be fully turnkey, but should provide a strong starting
|
||
|
point for user migrations.
|
||
|
|
||
|
4. Migration continues for `features.(pb.cpp).opaque_repeated_fields` (each
|
||
|
change in this migration will remove
|
||
|
`features.(pb.cpp).opaque_repeated_fields = false` and be paired with code
|
||
|
changes required to C++ code).
|
||
|
|
||
|
5. At some point, usage will be officially roped off internally, and
|
||
|
externally.
|
||
|
|
||
|
a. Internally, `features.(pb.cpp).opaque_repeated_fields` usage will be
|
||
|
blocked with allowlists while we remove the hardest to migrate case.
|
||
|
|
||
|
b. Externally, `features.(pb.cpp).opaque_repeated_fields` will be marked
|
||
|
deprecated in a public edition and removed in a later one. When a feature is
|
||
|
removed, the code generators for that behavior and the runtime libraries
|
||
|
that support it may also be removed. In this hypothetical, that might be
|
||
|
deprecated in “2029” and removed in “2031”. Any release that removes support
|
||
|
for a feature would be a major version bump.
|
||
|
|
||
|
The key point to note here is that any `.proto` file that does not use
|
||
|
deprecated features has a no-op upgrade from one edition to the next and we will
|
||
|
provide tools to effect that upgrade. Internal users will be migrated centrally
|
||
|
before a feature is deprecated. External users will have the full window of the
|
||
|
Google migration as well as the deprecation window to upgrade their own code.
|
||
|
|
||
|
It is also important to note that external users will not receive compiler
|
||
|
warnings until the feature is actually deprecated, so we provide a period of
|
||
|
deprecation to ensure that they have time to update their code before forcing
|
||
|
them to upgrade for an edition update.
|
||
|
|
||
|
Separately from feature evolution, `protoc` itself may remove support for old
|
||
|
editions entirely after a suitably long window (like 10 years).
|
||
|
|
||
|
## Edition Zero
|
||
|
|
||
|
The first edition of Protobuf Editions, the so-called "edition zero", will
|
||
|
effectively be a "`proto4`" that introduces the new syntax, and merges the
|
||
|
semantics of `proto2` and `proto3`. In editions mode, everything that was
|
||
|
possible in `proto2` and `proto3` will be possible, and the handful of
|
||
|
irreconcilable differences will be expressed as features.
|
||
|
|
||
|
For example, whether values not specified in an `enum` go into unknown fields vs
|
||
|
producing an enum value outside of the bounds of the specified values in the
|
||
|
`.proto` file (i.e., so-called closed and open enums) will be controlled by
|
||
|
`feature.enum = OPEN` or `feature.enum = CLOSED`.
|
||
|
|
||
|
Edition Zero should be viewed as the "completion" of the union of `proto2` and
|
||
|
`proto3`: it contains both syntaxes as subsets (although with different
|
||
|
spellings to disambiguate things) as well as new behavior that was previously
|
||
|
inexpressible but which is an obvious consequence of allowing everything from
|
||
|
both. For example, `proto3`-style non-optional singular fields could allow
|
||
|
non-zero defaults.
|
||
|
|
||
|
Edition Zero is designed in such a way that we can mechanically migrate an
|
||
|
arbitrary `.proto` file from either `proto2` or `proto3` with no behavioral
|
||
|
changes, by replacing `syntax` with `edition` and adding features in the
|
||
|
appropriate locations.
|
||
|
|
||
|
This will form the foundation of Protobuf Editions and the torrent of parallel
|
||
|
migrations that will follow.
|
||
|
|
||
|
## FAQ
|
||
|
|
||
|
### I only interact with protos by moving them around and editing schemata. How does this affect me?
|
||
|
|
||
|
This will manifest as a handful of new `option`s appearing at the top of your
|
||
|
files. Going forward, expect new `options` to appear and disappear from your
|
||
|
`.proto` files as LSCs march across the codebase. We intend to minimize
|
||
|
disruption, and you should be able to safely ignore them.
|
||
|
|
||
|
In general, you should not need to add `option`s yourself unless we say so in
|
||
|
documentation. We will try to make sure tooling recommends the latest edition
|
||
|
when creating new files.
|
||
|
|
||
|
### Are you taking away <thing>?
|
||
|
|
||
|
Everything expressible today will remain so in Edition Zero. Some syntax will
|
||
|
change: we will have only one way of spelling a singular field (with `optional`
|
||
|
vs. the `proto3` behavior vs. `required` controlled by a feature), `group`s will
|
||
|
turn into sub message fields with a special encoding.
|
||
|
|
||
|
### I think <thing> from proto{2,3} is bad. Why are you letting people use it in my files?
|
||
|
|
||
|
Long-term bifurcation of the language has resulted in significant damage to the.
|
||
|
ecosystem and engineers' mental model of Protobuf. There are features we think
|
||
|
are questionable, too, and we want to remove them. But we need to break some
|
||
|
eggs to make an omelet.
|
||
|
|
||
|
As stewards of the Protobuf language, we believe this is the best way to get rid
|
||
|
of features that were a good idea at the time, but which history has shown to
|
||
|
have had poor outcomes.
|
||
|
|
||
|
### I manipulate protos reflectively, or have some other complicated use-case
|
||
|
|
||
|
We plan to upgrade reflection to be feature-aware in a way that minimizes code
|
||
|
we need to change. We do not expect anyone to implement feature-inheritance
|
||
|
logic themselves; feature inheritance should be fully transparent to users,
|
||
|
behaving as if features had been placed explicitly everywhere. (Owners of code
|
||
|
generators should be the only ones that need to know how to correctly propagate
|
||
|
features.)
|
||
|
|
||
|
We will be partnering with use-cases that are known risks for migration, such as
|
||
|
storage providers, to minimize toil and disruption on all sides.
|
||
|
|
||
|
### I want to use features to fix a defect in Protobuf
|
||
|
|
||
|
Generally, the owner of the relevant component that ingests a particular feature
|
||
|
(`protoc` or the appropriate language backend) will own it. We will try to make
|
||
|
it as straightforward as we can to add a language-scoped feature, but it may
|
||
|
require some degree of coordination with us to get it into an edition.
|
||
|
|
||
|
Even if it's about one of protobuf-team's backends, we'd love to hear what you
|
||
|
think we can fix, within the constraints of editions.
|
||
|
|
||
|
### What's your OSS strategy?
|
||
|
|
||
|
We want to share a variant of this document with the OSS community. We plan to
|
||
|
publish migration guides and, where feasible, any migration tooling, such as the
|
||
|
`proto2`/`proto3` -> `edition` migrator.
|
||
|
|
||
|
As stated above, we want to minimize friction for non-protobuf-team-owned
|
||
|
backends, and this ties into helping third party code generators minimize their
|
||
|
pain.
|
||
|
|
||
|
### I like Protobuf as it is. Can I keep my old files?
|
||
|
|
||
|
Yes, but you get to keep both pieces. Failing to migrate off of old use-cases
|
||
|
and into newer versions that fix known defects is a risk for the entire
|
||
|
ecosystem: C++'s disastrous standardization process is a solemn warning of
|
||
|
failing to do so.
|
||
|
|
||
|
Trying to stay on `proto2` or `proto3` will eventually cease to be supported,
|
||
|
and old editions (e.g. 5 years) will also cease to be supported. Evolution is at
|
||
|
the heart of Protobuf, and we want to make it as easy as possible for users to
|
||
|
keep up with our progress towards a better Protobuf.
|
||
|
|
||
|
### What do you hope to use editions to change in the short/mid term?
|
||
|
|
||
|
An incomplete list of *ideas*, which should be taken as non-committal.
|
||
|
|
||
|
* Eliminate `required` completely by making a particular field be optional but
|
||
|
serialized unconditionally.
|
||
|
* Make all uses of `string` require UTF-8 checking, and all uses that don't
|
||
|
want/need it `bytes`, fulfilling the original `proto3` vision.
|
||
|
* Make every `string` and `bytes` accessor in C++ return `absl::string_view`,
|
||
|
unlocking performance optimizations.
|
||
|
* Make all scalar `repeated` fields `packed`, improving throughput.
|
||
|
* Make `enum` enumerators in C++ use `kName` instead of `NAME`.
|
||
|
* Make `enum` declarations in C++ into scoped `enum class`.
|
||
|
* Make `CTYPE` into a language-scoped feature.
|
||
|
* Replace per-language, file-level options with language-scoped features.
|
||
|
* Make reflection opt-in for some languages (C++).
|