Protocol Buffers - Google's data interchange format (grpc依赖)
https://developers.google.com/protocol-buffers/
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
635 lines
30 KiB
635 lines
30 KiB
# Editions: Life of a FeatureSet |
|
|
|
**Author:** [@mkruskal-google](https://github.com/mkruskal-google) |
|
|
|
**Approved:** 2023-08-17 |
|
|
|
## Background |
|
|
|
Outside of some minor spelling tweaks, our current implementation of features |
|
has very closely followed the original design laid out in |
|
[Protobuf Editions Design: Features](protobuf-editions-design-features.md). This |
|
approach led to the creation of four different feature sets for each descriptor |
|
though, and it's left under-specified who is responsible for generating these |
|
(protoc, plugins, runtimes), who has access to them, and where they need to be |
|
propagated to. |
|
|
|
*Exposing Editions Feature Sets* (not available externally) was a first attempt |
|
to try to define some of these concepts. It locks down feature visibility to |
|
protoc, generators, and runtimes. Users will only be exposed to them indirectly, |
|
via codegen changes or runtime helper functions, in order to avoid Hyrum's law |
|
cementing every decision we make about them. We (incorrectly) assumed that the |
|
protoc frontend would be able to calculate all the feature sets and then |
|
propagate all four sets to the generators, who would then forward the fully |
|
resolved runtime features to the runtime. This had the added benefit that we |
|
could treat our C++ feature resolution logic as a source-of-truth and didn't |
|
have to reimplement it identically in every language we support. |
|
|
|
*Editions: Runtime Feature Set Defaults* (not available externally) was a |
|
follow-up attempt to specifically handle the default feature sets of an edition. |
|
We had realized that we would need proto2/proto3 default features in each |
|
language to safely roll out editions, and that languages supporting descriptor |
|
pools would have cases that bypass protoc entirely. The solution we arrived at |
|
was that we should continue using the protoc frontend as the source-of-truth, |
|
and propagate these defaults down to the necessary runtimes. This would fix the |
|
proto2/proto3 issue, and at least provide some utilities to make the situation |
|
easier for descriptor pool users. |
|
|
|
[Protobuf Editions Design: Features](protobuf-editions-design-features.md) |
|
defines the feature resolution algorithm, which can be summarized by the |
|
following diagram: |
|
|
|
![Feature resolution diagram](./images/editions-life-of-a-featureset-image-01.png) |
|
|
|
Feature resolution for a given descriptor starts by using the proto file's |
|
edition and the feature schemas to generate the default feature set. It then |
|
merges all of the parent features from top to bottom, merging the descriptor's |
|
features last. |
|
|
|
## Glossary |
|
|
|
We will be discussing features **a lot** in this document, but the meaning |
|
behind the word can vary in some subtle ways depending on context. Whenever it's |
|
ambiguous, we will stick to qualifying these according to the following |
|
definitions: |
|
|
|
* **Global features** - The features contained directly in `FeatureSet` as |
|
fields. These apply to the protobuf language itself, rather than any |
|
particular runtime or generator. |
|
|
|
* **Generator features** - Extensions of `FeatureSet` owned by a specific |
|
runtime or generator. |
|
|
|
* **Feature resolution** - The process of applying the algorithm laid out in |
|
[Protobuf Editions Design: Features](protobuf-editions-design-features.md). |
|
This means that edition defaults, parent features, and overrides have all |
|
been merged together. After resolution, every feature should have an |
|
explicit value. |
|
|
|
* **Unresolved features** - The features a user has explicitly set on |
|
their descriptors in the `.proto` file. These have not gone through |
|
feature resolution and are a minimal representation that require more |
|
knowledge to be useful. |
|
|
|
* **Resolved features** - Features that have gone through feature |
|
resolution, with defaults and inheritance applied. These are the only |
|
feature sets that should be used to make decisions. |
|
|
|
* **Option Retention** - We support a retention specification on all options |
|
(see |
|
[here](https://protobuf.dev/programming-guides/proto3#option-retention)), |
|
including features |
|
|
|
* **Source features** - The features available to protoc and generators, |
|
before option retention has been applied. These can be either resolved |
|
or unresolved. |
|
|
|
* **Runtime features** - The features available to runtimes after option |
|
retention has been applied. These can be either resolved or unresolved. |
|
|
|
## Problem Description |
|
|
|
The flaw that all of these design documents suffer from is that protoc **can't** |
|
be the universal source-of-truth for feature resolution under the original |
|
design. For global features, there's of course no issue (protoc has a |
|
bootstrapping setup for `descriptor.proto`` and always knows the global feature |
|
set). For generator features though, we depend on [imports to make them |
|
discoverable](protobuf-editions-design-features.md#specification-of-an-edition). |
|
|
|
If a user is actually overriding one of these features, there will necessarily |
|
be an import and therefore protoc will be able to discover generator features |
|
and handle resolution. However, if the user is ok with the edition defaults |
|
there's no need for an import. Without the import, protoc has **no way of |
|
knowing** that those generator features exist in general. We could hardcode the |
|
ones we own, but that just pushes the problem off to third-party plugins. We |
|
could also force proto owners to include imports for *every* (transitive) |
|
language they generate code to, even if they're unused, but that would be very |
|
disruptive and isn't practical or idiomatic. |
|
|
|
Pushing the source-of-truth to the generators makes things a little better, |
|
since they each know exactly what feature file needs to be included. There's no |
|
longer any knowledge gap, and we don't need to rely on imports to discover the |
|
feature extension. Additionally, many of our generators are written in C++ (even |
|
non-built-in plugins), so we could at least reuse our existing feature |
|
resolution utility for all of those and limit the amount of duplication |
|
necessary. However, there's still a code-size issue with this approach. As |
|
described in the previous documents, we would need to send four feature sets for |
|
**every** descriptor to the runtime (i.e. in the generator request and embedded |
|
as a serialized string). We wouldn't be able to use inheritance or references to |
|
minimize the cost, and every generator that embeds a `FileDescriptorProto` into |
|
its gencode would see a massive code-size increase. |
|
|
|
There's also still the issue of descriptor pools that need to be able to build |
|
descriptors at runtime. These are typically power users (and our own unit-tests) |
|
doing very atypical things and bypassing protoc entirely. In previous documents |
|
we've attempted to push some of the cost onto them by explicitly not giving them |
|
feature resolution. They would have to specify every feature on every |
|
descriptor, and would not be able to use edition defaults or inheritance. |
|
However, this cost is fairly high and it also makes the `edition` field |
|
meaningless. Any missing feature would be a runtime error, and there would be no |
|
concept of "edition". This creates an inconsistent experience for developers, |
|
where they think in terms of editions in one context and then throw it out in |
|
another. Also, it would mean that we have two distinct ways of specifying a |
|
`FileDescriptorProto``: with unresolved features meant to only go through |
|
protoc, and with fully resolved features meant to always bypass protoc. |
|
Round-tripping descriptors would become difficult or impossible. |
|
|
|
The following image attempts illustrates the issue: |
|
|
|
![Diagram showing two generators and two runtimes](./images/editions-life-of-a-featureset-image-03.png) |
|
|
|
Here, a proto file is used in both A and B runtimes. The schema itself only |
|
overrides features for A though, and doesn't declare an import on B's features. |
|
This means that protoc doesn't know about B's features, and Generator B will |
|
need to resolve them. Additionally, dynamic messages in both A and B runtimes |
|
have issues because they've bypassed protoc and don't have any way to follow the |
|
feature resolution spec. |
|
|
|
### Requirements |
|
|
|
The following minimal feature sets are required by protoc: |
|
|
|
* **Resolved global source features** - to make proto-level decisions |
|
* **Unresolved global source features** - for validation |
|
|
|
For each generator: |
|
|
|
* **Resolved generator source features** - to make language-specific codegen |
|
decisions |
|
* **Unresolved generator source features** - for validation |
|
* **Resolved global source features** - to make more complex decisions |
|
|
|
For each runtime: |
|
|
|
* **All resolved runtime features** - for making runtime decisions |
|
* **All unresolved runtime features** - for round-trip behavior and debugging |
|
|
|
With some additional requirements on an ideal solution: |
|
|
|
* **Minimal code-size costs** - code size bloat can easily block the rollout |
|
of editions, and once those limits are hit we don't have great solutions |
|
|
|
* **Minimal performance costs** - we want a solution that avoids any |
|
unnecessary CPU or RAM regressions |
|
|
|
* **Minimal code duplication** - obviously we want to minimize this, but where |
|
we can't, we need a suitable test strategy to keep the duplication in sync |
|
|
|
* **Runtime support for dynamic messages** - while dynamic messages are a |
|
less-frequently-used feature, they are a critical feature used by a lot of |
|
important systems. Our solution should avoid making them harder to use in |
|
any runtime that supports them. |
|
|
|
## Recommended Solution |
|
|
|
Our long-term recommendation here is to support and use feature resolution in |
|
every stage in the life of a FeatureSet. Every runtime, generator, and protoc |
|
itself will all handle feature resolution independently, only sharing unresolved |
|
features between each other. This will necessarily mean duplication across |
|
nearly every language we support, and the following sections will go into detail |
|
about strategies for managing this. |
|
|
|
The main justification for this duplication is the simple fact that *edition |
|
defaults* will be needed almost everywhere. The generators need defaults for |
|
*their* features to get fully resolved generator features to make decisions on, |
|
and can't get them from protoc in every case. The runtimes need defaults for |
|
both global and generator features in order to honor editions in dynamic |
|
messages and to keep RAM costs down (e.g. the absence of feature overrides |
|
should result in a reference to some shared default object). Since the |
|
calculation of edition defaults is by far the most complicated piece of feature |
|
resolution, with the remainder just being proto merges, it makes everything |
|
simpler to understand if we just duplicate the entire algorithm. |
|
|
|
#### Pros |
|
|
|
* Resolved feature sets will never be publicly exposed |
|
|
|
* Our APIs will be significantly simpler, cutting the number of different |
|
types of feature sets by a factor of 2 |
|
|
|
* There will be no ambiguity about what a `FeatureSet` object *means*. It |
|
will always either be unresolved (outside of protobuf code) or fully |
|
resolved on all accessible features (inside protobuf code). |
|
|
|
* RAM and code-size costs will be minimal, since we'll only be storing and |
|
propagating the minimal amount of information (unresolved features) |
|
|
|
* Combats Hyrum's law by allowing us to provide wrappers around resolved |
|
features everywhere, instead of letting people depend on them directly |
|
|
|
* **Minimal** duplication on top of what's already necessary (edition |
|
defaults). |
|
|
|
* Dynamic messages will be treated on equal footing to proto files |
|
|
|
* The necessary feature dependencies will always be available in the |
|
appropriate context |
|
|
|
* We can simplify the current implementation since protoc won't need to handle |
|
resolution of imported features. |
|
|
|
#### Cons |
|
|
|
* Requires duplication of feature resolution in every runtime and every unique |
|
generator language |
|
|
|
* This means building out additional infrastructure to enforce |
|
cross-language conformance |
|
|
|
### Runtimes Without Reflection |
|
|
|
There are various runtimes that do not support reflection or dynamic messages at |
|
all (e.g. Java lite, ObjC). They typically embed the "feature-like" information |
|
they need directly into custom objects in the gencode. In these cases, the |
|
problem becomes a lot simpler because they *don't need* the full FeatureSet |
|
objects. We **don't** need to duplicate feature resolution in the runtime, and |
|
the generator can just directly embed the fully resolved features values needed |
|
by the runtime (of course, the generator might still need duplicate logic to get |
|
those). |
|
|
|
### Staged Rollout for Dynamic Messages |
|
|
|
Long-term, we want to be able to handle feature resolution at run-time for any |
|
runtime that supports reflection (and therefore needs FeatureSet objects) to |
|
reduce code-size/RAM costs and support dynamic messages. However, in any |
|
language where these costs are less critical, a staged rollout could be |
|
appropriate. Here, the generator would embed the serialized resolved source |
|
features into the gencode along with the rest of the options. We would use the |
|
`raw_features` field (which should eventually be deleted) to also include the |
|
unresolved features for reflection. |
|
|
|
This would allow us to implement and test editions, and unblock the migration of |
|
all non-dynamic cases. A follow-up optimization at a later stage could push this |
|
down the runtime, and only embed unresolved features in the gencode. |
|
|
|
Under this scenario, dynamic messages could still allow editions, as long as |
|
fully-resolved features were provided on every descriptor. When we do implement |
|
feature resolution, it will just be a matter of deleting redundant/unnecessary |
|
features, but there should always be a valid transformation from fully-resolved |
|
features to unresolved ones. |
|
|
|
### C++ Generators |
|
|
|
Generators written in C++ are in a better position since they don't require any |
|
code duplication. They could be given visibility to our existing feature |
|
resolution utility to resolve the features themselves. However, a better |
|
alternative is to make improvements to this utility so that some helpers like |
|
the ones we proposed in *Exposing Editions Feature Sets* can be used to access |
|
the resolved features that *already exist*. |
|
|
|
Protoc works by first parsing the input protofiles and building them into a |
|
descriptor pool. This is the frontend pass, where only the global features are |
|
needed. For built-in languages, the resulting descriptors are passed directly to |
|
the generator for codegen. For plugins, they're serialized into descriptor |
|
protos, rebuilt in a new descriptor pool (in the generator process), and then |
|
sent to the generator code for codegen. In both of these cases, a |
|
`DescriptorPool` build of the protos is done from a binary that *necessarily* |
|
links in the relevant generator features. |
|
|
|
Today, we discover features in the pool which are imported by the protos being |
|
built. This has the hole we mentioned above where non-imported features can't be |
|
discovered. Instead, we will pivot to a more explicit strategy for discovering |
|
features. By default, `DescriptorPool` will only resolve the global features and |
|
the C++ features (since this is the C++ runtime). A new method will be added to |
|
`DescriptorPool` that allows new feature sets to replace the C++ features for |
|
feature resolution. Generators will register their features via a virtual method |
|
in `CodeGenerator` and the generator's pool build will take those into account |
|
during feature resolution. |
|
|
|
There are a few ways to actually define this registration, which we'll leave as |
|
implementation details. Some examples that we're considering include: |
|
|
|
* Have the generator provide its own `DescriptorPool` containing the relevant |
|
feature sets |
|
* Have the generator provide a mapping of edition -> default `FeatureSet` |
|
objects |
|
|
|
Expanding on previous designs, we will provide the following API to C++ |
|
generators via the `CodeGenerator` class: |
|
|
|
They will have access to all the fully-resolved feature sets of any descriptor |
|
for making codegen decisions, and they will have access to their own unresolved |
|
generator features for validation. The `FileDescriptor::CopyTo` method will |
|
continue to output unresolved runtime features, which will become unresolved |
|
source features after option retention stripping (which generators should |
|
already be doing), for embedding in the gencode for runtime use. |
|
|
|
#### Example |
|
|
|
As an example, let's look at some hypothetical language `lang` and how it would |
|
introduce its own features. First, if it needs features at runtime it would |
|
create a `lang_features.proto` file in its runtime directory and bootstrap the |
|
gencode the same as it does for `descriptor.proto`. It would then *also* |
|
bootstrap C++ gencode using a special C++-only build of protoc. This can be |
|
illustrated in the following diagram: |
|
|
|
![Diagram showing how a language introduces its own features](./images/editions-life-of-a-featureset-image-04.png) |
|
|
|
This illustrates the bootstrapping setup for a built-in C++ generator. If |
|
generator features weren't needed in the runtime, that red box would disappear. |
|
If this were a separate plugin, the "plugin" box would simply be moved out of |
|
`protoc` and `protoc` could also serve as `protoc_cpp`. |
|
|
|
If `lang` didn't need runtime features, we would simply put the features proto |
|
in the `lang` generator and only generate C++ code (using the same bootstrapping |
|
technique as above). |
|
|
|
After the generator registers `lang_features.proto` with the DescriptorPool, the |
|
`FeatureSet` objects returned by `GetFeatures` will always have fully resolved |
|
`lang` features. |
|
|
|
### Non-C++ Generators |
|
|
|
As we've shown above, non-C++ generators are already in a situation where they'd |
|
need to duplicate *some* of the feature resolution logic. With this solution, |
|
they'd need to duplicate much more of it. The `GeneratorRequest` from protoc |
|
will provide the full set of *unresolved* features, which they will need to |
|
resolve and apply retention stripping to. |
|
|
|
**Note:** If we're able to implement bidirectional plugin communication, the |
|
[Bidirectional Plugins](#bidirectional-plugins) alternative may be a simpler |
|
solution for non-C++ generators that *don't* need features at runtime. Ones that |
|
need it at runtime will need to reimplement feature resolution anyway, so it may |
|
be less useful. |
|
|
|
One of the trickier pieces of the resolution logic is the calculation of edition |
|
defaults, which requires a lot of reflection. One of the ideas mentioned above |
|
in [C++ Generators](#c++-generators) could actually be repurposed to avoid |
|
duplication of this in non-C++ generators as well. The basic idea is that we |
|
start by defining a proto: |
|
|
|
``` |
|
message EditionFeatureDefaults { |
|
message FeatureDefaults { |
|
string edition = 1; |
|
FeatureSet defaults = 2; |
|
} |
|
repeated FeatureDefaults defaults = 1; |
|
string minimum_edition = 2; |
|
string maximum_edition = 3; |
|
} |
|
``` |
|
|
|
This can be filled from any feature set extension to provide a much more usable |
|
specification of defaults. We can package a genrule that converts from feature |
|
protos to a serialized `EditionFeatureDefaults` string, and embed this anywhere |
|
we want. Both C++ and non-C++ generators/runtimes could embed this into their |
|
code. Once this is known, feature resolution becomes a lot simpler. The hardest |
|
part is creating a comparator for edition strings. After that, it's a simple |
|
search for the lower bound in the defaults, followed by some proto merges. |
|
|
|
### Bootstrapping |
|
|
|
One major complication we're likely to hit revolves around our bootstrapping of |
|
`descriptor.proto`. In languages that have dynamic messages, one codegen |
|
strategy is to embed the `FileDescriptorProto` of the file and then parse and |
|
build it at the beginning of runtime. For `descriptor.proto` in particular, |
|
handling options can be very challenging. For example, in Python, we |
|
intentionally strip all options from this file and then assume that the options |
|
descriptors always exist during build (in the presence of serialized options). |
|
Since features *are* options, this poses a challenge that's likely to vary |
|
language by language. |
|
|
|
We will likely need to special-case `descriptor.proto` in a number of ways. |
|
Notably, this file will **never** have any generator feature overrides, since it |
|
can't import those files. In every other case, we can safely assume that |
|
generator features exist in a fully resolved feature set. But for |
|
`descriptor.proto`, at least at the time it's first being built by the runtime, |
|
this extension won't be present. We also can't figure out edition defaults at |
|
that point since we don't have the generator features proto to reflect over. |
|
|
|
One possible solution would be to codegen extra information specifically for |
|
this bootstrapped proto, similar to what we suggested in *Editions: Runtime |
|
Feature Set Defaults* for edition defaults. That would allow the generator to |
|
provide enough information to build `descriptor.proto` during runtime. As long |
|
as these special cases are limited to `descriptor.proto` though, it can be left |
|
to a more isolated language-specific discussion. |
|
|
|
### Conformance Testing |
|
|
|
Code duplication means that we need a test strategy for making sure everyone |
|
stays conformant. We will need to implement a conformance testing framework for |
|
validating that all the different implementations of feature resolution agree. |
|
Our current conformance tests provide a good model for accomplishing this, even |
|
though they don't quite fit the problem (they're designed for |
|
parsing/serialization). There's a runner binary that can be hooked up to another |
|
binary built in any language. It sends a `ConformanceRequest` proto with a |
|
serialized payload and set of instructions, and then receives a |
|
`ConformanceResponse` with the result. In the runner, we just loop over a number |
|
of fixed test suites to validate that the supplied binary is conformant. |
|
|
|
We would want a similar setup here for language-agnostic testing. While we could |
|
write a highly focused framework just for feature resolution, a more general |
|
approach may set us up better in the future (e.g. option retention isn't |
|
duplicated now but could have been implemented that way). This will allow us to |
|
test any kind of transformation to descriptor protos, such as: proto3_optional, |
|
group/DELIMITED, required/LEGACY_REQUIRED. The following request/response protos |
|
describe the API: |
|
|
|
``` |
|
message DescriptorConformanceRequest { |
|
// The file under test, pre-transformation. |
|
FileDescriptorProto file = 1; |
|
|
|
// The pool of dependencies and feature files required for build. |
|
FileDescriptorSet dependencies = 2; |
|
} |
|
|
|
message DescriptorConformanceResponse { |
|
// The transformed file. |
|
FileDescriptorProto file = 1; |
|
|
|
// Any additional features added during build. |
|
FileDescriptorSet added_features = 2; |
|
} |
|
``` |
|
|
|
Each test point would construct a proto file, its dependencies, and any feature |
|
files to include in feature resolution. The conformance binary would use this to |
|
fully decorate the proto file with resolved features, and send the result back |
|
for comparison against our C++ source-of-truth. Any generator features added by |
|
the binary will also need to be sent back to get matching results. |
|
|
|
### Documentation |
|
|
|
Because we're now asking third-party generator owners to handle feature |
|
resolution on their own, we will need to document this. Specifically, we need to |
|
open-source documentation for: |
|
|
|
* The algorithm described in |
|
[Protobuf Editions Design: Features](protobuf-editions-design-features.md) |
|
* The conformance test framework and how to use it (once it's implemented) |
|
|
|
On the other hand, we will have significantly less documentation to write about |
|
which feature sets to use where. Descriptor protos will *always* contain |
|
unresolved features, and C++ generators will have a simple API for getting the |
|
fully-resolved features. |
|
|
|
## Considered Alternatives |
|
|
|
### Use Generated Pool for C++ Generators |
|
|
|
*Note: this was part of the original proposal, but has been refactored (see |
|
cons)* |
|
|
|
Generators written in C++ are in a better position since they don't require any |
|
code duplication. They could be given visibility to our existing feature |
|
resolution utility to resolve the features themselves. However, a better |
|
alternative is to make improvements to this utility so that some helpers like |
|
the ones we proposed in *Exposing Editions Feature Sets* can be used to access |
|
the resolved features that *already exist*. |
|
|
|
Protoc works by first parsing the input protofiles and building them into a |
|
descriptor pool. This is the frontend pass, where only the global features are |
|
needed. For built-in languages, the resulting descriptors are passed directly to |
|
the generator for codegen. For plugins, they're serialized into descriptor |
|
protos, rebuilt in a new descriptor pool (in the generator process), and then |
|
sent to the generator code for codegen. In both of these cases, a |
|
`DescriptorPool` build of the protos is done from a binary that *necessarily* |
|
links in the relevant generator features. |
|
|
|
However, the FeatureSets we supply to generators are transformed to the |
|
generated pool (i.e. `FeatureSet` objects rather than `Message`) where the |
|
generator features will always exist. We've decided that there's no longer any |
|
reason to scrape the imports for features, but we *could* scrape the generated |
|
pool for them. This essentially means that when you call `MergeFeatures` to get |
|
a `FeatureSet`, the returned set is fully resolved *with respect to the current |
|
generated pool*. This is a much clearer contract, and has the benefit that the |
|
features visible to every C++ generator would automatically be populated with |
|
the correct generator features for them to use. |
|
|
|
Expanding on previous designs, we will provide the following API to C++ |
|
generators via the `CodeGenerator` class: |
|
|
|
They will have access to all the fully-resolved feature set of any descriptor |
|
for making codegen decisions, and they will have access to their own unresolved |
|
generator features for validation. The `FileDescriptor::CopyTo` method will |
|
continue to output unresolved runtime features, which will become unresolved |
|
source features after option retention stripping (which generators should |
|
already be doing), for embedding in the gencode for runtime use. |
|
|
|
#### Pros |
|
|
|
* Automatic inclusion of any features used in a binary |
|
* Features will never be partially resolved |
|
|
|
#### Cons |
|
|
|
* Implicit action at a distance could cause unexpected behaviors |
|
* Uses globals, making testing awkward |
|
* Not friendly to `DescriptorPool` cases who wouldn't necessarily want every |
|
linked-in feature to go through feature resolution. |
|
|
|
### Default Placeholders |
|
|
|
Protoc continues to propagate and resolve core features and imported language |
|
level features. For language level features that protoc does not know about |
|
(that is, not imported), a core placeholder feature indicating that the default |
|
for a given edition should be respected can be propagated. |
|
|
|
``` |
|
message FeatureSet { |
|
optional string unknown_feature_edition_default = N; // e.g. 2023 |
|
} |
|
``` |
|
|
|
Instead of duplicating the entire feature resolution algorithm, plugins must |
|
only provide a utility mapping editions to their default FeatureSet using the |
|
generator feature files and optionally caching them. |
|
|
|
For example: |
|
|
|
``` |
|
if features.hasUtf8Validation(): |
|
return features.getUtf8Validation() |
|
else: |
|
default_features = getDefaultFeatures(features.getUnknownFeatureEditionDefault()) |
|
return default_features.getUtf8Validation() |
|
``` |
|
|
|
#### Pros |
|
|
|
* Less duplicate logic for propagating features |
|
|
|
#### Cons |
|
|
|
* Descriptor proto bloat that is technically redundant with |
|
`FileDescriptorProto` edition. |
|
* Confusing that some but not all features are fully resolved |
|
* Duplicated logic to resolve edition default from edition # |
|
* Code-size and memory costs associated with the original approach still exist |
|
* Still doesn't help with the descriptor pool case, which may require |
|
duplicate logic. |
|
|
|
### Bidirectional Plugins |
|
|
|
Since the generators know the features they care about, we could have some kind |
|
of bidirectional communication between protoc and the plugins. The plugin would |
|
start by telling protoc the features it wants added, and then protoc would be |
|
able to fully resolve all feature sets before sending them off. This has the |
|
added benefit that it would allow us to do more interesting enhancements in the |
|
future. For example, the plugin could send its minimum required edition and |
|
other requirements *before* actually starting the build. |
|
|
|
**Note:** Bidirectional plugins could still be implemented for other purposes. |
|
This "alternative" is specifically for *using* that communication to pass |
|
missing feature specs. |
|
|
|
#### Pros |
|
|
|
* Eliminates code duplication problem |
|
* Provides infrastructure to enable future enhancements |
|
|
|
#### Cons |
|
|
|
* Doesn't address the confusing API we have now where it's unclear what kind |
|
of features are contained in the `features` field |
|
* Doesn't address the code-size and memory costs during runtime |
|
* Doesn't address the descriptor pool case |
|
|
|
### Central Feature Registry |
|
|
|
Instead of relying on generators and imports to supply feature specs, we could |
|
pivot to a central registry of all known features. Instead of simply claiming an |
|
extension number, generator owners could be required to submit all the feature |
|
protos to a central repository of feature protos. This would give protoc access |
|
to **all** features. There would be two ways to implement this: |
|
|
|
* If it were built *into* protoc, we could avoid requiring any import |
|
statements. We would probably still want an extension point to avoid adding |
|
a dependency to `descriptor.proto`, but instead of `features.(pb.cpp)` they |
|
would be something more like `features.(pb).cpp`. |
|
|
|
* We could keep the current extension and import scheme. Proto files would |
|
still need to import the features they override, but protoc would depend on |
|
all of them and populate defaults for unspecified ones. |
|
|
|
#### Pros |
|
|
|
* Makes all features easily discoverable wherever they're needed |
|
* Eliminates the code duplication problem |
|
* Gives us an option to remove the import statements, which are likely to |
|
cause future headaches (in the edition zero LSC, in maintenance afterward, |
|
and also for proto files that need to support a lot of third-party |
|
runtimes). |
|
|
|
#### Cons |
|
|
|
* Doesn't address the code-size and memory costs |
|
* Creates version skew problems |
|
* Confusing ownership semantics |
|
|
|
### Do Nothing |
|
|
|
Doing nothing would basically mean abandoning editions. The current design |
|
doesn't (and can't) work for third party generators. They'd be left to duplicate |
|
the logic themselves with no guidance or support from us. We would also see |
|
code-size and RAM bloat (except in C++) that would be very difficult to resolve. |
|
|
|
#### Pros |
|
|
|
* Less work |
|
|
|
#### Cons |
|
|
|
* Worse in every other way
|
|
|