diff --git a/docs/design/editions/README.md b/docs/design/editions/README.md index 751c45390e..045180d067 100644 --- a/docs/design/editions/README.md +++ b/docs/design/editions/README.md @@ -25,4 +25,5 @@ The following topics are in this repository: * [Editions: Life of a Featureset](editions-life-of-a-featureset.md) * [Protobuf Editions for Schema Producers](protobuf-editions-for-schema-producers.md) * [Stricter Schemas with Editions](stricter-schemas-with-editions.md) +* [Edition Zero: Converged Semantics](edition-zero-converged-semantics.md) * [Edition Naming](edition-naming.md) diff --git a/docs/design/editions/edition-zero-converged-semantics.md b/docs/design/editions/edition-zero-converged-semantics.md new file mode 100644 index 0000000000..2b9557a240 --- /dev/null +++ b/docs/design/editions/edition-zero-converged-semantics.md @@ -0,0 +1,248 @@ +# Edition Zero: Converged Semantics + +**Authors:** [@perezd](https://github.com/perezd), +[@haberman](https://github.com/haberman) + +**Approved:** 2021-10-07 + +## Background + +The Protobuf Team has been exploring potential facilities for introducing +breaking API and semantic changes. This document is an attempt to make use of +these facilities to unify proto semantics from this point forward, while giving +customers the ability to more-granularly manage their project's specific needs. + +## Objective + +We want to reduce complications of API semantics that are coarsely managed +through the syntax keyword, and instead default to converged proto2/proto3 when +opting into editions. Where needed, customers will be able to opt out of +specific semantics that are incompatible with their existing usages, at a +fine-grained level, using the new capabilities provided by editions + features. + +### Why Now? + +As we introduce new facilities for managing breaking changes, we have an +additional opportunity to cutover and rectify a long-standing vision of +converging proto 2/3 semantics as a natural extension of this. + +Doing this in lockstep with the introduction of editions provides the protobuf +team with a few valuable outcomes: + +* Editions provides us with more granular specification of intent than the + existing coarse knob of "proto2" or "proto3." By opting into our first + edition, customers are upgrading to what we've referred to in the past as + "converged semantics," and if needed can reversibly downgrade back to proto2 + or proto3 semantics respectively by opt-ing out of the specific features + that are incompatible with their existing needs. + +* The protobuf team can avoid the n^2 complexity of considering how an + edition/feature will interplay with an explicit syntax designation of + "proto2" vs "proto3" for all impacted runtimes. This allows us to transition + our thinking/support model to be explicitly feature-centric. + +* The introduction of editions will almost certainly cause a major version + bump and gives us ample justification to make breaking changes as we + transition to this granular specification. + +## Introduction of the `edition` keyword to proto IDL + +The `edition` keyword is used to define which semantic version a particular file +and all of its contents will adhere to as a baseline. Whenever a proto file +declares an `edition` keyword, it automatically defaults to converged proto2/3 +semantics. + +An edition's value is represented as a string, encoded by convention as a year. + +## Introduction of `features` option to `descriptor.proto` + +This option will be uniformly defined as a repeated set of strings which can be +used to encode the ability to opt-out of a specific feature (eg: +`"-string_view"`), or to potentially opt-in to a future/experimental feature +(eg: `"string_view"`). The `features` option will be added to `descriptor.proto` +for the following descriptor options: + +* File +* Message +* Field +* Enum +* Enum Value +* Oneof +* Service +* Method +* Stream (internal repositories only) + +Features are only respected when used in conjunction with the `edition` keyword. +They are not validated for correctness to ensure they are forward/backward +compatible with releases. + +Features may be declared at any descriptor level, however, a feature definition +may influence descendant types at the discretion of the protobuf team. (e.g., a +file-level feature opt-out could impact all fields within the file, if it was +desired). + +## A taxonomy of features + +Features can be broken down into two main categories: language-specific and +semantic. + +### Language-specific features + +Language-specific features pertain to the generated API for a given language. +Referring to the protobuf breaking changes backlog we can see some examples: + +* (C++) Changing string fields to return `string_view`. +* (Java) Removing the confusing `Enum#valueOf(int)` API. +* (Java) Rename oneof enums to do appropriate camel casing. + +Language-specific features have no meaning for any other language: they can be +ignored entirely. They are, in essence, a private (tunneled) interface between +protobuf IDL and the respective code generator. Each language's code generator +can independently decide what the "base" set of features is for any given +edition. Each language defines the migration path between editions +independently. + +### Semantic Features + +Semantic features define behavior changes that apply to the protobuf data model, +independent of language. These can also have API implications, but their meaning +goes deeper than just a surface-level API. Some examples of semantic features: + +* Open enums (enums are placed directly into the field instead of the + `UnknownFieldSet`). +* Packed (whether repeated fields are packed on the wire) + +Semantic features have significantly broader scope, since they must be respected +across languages, and each language must implement the semantic correctly. This +also implies that every language must either (1) know the canonical set of "base +features" for each edition, or (2) that the set of "default" features for the +edition must be resolved in protoc itself and propagated explicitly into the +descriptor. + +## Rev'ing the protobuf IDL vs. descriptor.proto + +Changing `descriptor.proto` to reflect editions is a much more intrusive change +than changing just the protobuf IDL. The protobuf IDL is parsed and resolved in +protoc, and we have only a single implementation of that parser. Any change that +can be resolved in the parser alone is relatively unintrusive (though there are +build horizon issues since GCL parses protos in prod). + +Rev'ing `descriptor.proto` is a far more intrusive change that affects many +downstream systems. Many systems access descriptors through either a descriptor +API (for example, `proto2::Descriptor` in C++) or by directly accessing a proto +from `descriptor.proto` (eg. `google.protobuf.DescriptorProto`). Any changes +here need to be managed much more delicately. + +## Deprecation of the `syntax` keyword from proto IDL + +The `syntax` keyword shall no longer be required/observed when an `edition` +keyword is present, as it is now considered redundant. If `edition` and `syntax` +are both present, `edition` takes precedence and `syntax` is ignored. + +## Migrating from `proto2` and `proto3` to Editions + Features + +Today's usage of syntax opaquely bundles a collection of implied feature flags +that are set based on the presence of `proto2` or `proto3`. This is often a +source of confusion for customers (eg: what am I gaining by moving to proto3? +What am I losing?). + +By deciding that editions/features exist in a state of proto2/3 convergence, +this enables customers to decide for themselves what features are important to +their usage of protos. + +Migrating existing users of proto2 and proto3 to editions w/converged semantics +would mean we'd need to execute a large-scale change to make their +implicit/implied behavior explicit. Here are examples of implied behavior. +today: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Feature + proto2 implied behavior + proto3 implied behavior +
packed_repeated_primitives + 🚫 + ✅ +
extensions + ✅ + 🚫 +
required + ✅ + 🚫 +
groups + ✅ + 🚫 +
cpp_string_view + 🚫 + 🚫 +
java_enum_no_value_of + 🚫 + 🚫 +
open_enums + 🚫 + ✅ +
MORE STUFF ... + + +
+ +### Managing Complexity of `features` for Large Deployments + +A separate concept has been established to help mitigate the complexity of +editions and progressive feature rollouts and synchronizations for larger proto +projects. + +This facility could be used to migrate existing usages of the `syntax` keyword +to use Editions + Features across google3, for example. + +## Prior Work + +* proto{2,3} Convergence Vision (not available externally) + +* Epochs for descriptor.proto (not available externally) + +* Rust editions: https://doc.rust-lang.org/edition-guide/editions/index.html