diff --git a/docs/design/prototiller/README.md b/docs/design/prototiller/README.md
index ab44afabd3..40496e1356 100644
--- a/docs/design/prototiller/README.md
+++ b/docs/design/prototiller/README.md
@@ -18,4 +18,5 @@ document. The following topics are in this repository:
 * [Prototiller Requirements for Editions](prototiller-reqs-for-editions.md)
-* [Prototiller Requirements for Edition Zero](prototiller-reqs-for-edition-zero.md)
\ No newline at end of file
+* [Prototiller Requirements for Edition Zero](prototiller-reqs-for-edition-zero.md)
+* [Editions Tooling](editions-tooling.md)
\ No newline at end of file
diff --git a/docs/design/prototiller/editions-tooling.md b/docs/design/prototiller/editions-tooling.md
new file mode 100644
index 0000000000..2c4b137ccc
--- /dev/null
+++ b/docs/design/prototiller/editions-tooling.md
@@ -0,0 +1,165 @@
+# Editions Tooling
+
+**Authors:** [@mcy](https://github.com/mcy)
+
+**Approved:** 2022-08-09
+
+## Overview
+
+[Protobuf Editions](../editions/what-are-protobuf-editions.md) aims to introduce
+new semantics for Protobuf, but with a major emphasis on mechanical, incremental
+upgradability, to avoid the "two systems" problem of proto2 and proto3. The
+first edition (likely "2023") will introduce *converged semantics* for Protobuf
+that permit everything that proto2 and proto3 permitted: any non-editions file
+can become an editions file with minimal human intervention.
+
+We plan to achieve this with a strong tooling story. These tools are intended to
+fully automate the major steps of editions-related upgrades, both to support
+large-scale changes and for strategic open source reasons. In particular:
+
+* The manual part of large-scale changes in the editions space can be limited
+  to fixing uses of generated code and flipping features on specific fields (or
+  other declarations).
+* We can give our external users the most painless migration possible, which
+  consists of "run this tool and commit the results".
+
+This document describes the detailed design of the tools we need. It
+presupposes that the Protochangifier backend described in *Protochangifier
+Backend Design Doc* (not available externally) has been integrated into protoc,
+so that we can ship the tooling as part of protoc. Because the tooling must know
+the full definition of an edition to work (see below), this more or less
+requires it to be linked into protoc.
+
+There are three tools we will build.
+
+1. The "features janitor". This is a mode of `protoc` that consumes a `.proto`
+   file and produces a `ProtoChangeSpec` describing how to add and remove
+   features so that the resulting file has fewer explicit features but is
+   semantically unchanged.
+2. The "editions adopter". This is another mode of `protoc`, which produces a
+   `ProtoChangeSpec` describing how to bring a `proto2` or `proto3` file into
+   editions mode, starting at a specific edition.
+3. The "editions upgrader". This is a generalization of the adopter, which
+   takes an editions file and produces a `ProtoChangeSpec` that brings it into
+   a newer edition.
+
+These tools will fundamentally speak `ProtoChangeSpec`, but we should also
+provide in-place versions, since those will likely be more useful to OSS users
+who just want to run the tool atomically on their entire project.
+
+## The Janitor
+
+The features janitor is intended to be used as part of migrations to
+periodically clean up any messes made by flipping lots of features.
+Conceptually, it turns this proto file
+
+```
+edition = "2023";
+message Foo {
+  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
+  optional string b = 2 [features.(pb.cpp).string_type = VIEW];
+  optional string c = 3 [features.(pb.cpp).string_type = VIEW];
+  optional string d = 4 [features.(pb.cpp).string_type = VIEW];
+  optional string e = 5 [features.(pb.cpp).string_type = VIEW];
+}
+message Bar {
+  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
+  optional string b = 2;
+  optional string c = 3;
+  optional string d = 4;
+  optional string e = 5;
+}
+```
+
+into this one:
+
+```
+edition = "2023";
+message Foo {
+  option features.(pb.cpp).string_type = VIEW;
+  optional string a = 1;
+  optional string b = 2;
+  optional string c = 3;
+  optional string d = 4;
+  optional string e = 5;
+}
+message Bar {
+  optional string a = 1 [features.(pb.cpp).string_type = VIEW];
+  optional string b = 2;
+  optional string c = 3;
+  optional string d = 4;
+  optional string e = 5;
+}
+```
+
+Specifically, the janitor tries to minimize the number of explicit features in
+the Protobuf schema. Finding a truly minimal result does not appear to be a
+linear-time problem, so we should invent a heuristic. A sketch of what this
+could look like:
+
+1. Each feature that can appear explicitly on an AST node is either *critical*
+   for that node (it affects the node's own semantics) or used only for
+   grouping (it merely supplies a value for nested nodes to inherit). For
+   example, `string_type` is critical for fields but not for messages.
+2. Make every feature explicit on every node, filling in edition defaults where
+   no value is set.
+3. For each feature `f`, visit in DFS post-order (children before parents) each
+   node `n` for which `f` is non-critical and which (recursively) contains
+   nodes for which `f` is critical:
+   1. Set `f` on `n` to the value of `f` held by the plurality of `n`'s direct
+      children, and remove the explicit `f` from the children that hold that
+      value. If there is a tie, choose the edition default if it is among the
+      tied values, or else choose randomly.
+4. Once this has been repeated up to the root, delete every explicit feature
+   that is set to the edition default and is reachable from the root without
+   crossing another explicit feature that isn't the edition default, i.e., the
+   features that are already implied by the edition defaults.
+
+It is easy to construct cases where this heuristic is not optimal, but that is
+not important: the janitor only exists to make files prettier while keeping
+them equivalent. By construction, every step preserves the value that each node
+resolves for each feature, so the algorithm satisfies the "semantic no-op"
+requirement.
+
+## The Adopter and the Upgrader
+
+The adopter is merely a special case of the upgrader in which `proto2` and
+`proto3` are viewed as editions (in the sense that an edition is a set of
+defaults), so we will only describe the upgrader.
+
+To upgrade a file from one edition ("old") to another ("new", although not
+necessarily a newer edition):
+
+1. Every feature that is not already explicitly set at the top level is set
+   explicitly to the default given by "old", but only on the outermost scopes
+   that do not already set it. For file-level features, this means making all
+   features explicit at the file level. For message-level features that are not
+   file-level, this means placing an explicit feature on every top-level
+   message. This is a no-op, because `edition = "old";` already implies these
+   values.
+2. The file's edition is changed from "old" to "new". Because every feature
+   that could be explicit is now explicit, this is a no-op.
+3. The features janitor runs. It explicitly propagates all features (all of
+   which are now set explicitly at the top level) and then cleans them up with
+   respect to the "new" edition; note that the janitor gives preference to
+   edition defaults. This is a no-op, because the janitor is a no-op.
+
+## UX Concerns
+
+Bundling the editions tooling with `protoc` ensures that it is easy to find. The
+following will be the pattern for all Protochangifier tooling bundled into
+`protoc`:
+
+* There is a flag `--change_spec=changespec.pb` that causes protoc to apply a
+  change spec to the passed-in `.proto` file, e.g.
+  `protoc --change_spec=spec.pb --change_out=foo-changed.proto foo.proto`.
+  This writes the changed file to `foo-changed.proto`. The `--change_out` path
+  may be the same as `foo.proto` for an in-place update, or it may be omitted
+  to print the change to stdout. This is the core entry point for
+  Protochangifier.
+* Each analysis has a flag of its own, `--my_analysis`, e.g. `--janitor`. This
+  flag takes an optional argument: if it is given, protoc writes the change
+  spec to that path, e.g. `--janitor=spec.pb`; if it is omitted, the change is
+  applied in place, with no separate `protoc --change_spec` step.
+
+Alternatively, we could provide these as standalone tools. However, it seems
+useful from both a distribution and a user-education perspective to say "this
+is just part of the compiler". We expect to produce new migration tooling with
+Protochangifier on an ongoing basis, so teaching users that every analysis
+looks the same is important. Compare `rustfix`, the tool that Rust uses for
+things like upgrading editions. Although it is a separate binary, it is
+accessible through `cargo fix`, and in many ways `cargo` is the user-facing
+interface to Rust; making it part of that "swiss army knife" helps put it in
+front of users.
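The flag pattern proposed in UX Concerns amounts to a small decision table: where `--change_spec` output goes, and what a bare versus a valued analysis flag does. The following is a minimal Python sketch of that table only, assuming Python 3.9+ and the standard `argparse` module. It is not protoc's argument parser, it reads and writes no files, and `Plan`, `build_parser`, and `plan` are hypothetical names used for illustration; `--janitor` stands in for any `--my_analysis` flag.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of the flag pattern proposed in "UX Concerns".
# It is NOT protoc's real command line and it touches no files.


@dataclass
class Plan:
    mode: str                       # "apply_spec" or "analysis"
    proto: str                      # the input .proto file
    analysis: Optional[str] = None  # which analysis flag was used, if any
    spec_in: Optional[str] = None   # change spec consumed by --change_spec
    spec_out: Optional[str] = None  # change spec written by an analysis flag
    output: Optional[str] = None    # where the changed .proto goes; None = stdout
    in_place: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="protoc-flags-sketch")
    parser.add_argument("--change_spec")
    parser.add_argument("--change_out")
    # nargs="?" makes the value optional: flag absent -> None,
    # bare --janitor -> const (""), --janitor=spec.pb -> "spec.pb".
    parser.add_argument("--janitor", nargs="?", const="", default=None)
    parser.add_argument("proto")
    return parser


def plan(argv: List[str]) -> Plan:
    args = build_parser().parse_args(argv)
    if args.change_spec is not None:
        # Apply an existing change spec. --change_out may be omitted (stdout)
        # or may name the input file itself (in-place update).
        return Plan(
            mode="apply_spec",
            proto=args.proto,
            spec_in=args.change_spec,
            output=args.change_out,
            in_place=args.change_out == args.proto,
        )
    if args.janitor is not None:
        if args.janitor:
            # --janitor=spec.pb: only write the change spec to that path.
            return Plan(mode="analysis", proto=args.proto,
                        analysis="janitor", spec_out=args.janitor)
        # Bare --janitor: apply the change directly to the input file.
        return Plan(mode="analysis", proto=args.proto, analysis="janitor",
                    output=args.proto, in_place=True)
    raise SystemExit("nothing to do: pass --change_spec or an analysis flag")


if __name__ == "__main__":
    print(plan(["--change_spec=spec.pb", "--change_out=foo-changed.proto",
                "foo.proto"]))
    print(plan(["foo.proto", "--janitor"]))
```

One detail the sketch surfaces: because the analysis flag's argument is optional, `argparse` would take `--janitor foo.proto` as `--janitor` with the value `foo.proto`. In the sketch a bare `--janitor` therefore goes after the input file, while the `=` form, as in `--janitor=spec.pb`, is unambiguous anywhere on the command line.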