diff --git a/docs/design/prototiller/README.md b/docs/design/prototiller/README.md new file mode 100644 index 0000000000..ab44afabd3 --- /dev/null +++ b/docs/design/prototiller/README.md @@ -0,0 +1,21 @@ +# Protocol Buffers - Prototiller design documents + +This directory contains historical design documents that describe plans for +implementing Prototiller, a tool for updating proto files to be conformant with Protobuf Editions functionality. For design documents related to Protobuf Editions, see [the README](../editions/README.md). + +These files represent the state that the original documents were in at the time +that they were published to this repository. While some updates *may* be made to +the files after their initial upload, you should consider the possibility that +they are outdated as you read them. These are purely for historical value and +should not be treated as documentation of the current state. + +Note that the authors listed in the topics were the authors of the *original* +version of the document; it may have changed since they last worked on the +document. + +## Design Documents + +The following topics are in this repository: + +* [Prototiller Requirements for Editions](prototiller-reqs-for-editions.md) +* [Prototiller Requirements for Edition Zero](prototiller-reqs-for-edition-zero.md) \ No newline at end of file diff --git a/docs/design/prototiller/prototiller-reqs-for-edition-zero.md b/docs/design/prototiller/prototiller-reqs-for-edition-zero.md new file mode 100644 index 0000000000..0b2180c22d --- /dev/null +++ b/docs/design/prototiller/prototiller-reqs-for-edition-zero.md @@ -0,0 +1,536 @@ +# Prototiller Requirements for Edition Zero + +**Authors:** [@mkruskal-google](https://github.com/mkruskal-google) + +**Approved:** 2023-07-07 + +## Background + +[Edition Zero Features](edition-zero-features.md) lays out the design for our +first edition, which will unify proto2 and proto3 via features. In order to +migrate internal Google repositories (and aid OSS migrations), we will be +leveraging Prototiller to upgrade from legacy syntax to the new editions model. +We will also be using Prototiller for every edition bump, but that's out of +scope for this document (more general requirements are laid out in +[Prototiller Requirements for Editions](prototiller-reqs-for-editions.md)). + +## Overview + +The way the edition zero features were derived, there will always exist a no-op +transformation from proto2/proto3. However, the details of this transformation +can be fairly complex and depend on a lot of different factors. + +A temporary script has been created as a placeholder, which manages to get +fairly good coverage of these rules. Notably, it can't handle groups inside +extensions or oneofs. This, along with its golden tests, can serve as a useful +benchmark for Prototiller. + +### Feature Optimization + +One important piece of Prototiller that will ease the friction of the Edition +Zero large-scale change is a feature optimization phase. If certain features +aren't necessary to make the upgrade a no-op, we shouldn't add them and should +instead rely on the edition defaults for future changes. Similarly, we should +try to minimize the total size of the change by collapsing a feature +specification to a higher level (e.g. file-level defaults of a field feature). + +## Frontend Feature Transformations + +This section details all of our frontend features and how to transform from +proto2/proto3 to edition zero. + +### Field Presence + +The `field_presence` feature defaults to `EXPLICIT`, which matches proto2/proto3 +optional behavior. The `LEGACY_REQUIRED` value corresponds to proto2 required +fields, and `IMPLICIT` corresponds to non-optional proto3 fields. In order to +minimize changes, file-level defaults should be utilized. + +Example transformations: + + + + + + +
+
syntax = "proto2";
+message Foo {
+  optional string bar = 1;
+  required string baz = 2;
+}
+
+
edition = "2023";
+
+message Foo {
+  string bar = 1;
+  string baz = 2 [
+    features.field_presence = LEGACY_REQUIRED];
+}
+
+ + + + + + +
+
syntax = "proto3";
+
+message Foo {
+  optional string bar = 1;
+  string baz = 2;
+  string bam = 3;
+}
+
+
edition = "2023";
+features.field_presence = IMPLICIT;
+
+message Foo {
+  string bar = 1 [features.field_presence = EXPLICIT];
+  string baz = 2;
+  string bam = 3;
+}
+
+ + + + + + +
+
syntax = "proto3";
+
+message Foo {
+  optional string bar = 1;
+  optional string baz = 2;
+}
+
+
edition = "2023";
+
+message Foo {
+  string bar = 1;
+  string baz = 2;
+}
+
+ +### Enum Type + +The `enum_type` feature defaults to `OPEN`, which matches proto3 behavior. The +`CLOSED` value corresponds to typical proto2 behavior. In order to minimize +changes, file-level defaults should be utilized. + +Example transformations: + + + + + + +
+
syntax = "proto2";
+
+enum Foo {
+  VALUE1 = 0;
+  VALUE2 = 1;
+}
+
+
edition = "2023";
+features.enum_type = CLOSED;
+
+enum Foo {
+  VALUE1 = 0;
+  VALUE2 = 1;
+}
+
+ + + + + + +
+
syntax = "proto3";
+
+enum Foo {
+  VALUE1 = 0;
+  VALUE2 = 1;
+}
+
+
edition = "2023";
+
+enum Foo {
+  VALUE1 = 0;
+  VALUE2 = 1;
+}
+
+ +### Repeated Field Encoding + +The `repeated_field_encoding` feature defaults to `PACKED`, which matches proto3 +behavior. The `EXPANDED` value corresponds to the default proto2 behavior. Both +proto2 and proto3 can have the default behavior overridden by using the `packed` +field option. All of these should be replaced in the migration to edition zero. +Minimization of changes will be a little more complicated here, since there +could exist files where the majority of repeated fields have been overridden. + +Example transformations: + + + + + + +
+
syntax = "proto2";
+
+message Foo {
+  repeated int32 bar = 1;
+  repeated int32 baz = 2 [packed = true];
+  repeated int32 bam = 3;
+}
+
+
edition = "2023";
+features.repeated_field_encoding = EXPANDED;
+
+message Foo {
+  repeated int32 bar = 1;
+  repeated int32 baz = 2 [
+    features.repeated_field_encoding = PACKED];
+  repeated int32 bar = 3;
+}
+
+ + + + + + +
+
syntax = "proto3";
+
+message Foo {
+  repeated int32 bar = 1;
+  repeated int32 baz = 2 [packed = false];
+}
+
+
edition = "2023";
+
+message Foo {
+  repeated int32 bar = 2;
+  repeated int32 baz = 2 [
+    features.repeated_field_encoding = EXPANDED];
+}
+
+ + + + + + +
+
syntax = "proto2";
+
+message Foo {
+  repeated int32 x = 1 [packed = true];
+  // Strings are never packed.
+  repeated string z = 1;
+  repeated string w = 2;
+}
+
+
edition = "2023";
+
+message Foo {
+  repeated int32 x = 1;
+  repeated string z = 1;
+  repeated string w = 2;
+}
+
+ +### Message Encoding + +The `message_encoding` feature is designed to replace the proto2-only `group` +syntax (with value `DELIMITED`), with a default that will always be +`LENGTH_PREFIXED`. This is a somewhat awkward transformation in the general +case, since we allow group definitions anywhere fields exist even if message +definitions can't. The basic transformation is to create a new message type in +the nearest enclosing scope with the same name as the field, and lowercase the +field and give it that type. + +Example transformations: + + + + + + +
+
syntax = "proto2";
+
+message Foo {
+  optional group Bar = 1 {
+    optional int32 x = 1;
+  }
+  optional Bar baz = 2;
+}
+
+
edition = "2023";
+
+message Foo {
+  message Bar {
+    int32 x = 1;
+  }
+  Bar bar = 1 [features.message_encoding = DELIMITED];
+  Bar baz = 2;
+}
+
+ + + + + + +
+
syntax = "proto2";
+
+message Foo {
+  oneof foo {
+    group Bar = 1 {
+      optional int32 x = 1;
+    }
+  }
+}
+
+
edition = "2023";
+
+message Foo {
+  message Bar {
+    int32 x = 1;
+  }
+  oneof foo {
+    Bar bar = 1 [
+      features.message_encoding = DELIMITED];
+  }
+}
+
+ +### JSON Format + +The `json_format` feature is a bit of an outlier, because (at least for edition +zero) it only affects the frontend build of the proto file. The `ALLOW` value +(proto3 behavior) enables all JSON mapping conflict checks on field names, +unless `deprecated_legacy_json_field_conflicts` is set. The `LEGACY_BEST_EFFORT` +value (proto2 behavior) disables these checks. The ideal minimal transformation +would be to switch to `ALLOW` in all cases except where +`deprecated_legacy_json_field_conflicts` is set or there exist JSON mapping +conflicts. In those cases we can fallback to `LEGACY_BEST_EFFORT`. + +Alternatively, if it's difficult for Prototiller to handle, we could do a +followup large-scale change to remove all `LEGACY_BEST_EFFORT` instances that +pass build. + +Example transformations: + + + + + + +
+
syntax = "proto2";
+
+message Foo {
+  optional string bar = 1;
+  optional string baz = 2;
+}
+
+
edition = "2023";
+
+message Foo {
+  string bar = 1;
+  string baz = 2;
+}
+
+ + + + + + +
+
syntax = "proto3";
+
+message Foo {
+  string bar = 1;
+  string baz = 2;
+}
+
+
edition = "2023";
+features.field_presence = IMPLICIT;
+
+message Foo {
+  string bar = 1;
+  string baz = 2;
+}
+
+ + + + + + +
+
syntax = "proto2";
+
+message Foo {
+  // Warning only
+  string bar = 1;
+  string bar_ = 2;
+}
+
+
edition = "2023";
+features.json_format = LEGACY_BEST_EFFORT;
+
+message Foo {
+  string bar = 1;
+  string bar_ = 2;
+}
+
+ + + + + + +
+
syntax = "proto3";
+
+message Foo {
+  option
+  deprecated_legacy_json_field_conflicts = true;
+  string bar = 1;
+  string baz = 2 [json_name = "bar"];
+}
+
+
edition = "2023";
+features.field_presence = IMPLICIT;
+features.json_format = LEGACY_BEST_EFFORT;
+
+message Foo {
+  string bar = 1;
+  string baz = 2;
+}
+
+ +## Backend Feature Transformations + +This section details our backend-specific features and how to transform from +proto2/proto3 to edition zero. + +In order to limit bloat, it would be ideal if we could check whether or not a +proto file is ever used to generate code in the target language. If it's not, +there's no reason to add backend-specific features. + +### Legacy Closed Enum + +Java and C++ by default treat proto3 enums as closed if they're used in a proto2 +message. The internal `cc_open_enum` field option can override this, but it has +**very** limited use and may not be worth considering. While the enum type +behavior should still be determined by Enum Type, we'll need to add this feature +to proto2 files using proto3 messages (the reverse is disallowed). + +Example transformations: + + + + + + +
+
syntax = "proto2";
+
+import "some_proto3_file.proto"
+
+enum Proto2Enum {
+  BAR = 0;
+}
+
+message Foo {
+  optional Proto3Enum bar = 1;
+  optional Proto2Enum baz = 2;
+}
+
+
edition = "2023";
+import "third_party/protobuf/cpp_features.proto"
+import "third_party/protobuf/java_features.proto"
+import "some_proto3_file.proto"
+
+features.enum_type = CLOSED;
+
+message Foo {
+  Proto3Enum bar = 1 [
+    features.(pb.cpp).legacy_closed_enum = true,
+    features.(pb.java).legacy_closed_enum = true];
+  Proto2Enum baz = 2;
+}
+
+ +### UTF8 Validation + +This feature is pending approval of *Editions Zero Feature: utf8_validation* +(not available externally). + +## Other Transformations + +In addition to features, there are some other changes we've made for edition +zero. + +### Reserved Identifier Syntax + +With *Protobuf Change Proposal: Reserved Identifiers* (not available +externally), we've decided to switch from strings to identifiers for reserved +fields. This *should* be a trivial change, but if the proto file contains +strings that aren't valid identifiers there's some ambiguity. They're currently +ignored today, but they could be typos we wouldn't want to just blindly delete. +So instead, we'll leave behind a comment. + +Example transformations: + + + + + + +
+
syntax = "proto2";
+
+message Foo {
+  reserved "bar", "baz";
+}
+
+
edition = "2023";
+
+message Foo {
+  reserved bar, baz;
+}
+
+ + + + + + +
+
syntax = "proto2";
+
+message Foo {
+  reserved "bar", "1";
+}
+
+
edition = "2023";
+
+message Foo {
+  reserved bar;
+  /*reserved "1";*/
+}
+