PiperOrigin-RevId: 565135915pull/13827/head
parent
4d6ad56e4a
commit
762e529d83
2 changed files with 557 additions and 0 deletions
@ -0,0 +1,21 @@ |
||||
# Protocol Buffers - Prototiller design documents |
||||
|
||||
This directory contains historical design documents that describe plans for |
||||
implementing Prototiller, a tool for updating proto files to be conformant with Protobuf Editions functionality. For design documents related to Protobuf Editions, see [the README](../editions/README.md). |
||||
|
||||
These files represent the state that the original documents were in at the time |
||||
that they were published to this repository. While some updates *may* be made to |
||||
the files after their initial upload, you should consider the possibility that |
||||
they are outdated as you read them. These are purely for historical value and |
||||
should not be treated as documentation of the current state. |
||||
|
||||
Note that the authors listed in the topics were the authors of the *original* |
||||
version of the document; it may have changed since they last worked on the |
||||
document. |
||||
|
||||
## Design Documents |
||||
|
||||
The following topics are in this repository: |
||||
|
||||
* [Prototiller Requirements for Editions](prototiller-reqs-for-editions.md) |
||||
* [Prototiller Requirements for Edition Zero](prototiller-reqs-for-edition-zero.md) |
@ -0,0 +1,536 @@ |
||||
# Prototiller Requirements for Edition Zero |
||||
|
||||
**Authors:** [@mkruskal-google](https://github.com/mkruskal-google) |
||||
|
||||
**Approved:** 2023-07-07 |
||||
|
||||
## Background |
||||
|
||||
[Edition Zero Features](edition-zero-features.md) lays out the design for our |
||||
first edition, which will unify proto2 and proto3 via features. In order to |
||||
migrate internal Google repositories (and aid OSS migrations), we will be |
||||
leveraging Prototiller to upgrade from legacy syntax to the new editions model. |
||||
We will also be using Prototiller for every edition bump, but that's out of |
||||
scope for this document (more general requirements are laid out in |
||||
[Prototiller Requirements for Editions](prototiller-reqs-for-editions.md)). |
||||
|
||||
## Overview |
||||
|
||||
The way the edition zero features were derived, there will always exist a no-op |
||||
transformation from proto2/proto3. However, the details of this transformation |
||||
can be fairly complex and depend on a lot of different factors. |
||||
|
||||
A temporary script has been created as a placeholder, which manages to get |
||||
fairly good coverage of these rules. Notably, it can't handle groups inside |
||||
extensions or oneofs. This, along with its golden tests, can serve as a useful |
||||
benchmark for Prototiller. |
||||
|
||||
### Feature Optimization |
||||
|
||||
One important piece of Prototiller that will ease the friction of the Edition |
||||
Zero large-scale change is a feature optimization phase. If certain features |
||||
aren't necessary to make the upgrade a no-op, we shouldn't add them and should |
||||
instead rely on the edition defaults for future changes. Similarly, we should |
||||
try to minimize the total size of the change by collapsing a feature |
||||
specification to a higher level (e.g. file-level defaults of a field feature). |
||||
|
||||
## Frontend Feature Transformations |
||||
|
||||
This section details all of our frontend features and how to transform from |
||||
proto2/proto3 to edition zero. |
||||
|
||||
### Field Presence |
||||
|
||||
The `field_presence` feature defaults to `EXPLICIT`, which matches proto2/proto3 |
||||
optional behavior. The `LEGACY_REQUIRED` value corresponds to proto2 required |
||||
fields, and `IMPLICIT` corresponds to non-optional proto3 fields. In order to |
||||
minimize changes, file-level defaults should be utilized. |
||||
|
||||
Example transformations: |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
message Foo { |
||||
optional string bar = 1; |
||||
required string baz = 2; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
string bar = 1; |
||||
string baz = 2 [ |
||||
features.field_presence = LEGACY_REQUIRED]; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto3"; |
||||
|
||||
message Foo { |
||||
optional string bar = 1; |
||||
string baz = 2; |
||||
string bam = 3; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
features.field_presence = IMPLICIT; |
||||
|
||||
message Foo { |
||||
string bar = 1 [features.field_presence = EXPLICIT]; |
||||
string baz = 2; |
||||
string bam = 3; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto3"; |
||||
|
||||
message Foo { |
||||
optional string bar = 1; |
||||
optional string baz = 2; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
string bar = 1; |
||||
string baz = 2; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
### Enum Type |
||||
|
||||
The `enum_type` feature defaults to `OPEN`, which matches proto3 behavior. The |
||||
`CLOSED` value corresponds to typical proto2 behavior. In order to minimize |
||||
changes, file-level defaults should be utilized. |
||||
|
||||
Example transformations: |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
enum Foo { |
||||
VALUE1 = 0; |
||||
VALUE2 = 1; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
features.enum_type = CLOSED; |
||||
|
||||
enum Foo { |
||||
VALUE1 = 0; |
||||
VALUE2 = 1; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto3"; |
||||
|
||||
enum Foo { |
||||
VALUE1 = 0; |
||||
VALUE2 = 1; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
enum Foo { |
||||
VALUE1 = 0; |
||||
VALUE2 = 1; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
### Repeated Field Encoding |
||||
|
||||
The `repeated_field_encoding` feature defaults to `PACKED`, which matches proto3 |
||||
behavior. The `EXPANDED` value corresponds to the default proto2 behavior. Both |
||||
proto2 and proto3 can have the default behavior overridden by using the `packed` |
||||
field option. All of these should be replaced in the migration to edition zero. |
||||
Minimization of changes will be a little more complicated here, since there |
||||
could exist files where the majority of repeated fields have been overridden. |
||||
|
||||
Example transformations: |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
message Foo { |
||||
repeated int32 bar = 1; |
||||
repeated int32 baz = 2 [packed = true]; |
||||
repeated int32 bam = 3; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
features.repeated_field_encoding = EXPANDED; |
||||
|
||||
message Foo { |
||||
repeated int32 bar = 1; |
||||
repeated int32 baz = 2 [ |
||||
features.repeated_field_encoding = PACKED]; |
||||
repeated int32 bar = 3; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto3"; |
||||
|
||||
message Foo { |
||||
repeated int32 bar = 1; |
||||
repeated int32 baz = 2 [packed = false]; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
repeated int32 bar = 2; |
||||
repeated int32 baz = 2 [ |
||||
features.repeated_field_encoding = EXPANDED]; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
message Foo { |
||||
repeated int32 x = 1 [packed = true]; |
||||
// Strings are never packed. |
||||
repeated string z = 1; |
||||
repeated string w = 2; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
repeated int32 x = 1; |
||||
repeated string z = 1; |
||||
repeated string w = 2; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
### Message Encoding |
||||
|
||||
The `message_encoding` feature is designed to replace the proto2-only `group` |
||||
syntax (with value `DELIMITED`), with a default that will always be |
||||
`LENGTH_PREFIXED`. This is a somewhat awkward transformation in the general |
||||
case, since we allow group definitions anywhere fields exist even if message |
||||
definitions can't. The basic transformation is to create a new message type in |
||||
the nearest enclosing scope with the same name as the field, and lowercase the |
||||
field and give it that type. |
||||
|
||||
Example transformations: |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
message Foo { |
||||
optional group Bar = 1 { |
||||
optional int32 x = 1; |
||||
} |
||||
optional Bar baz = 2; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
message Bar { |
||||
int32 x = 1; |
||||
} |
||||
Bar bar = 1 [features.message_encoding = DELIMITED]; |
||||
Bar baz = 2; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
message Foo { |
||||
oneof foo { |
||||
group Bar = 1 { |
||||
optional int32 x = 1; |
||||
} |
||||
} |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
message Bar { |
||||
int32 x = 1; |
||||
} |
||||
oneof foo { |
||||
Bar bar = 1 [ |
||||
features.message_encoding = DELIMITED]; |
||||
} |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
### JSON Format |
||||
|
||||
The `json_format` feature is a bit of an outlier, because (at least for edition |
||||
zero) it only affects the frontend build of the proto file. The `ALLOW` value |
||||
(proto3 behavior) enables all JSON mapping conflict checks on field names, |
||||
unless `deprecated_legacy_json_field_conflicts` is set. The `LEGACY_BEST_EFFORT` |
||||
value (proto2 behavior) disables these checks. The ideal minimal transformation |
||||
would be to switch to `ALLOW` in all cases except where |
||||
`deprecated_legacy_json_field_conflicts` is set or there exist JSON mapping |
||||
conflicts. In those cases we can fallback to `LEGACY_BEST_EFFORT`. |
||||
|
||||
Alternatively, if it's difficult for Prototiller to handle, we could do a |
||||
followup large-scale change to remove all `LEGACY_BEST_EFFORT` instances that |
||||
pass build. |
||||
|
||||
Example transformations: |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
message Foo { |
||||
optional string bar = 1; |
||||
optional string baz = 2; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
string bar = 1; |
||||
string baz = 2; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto3"; |
||||
|
||||
message Foo { |
||||
string bar = 1; |
||||
string baz = 2; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
features.field_presence = IMPLICIT; |
||||
|
||||
message Foo { |
||||
string bar = 1; |
||||
string baz = 2; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
message Foo { |
||||
// Warning only |
||||
string bar = 1; |
||||
string bar_ = 2; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
features.json_format = LEGACY_BEST_EFFORT; |
||||
|
||||
message Foo { |
||||
string bar = 1; |
||||
string bar_ = 2; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto3"; |
||||
|
||||
message Foo { |
||||
option |
||||
deprecated_legacy_json_field_conflicts = true; |
||||
string bar = 1; |
||||
string baz = 2 [json_name = "bar"]; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
features.field_presence = IMPLICIT; |
||||
features.json_format = LEGACY_BEST_EFFORT; |
||||
|
||||
message Foo { |
||||
string bar = 1; |
||||
string baz = 2; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
## Backend Feature Transformations |
||||
|
||||
This section details our backend-specific features and how to transform from |
||||
proto2/proto3 to edition zero. |
||||
|
||||
In order to limit bloat, it would be ideal if we could check whether or not a |
||||
proto file is ever used to generate code in the target language. If it's not, |
||||
there's no reason to add backend-specific features. |
||||
|
||||
### Legacy Closed Enum |
||||
|
||||
Java and C++ by default treat proto3 enums as closed if they're used in a proto2 |
||||
message. The internal `cc_open_enum` field option can override this, but it has |
||||
**very** limited use and may not be worth considering. While the enum type |
||||
behavior should still be determined by Enum Type, we'll need to add this feature |
||||
to proto2 files using proto3 messages (the reverse is disallowed). |
||||
|
||||
Example transformations: |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
import "some_proto3_file.proto" |
||||
|
||||
enum Proto2Enum { |
||||
BAR = 0; |
||||
} |
||||
|
||||
message Foo { |
||||
optional Proto3Enum bar = 1; |
||||
optional Proto2Enum baz = 2; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
import "third_party/protobuf/cpp_features.proto" |
||||
import "third_party/protobuf/java_features.proto" |
||||
import "some_proto3_file.proto" |
||||
|
||||
features.enum_type = CLOSED; |
||||
|
||||
message Foo { |
||||
Proto3Enum bar = 1 [ |
||||
features.(pb.cpp).legacy_closed_enum = true, |
||||
features.(pb.java).legacy_closed_enum = true]; |
||||
Proto2Enum baz = 2; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
### UTF8 Validation |
||||
|
||||
This feature is pending approval of *Editions Zero Feature: utf8_validation* |
||||
(not available externally). |
||||
|
||||
## Other Transformations |
||||
|
||||
In addition to features, there are some other changes we've made for edition |
||||
zero. |
||||
|
||||
### Reserved Identifier Syntax |
||||
|
||||
With *Protobuf Change Proposal: Reserved Identifiers* (not available |
||||
externally), we've decided to switch from strings to identifiers for reserved |
||||
fields. This *should* be a trivial change, but if the proto file contains |
||||
strings that aren't valid identifiers there's some ambiguity. They're currently |
||||
ignored today, but they could be typos we wouldn't want to just blindly delete. |
||||
So instead, we'll leave behind a comment. |
||||
|
||||
Example transformations: |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
message Foo { |
||||
reserved "bar", "baz"; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
reserved bar, baz; |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
<table> |
||||
<tr> |
||||
<td> |
||||
<pre>syntax = "proto2"; |
||||
|
||||
message Foo { |
||||
reserved "bar", "1"; |
||||
}</pre> |
||||
</td> |
||||
<td> |
||||
<pre>edition = "2023"; |
||||
|
||||
message Foo { |
||||
reserved bar; |
||||
/*reserved "1";*/ |
||||
}</pre> |
||||
</td> |
||||
</tr> |
||||
</table> |
Loading…
Reference in new issue