Protocol Buffers - Google's data interchange format (grpc依赖)
https://developers.google.com/protocol-buffers/
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
537 lines
11 KiB
537 lines
11 KiB
1 year ago
|
# Prototiller Requirements for Edition Zero
|
||
|
|
||
|
**Authors:** [@mkruskal-google](https://github.com/mkruskal-google)
|
||
|
|
||
|
**Approved:** 2023-07-07
|
||
|
|
||
|
## Background
|
||
|
|
||
|
[Edition Zero Features](edition-zero-features.md) lays out the design for our
|
||
|
first edition, which will unify proto2 and proto3 via features. In order to
|
||
|
migrate internal Google repositories (and aid OSS migrations), we will be
|
||
|
leveraging Prototiller to upgrade from legacy syntax to the new editions model.
|
||
|
We will also be using Prototiller for every edition bump, but that's out of
|
||
|
scope for this document (more general requirements are laid out in
|
||
|
[Prototiller Requirements for Editions](prototiller-reqs-for-editions.md)).
|
||
|
|
||
|
## Overview
|
||
|
|
||
|
The way the edition zero features were derived, there will always exist a no-op
|
||
|
transformation from proto2/proto3. However, the details of this transformation
|
||
|
can be fairly complex and depend on a lot of different factors.
|
||
|
|
||
|
A temporary script has been created as a placeholder, which manages to get
|
||
|
fairly good coverage of these rules. Notably, it can't handle groups inside
|
||
|
extensions or oneofs. This, along with its golden tests, can serve as a useful
|
||
|
benchmark for Prototiller.
|
||
|
|
||
|
### Feature Optimization
|
||
|
|
||
|
One important piece of Prototiller that will ease the friction of the Edition
|
||
|
Zero large-scale change is a feature optimization phase. If certain features
|
||
|
aren't necessary to make the upgrade a no-op, we shouldn't add them and should
|
||
|
instead rely on the edition defaults for future changes. Similarly, we should
|
||
|
try to minimize the total size of the change by collapsing a feature
|
||
|
specification to a higher level (e.g. file-level defaults of a field feature).
|
||
|
|
||
|
## Frontend Feature Transformations
|
||
|
|
||
|
This section details all of our frontend features and how to transform from
|
||
|
proto2/proto3 to edition zero.
|
||
|
|
||
|
### Field Presence
|
||
|
|
||
|
The `field_presence` feature defaults to `EXPLICIT`, which matches proto2/proto3
|
||
|
optional behavior. The `LEGACY_REQUIRED` value corresponds to proto2 required
|
||
|
fields, and `IMPLICIT` corresponds to non-optional proto3 fields. In order to
|
||
|
minimize changes, file-level defaults should be utilized.
|
||
|
|
||
|
Example transformations:
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
message Foo {
|
||
|
optional string bar = 1;
|
||
|
required string baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
string bar = 1;
|
||
|
string baz = 2 [
|
||
|
features.field_presence = LEGACY_REQUIRED];
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto3";
|
||
|
|
||
|
message Foo {
|
||
|
optional string bar = 1;
|
||
|
string baz = 2;
|
||
|
string bam = 3;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
features.field_presence = IMPLICIT;
|
||
|
|
||
|
message Foo {
|
||
|
string bar = 1 [features.field_presence = EXPLICIT];
|
||
|
string baz = 2;
|
||
|
string bam = 3;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto3";
|
||
|
|
||
|
message Foo {
|
||
|
optional string bar = 1;
|
||
|
optional string baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
string bar = 1;
|
||
|
string baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
### Enum Type
|
||
|
|
||
|
The `enum_type` feature defaults to `OPEN`, which matches proto3 behavior. The
|
||
|
`CLOSED` value corresponds to typical proto2 behavior. In order to minimize
|
||
|
changes, file-level defaults should be utilized.
|
||
|
|
||
|
Example transformations:
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
enum Foo {
|
||
|
VALUE1 = 0;
|
||
|
VALUE2 = 1;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
features.enum_type = CLOSED;
|
||
|
|
||
|
enum Foo {
|
||
|
VALUE1 = 0;
|
||
|
VALUE2 = 1;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto3";
|
||
|
|
||
|
enum Foo {
|
||
|
VALUE1 = 0;
|
||
|
VALUE2 = 1;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
enum Foo {
|
||
|
VALUE1 = 0;
|
||
|
VALUE2 = 1;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
### Repeated Field Encoding
|
||
|
|
||
|
The `repeated_field_encoding` feature defaults to `PACKED`, which matches proto3
|
||
|
behavior. The `EXPANDED` value corresponds to the default proto2 behavior. Both
|
||
|
proto2 and proto3 can have the default behavior overridden by using the `packed`
|
||
|
field option. All of these should be replaced in the migration to edition zero.
|
||
|
Minimization of changes will be a little more complicated here, since there
|
||
|
could exist files where the majority of repeated fields have been overridden.
|
||
|
|
||
|
Example transformations:
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
message Foo {
|
||
|
repeated int32 bar = 1;
|
||
|
repeated int32 baz = 2 [packed = true];
|
||
|
repeated int32 bam = 3;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
features.repeated_field_encoding = EXPANDED;
|
||
|
|
||
|
message Foo {
|
||
|
repeated int32 bar = 1;
|
||
|
repeated int32 baz = 2 [
|
||
|
features.repeated_field_encoding = PACKED];
|
||
|
repeated int32 bar = 3;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto3";
|
||
|
|
||
|
message Foo {
|
||
|
repeated int32 bar = 1;
|
||
|
repeated int32 baz = 2 [packed = false];
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
repeated int32 bar = 2;
|
||
|
repeated int32 baz = 2 [
|
||
|
features.repeated_field_encoding = EXPANDED];
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
message Foo {
|
||
|
repeated int32 x = 1 [packed = true];
|
||
|
// Strings are never packed.
|
||
|
repeated string z = 1;
|
||
|
repeated string w = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
repeated int32 x = 1;
|
||
|
repeated string z = 1;
|
||
|
repeated string w = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
### Message Encoding
|
||
|
|
||
|
The `message_encoding` feature is designed to replace the proto2-only `group`
|
||
|
syntax (with value `DELIMITED`), with a default that will always be
|
||
|
`LENGTH_PREFIXED`. This is a somewhat awkward transformation in the general
|
||
|
case, since we allow group definitions anywhere fields exist even if message
|
||
|
definitions can't. The basic transformation is to create a new message type in
|
||
|
the nearest enclosing scope with the same name as the field, and lowercase the
|
||
|
field and give it that type.
|
||
|
|
||
|
Example transformations:
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
message Foo {
|
||
|
optional group Bar = 1 {
|
||
|
optional int32 x = 1;
|
||
|
}
|
||
|
optional Bar baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
message Bar {
|
||
|
int32 x = 1;
|
||
|
}
|
||
|
Bar bar = 1 [features.message_encoding = DELIMITED];
|
||
|
Bar baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
message Foo {
|
||
|
oneof foo {
|
||
|
group Bar = 1 {
|
||
|
optional int32 x = 1;
|
||
|
}
|
||
|
}
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
message Bar {
|
||
|
int32 x = 1;
|
||
|
}
|
||
|
oneof foo {
|
||
|
Bar bar = 1 [
|
||
|
features.message_encoding = DELIMITED];
|
||
|
}
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
### JSON Format
|
||
|
|
||
|
The `json_format` feature is a bit of an outlier, because (at least for edition
|
||
|
zero) it only affects the frontend build of the proto file. The `ALLOW` value
|
||
|
(proto3 behavior) enables all JSON mapping conflict checks on field names,
|
||
|
unless `deprecated_legacy_json_field_conflicts` is set. The `LEGACY_BEST_EFFORT`
|
||
|
value (proto2 behavior) disables these checks. The ideal minimal transformation
|
||
|
would be to switch to `ALLOW` in all cases except where
|
||
|
`deprecated_legacy_json_field_conflicts` is set or there exist JSON mapping
|
||
|
conflicts. In those cases we can fallback to `LEGACY_BEST_EFFORT`.
|
||
|
|
||
|
Alternatively, if it's difficult for Prototiller to handle, we could do a
|
||
|
followup large-scale change to remove all `LEGACY_BEST_EFFORT` instances that
|
||
|
pass build.
|
||
|
|
||
|
Example transformations:
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
message Foo {
|
||
|
optional string bar = 1;
|
||
|
optional string baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
string bar = 1;
|
||
|
string baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto3";
|
||
|
|
||
|
message Foo {
|
||
|
string bar = 1;
|
||
|
string baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
features.field_presence = IMPLICIT;
|
||
|
|
||
|
message Foo {
|
||
|
string bar = 1;
|
||
|
string baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
message Foo {
|
||
|
// Warning only
|
||
|
string bar = 1;
|
||
|
string bar_ = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
features.json_format = LEGACY_BEST_EFFORT;
|
||
|
|
||
|
message Foo {
|
||
|
string bar = 1;
|
||
|
string bar_ = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto3";
|
||
|
|
||
|
message Foo {
|
||
|
option
|
||
|
deprecated_legacy_json_field_conflicts = true;
|
||
|
string bar = 1;
|
||
|
string baz = 2 [json_name = "bar"];
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
features.field_presence = IMPLICIT;
|
||
|
features.json_format = LEGACY_BEST_EFFORT;
|
||
|
|
||
|
message Foo {
|
||
|
string bar = 1;
|
||
|
string baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
## Backend Feature Transformations
|
||
|
|
||
|
This section details our backend-specific features and how to transform from
|
||
|
proto2/proto3 to edition zero.
|
||
|
|
||
|
In order to limit bloat, it would be ideal if we could check whether or not a
|
||
|
proto file is ever used to generate code in the target language. If it's not,
|
||
|
there's no reason to add backend-specific features.
|
||
|
|
||
|
### Legacy Closed Enum
|
||
|
|
||
|
Java and C++ by default treat proto3 enums as closed if they're used in a proto2
|
||
|
message. The internal `cc_open_enum` field option can override this, but it has
|
||
|
**very** limited use and may not be worth considering. While the enum type
|
||
|
behavior should still be determined by Enum Type, we'll need to add this feature
|
||
|
to proto2 files using proto3 messages (the reverse is disallowed).
|
||
|
|
||
|
Example transformations:
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
import "some_proto3_file.proto"
|
||
|
|
||
|
enum Proto2Enum {
|
||
|
BAR = 0;
|
||
|
}
|
||
|
|
||
|
message Foo {
|
||
|
optional Proto3Enum bar = 1;
|
||
|
optional Proto2Enum baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
import "third_party/protobuf/cpp_features.proto"
|
||
|
import "third_party/protobuf/java_features.proto"
|
||
|
import "some_proto3_file.proto"
|
||
|
|
||
|
features.enum_type = CLOSED;
|
||
|
|
||
|
message Foo {
|
||
|
Proto3Enum bar = 1 [
|
||
|
features.(pb.cpp).legacy_closed_enum = true,
|
||
|
features.(pb.java).legacy_closed_enum = true];
|
||
|
Proto2Enum baz = 2;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
### UTF8 Validation
|
||
|
|
||
|
This feature is pending approval of *Editions Zero Feature: utf8_validation*
|
||
|
(not available externally).
|
||
|
|
||
|
## Other Transformations
|
||
|
|
||
|
In addition to features, there are some other changes we've made for edition
|
||
|
zero.
|
||
|
|
||
|
### Reserved Identifier Syntax
|
||
|
|
||
|
With *Protobuf Change Proposal: Reserved Identifiers* (not available
|
||
|
externally), we've decided to switch from strings to identifiers for reserved
|
||
|
fields. This *should* be a trivial change, but if the proto file contains
|
||
|
strings that aren't valid identifiers there's some ambiguity. They're currently
|
||
|
ignored today, but they could be typos we wouldn't want to just blindly delete.
|
||
|
So instead, we'll leave behind a comment.
|
||
|
|
||
|
Example transformations:
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
message Foo {
|
||
|
reserved "bar", "baz";
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
reserved bar, baz;
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
|
||
|
<table>
|
||
|
<tr>
|
||
|
<td>
|
||
|
<pre>syntax = "proto2";
|
||
|
|
||
|
message Foo {
|
||
|
reserved "bar", "1";
|
||
|
}</pre>
|
||
|
</td>
|
||
|
<td>
|
||
|
<pre>edition = "2023";
|
||
|
|
||
|
message Foo {
|
||
|
reserved bar;
|
||
|
/*reserved "1";*/
|
||
|
}</pre>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|