# Stricter Schemas with Editions

**Author:** [@mcy](https://github.com/mcy)

**Approved:** 2022-11-28

## Overview

The Protobuf language is surprisingly lax in what it allows in some places, even
though these corners of the syntax space are rarely exercised in real use and
add complexity to backends and runtimes.

This document describes several such corners in the language, and how we might
use Editions to fix them (spoiler: we'll add a feature for each one and then
ratchet the features).

This is primarily a memo on a use-case for Editions, and not a design doc per
se.

## Potential Lints

### Entity Names

Protobuf does not enforce any constraints on names other than the "ASCII
identifier" rule: they must match the regex `[A-Za-z_][A-Za-z0-9_]*`. This
results in problems for backends:

* Backends need to be able to convert between PascalCase, camelCase,
  snake_case, and SHOUTY_CASE. Doing so correctly is surprisingly tricky.
* Extraneous underscores, such as underscores in names that want to be
  PascalCase, trailing underscores, leading underscores, and repeated
  underscores create problems for case conversion and can clash with private
  names generated by backends.
* Protobuf does not support non-ASCII identifiers, mostly out of inertia.
  Because some languages (Java most prominent among them) do not support
  them, we can never support them, but we are not particularly clear on this
  point.

The Protobuf language should be as strict as possible in what patterns it
accepts for identifiers, since these need to be transformed to many languages.
Thus, we propose the following regexes for the three casings used in Protobuf:

* `([A-Z][a-zA-Z0-9]*)+` for PascalCase. We require this case for:
    * Messages.
    * Enums.
    * Services.
    * Methods.
* `[a-z][a-z0-9]*(_[a-z0-9]+)*` for snake_case. We require this case for:
    * Fields (including extensions).
    * Package components.
* `[A-Z][A-Z0-9]*(_[A-Z0-9]+)*` for SHOUTY_CASE. We require this case for:
    * Enum values.

These patterns are intended to reject extraneous underscores, and to make casing
of ASCII letters consistent. We explicitly only support ASCII for maximal
portability to target languages. Note that option names are not included, since
those are defined as fields in a proto, and would be subject to this rule
automatically.

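As a rough illustration, the three patterns above can be checked mechanically.
The sketch below is Python, not anything taken from `protoc`; the
`REQUIRED_CASE` table and the `check_name` helper are hypothetical names chosen
for this example. The PascalCase pattern is written in a simpler form that
accepts the same strings as the one proposed above.

```python
import re

# The three casings proposed above. PascalCase is written as
# [A-Z][A-Za-z0-9]*, which accepts exactly the same strings as
# ([A-Z][a-zA-Z0-9]*)+ but avoids the nested quantifier.
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
SHOUTY_CASE = re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*")

# Hypothetical mapping from entity kind to the casing it must use.
REQUIRED_CASE = {
    "message": PASCAL_CASE,
    "enum": PASCAL_CASE,
    "service": PASCAL_CASE,
    "method": PASCAL_CASE,
    "field": SNAKE_CASE,
    "package_component": SNAKE_CASE,
    "enum_value": SHOUTY_CASE,
}


def check_name(kind: str, name: str) -> bool:
    """Return True if `name` is acceptable for an entity of the given kind."""
    return REQUIRED_CASE[kind].fullmatch(name) is not None
```

Under these assumptions, `check_name("field", "foo__bar")` and
`check_name("message", "Foo_Bar")` are both rejected, which is the intent of the
"no extraneous underscores" rule.
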
To migrate, we would introduce a bool feature `features.relax_identifier_rules`,
which can be applied to any entity. When it is false, the compiler would reject
`.proto` files which contain identifiers that don't match the above
constraints. It would default to true and would switch to false in a future
edition.

### Keywords as Identifiers

Currently, the Protobuf language allows using keywords as identifiers. This
makes the parser somewhat more complicated than it has to be for minimal
benefit, and shadowing behavior is not well-specified. For example, what does
the following compile as?

```
message Foo {
  message int32 {}
  optional int32 foo = 1;
}
```

This is particularly fraught in places where either a keyword or a type name can
follow. For example, `optional foo = 1;` is a proto3 non-optional field named
`foo` with type `optional`, but the parser can't tell until it sees the `=`
sign.

To avoid this and eventually stop supporting this in the parser, we make the
following set of keywords true reserved names that cannot be used as
identifiers:

```
bool      bytes     double    edition   enum      extend    extensions  fixed32
fixed64   float     group     import    int32     int64     map         max
message   oneof     option    optional  package   public    repeated    required
reserved  returns   rpc       service   sfixed32  sfixed64  sint32      sint64
stream    string    syntax    to        uint32    uint64    weak
```

Additionally, we introduce the syntax `#optional` for escaping a keyword as an
identifier. This may *only* be used on keywords, and not non-keyword
identifiers.

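The reserved-name check and the `#` escape can be sketched together. This is
only an illustration in Python: `KEYWORDS` is copied from the list above, while
`identifier_from_token` is a hypothetical helper, not part of any existing
parser.

```python
# Reserved words, copied from the list above.
KEYWORDS = frozenset("""
    bool bytes double edition enum extend extensions fixed32
    fixed64 float group import int32 int64 map max
    message oneof option optional package public repeated required
    reserved returns rpc service sfixed32 sfixed64 sint32 sint64
    stream string syntax to uint32 uint64 weak
""".split())


def identifier_from_token(token: str) -> str:
    """Return the identifier a token denotes, or raise ValueError."""
    if token.startswith("#"):
        name = token[1:]
        # The escape is only legal in front of a keyword.
        if name not in KEYWORDS:
            raise ValueError(f"'#' may only escape a keyword, got {token!r}")
        return name
    if token in KEYWORDS:
        raise ValueError(f"{token!r} is a keyword; write '#{token}' to use it as a name")
    return token
```

With this sketch, `identifier_from_token("#optional")` yields `optional`, while
both a bare `optional` and an escaped non-keyword such as `#foo` are errors.
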
To migrate, we would introduce a bool feature
`features.keywords_as_identifiers`, which can be applied to any entity. When it
is false, the compiler would reject `.proto` files which contain identifiers
that use the names of keywords. It would migrate true->false in a future
edition. The `#optional` syntax would not need to be feature-gated.

From time to time we may introduce new keywords. The best procedure for doing so
is to add a `features.xxx_is_a_keyword` feature, start it out as false, and then
switch it to true in an edition, which would cause `xxx` to be treated as a
keyword for the purposes of this check. There's nothing stopping us from
starting to use it in the syntax without an edition if it would be relatively
unambiguous (i.e., a "contextual" keyword). Rust provides guidance here: they
really hate contextual keywords since they complicate the parser, so keywords
start out as contextual and become properly reserved in the next Rust edition.

### Nonempty Package

Right now, an empty package is technically permitted. We should remove this
functionality from the language completely and require every file to declare a
package.

We would introduce a feature like `features.allow_missing_package`, start it out
as true, and switch it to false.

### Invalid Names in `reserved`

Currently, `reserved "foo-bar";` is accepted. `foo-bar` is not a valid name for
a field and thus should be rejected. Ideally we should remove this syntax
altogether and only permit the use of identifiers in this position, such as
`reserved foo, bar;`.

We would introduce a feature like `features.allow_strings_in_reserved`, start it
out as true, and then switch it to false.

### Almost All Names are Fully Qualified

Right now, Protobuf defines a complicated name resolution scheme, inspired by
that of C++ (which is even more complicated than ours!), that involves matching
subsets of names. Instead, we should require that every name be either a
single identifier OR fully-qualified. This is an attempt to move to Go-style
name resolution, which is significantly simpler to implement and explain.

In particular, if a name is a single identifier, then:

* It must be the name of a type defined at the top level of the current file.
* If it is the name of a message or enum for a field's type, it may be the
  name of a type defined in the current message. This does *not* apply to
  extension fields.

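The two rules above are small enough to sketch as a lookup function. This is a
Python illustration under stated assumptions: `Scope`, `resolve`, and the
`known` symbol set are hypothetical, and letting a type nested in the current
message win over a top-level type of the same name is a choice made here, not
something the rules above specify.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Scope:
    package: str                          # package of the current file
    top_level: FrozenSet[str]             # types declared at the file's top level
    message: Optional[str] = None         # enclosing message name, if any
    nested: FrozenSet[str] = frozenset()  # types declared directly in that message


def resolve(name: str, scope: Scope, known: FrozenSet[str],
            is_extension: bool = False) -> str:
    """Return the fully-qualified name that `name` refers to."""
    if "." in name:
        # A multi-component path must already be fully qualified.
        if name in known:
            return name
        raise LookupError(f"{name!r} is not a fully-qualified name")
    # Single identifier: a type in the current message (never for extensions)...
    if scope.message is not None and not is_extension and name in scope.nested:
        return f"{scope.package}.{scope.message}.{name}"
    # ...or a type at the top level of the current file.
    if name in scope.top_level:
        return f"{scope.package}.{name}"
    raise LookupError(f"{name!r} is not visible as a single identifier")
```

Note that a partially qualified path such as `Foo.Inner` is rejected outright:
it is neither a single identifier nor a fully-qualified name.
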
Because any multi-component path must be fully qualified, we no longer need the
`.foo.Bar` syntax, except to refer to messages defined in files without a
package. We forbid `.`-prefixed names except in that case.

We would introduce a feature like `features.use_cpp_style_name_resolution`,
start it out as true, and then switch it to false.

Ideally, if we get strict identifier names, we can tell that `Foo.Bar` is rooted
at a message, rather than a package. In that case, we could go as far as saying
that "names that start with a lower-case letter are fully-qualified, otherwise
they are relative to the current package, and will only find things defined in
the current file."

Unlike Go, we do not allow finding things in other packages without being
fully-qualified; this mostly comes from doing source-diving in very large
packages, like the Go runtime, where it is very hard to find where something is
defined.

### Unique Enum Values

Right now, we allow aliases in enums:

```
enum Foo {
  BAR = 5;
  BAZ = 5;
}
```

This results in significant complexity in some parts of the backend, and weird
behavior in textproto and JSON. We should disallow this.

We would introduce a feature like `features.allow_enum_aliases`, which would
switch from true to false.

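Detecting aliases is a matter of grouping value names by number. A minimal
Python sketch, with `find_enum_aliases` as a hypothetical helper name:

```python
from collections import defaultdict


def find_enum_aliases(values):
    """Map each number used more than once to the names that share it.

    `values` is an iterable of (name, number) pairs in declaration order.
    """
    names_by_number = defaultdict(list)
    for name, number in values:
        names_by_number[number].append(name)
    return {n: names for n, names in names_by_number.items() if len(names) > 1}
```

For the `Foo` enum above, the result would be `{5: ["BAR", "BAZ"]}`; an enum
without aliases yields an empty dict.
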
### Imports are Used

We should adopt the Go rule that all non-public imports are used (i.e., every
import provides at least one type referred to in the file).

We would introduce a feature like `features.allow_unused_imports`, which would
switch from true to false.

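The rule reduces to a set intersection per import. The following Python sketch
assumes the compiler already knows which types each imported file provides;
`unused_imports` and the shape of its arguments are hypothetical.

```python
def unused_imports(imports, referenced_types):
    """Return the non-public imports that provide no referenced type.

    `imports` maps an imported file path to an (is_public, provided_types)
    pair; `referenced_types` holds the fully-qualified type names that the
    importing file refers to.
    """
    referenced = set(referenced_types)
    return sorted(
        path
        for path, (is_public, provided) in imports.items()
        if not is_public and not (set(provided) & referenced)
    )
```

Public imports are skipped because they exist to re-export types to other
files, so they are allowed to go unreferenced in the importing file itself.
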
### Next Field # is Reserved

There are a few idioms for this checked by linters, such as `// Next ID: N`. We
should codify this in the language by requiring that every message begin with
`reserved N to max;`, with the intent that `N` is the next never-used field
number. Because it is required to be the first production in the message, it can
be

We could, additionally, require that *every* field number be either used or
reserved, in addition to having a single `N to max;` reservation. Alternatively,
we could require that every field number up to the largest one used be reserved;
gaps between field numbers are usually a smell.

This applies equally to message fields and enum values.

We would introduce a feature like `features.allow_unused_numbers`, which we
would switch from true to false.

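The stricter variant, where every number below `N` must be used or reserved,
could be checked roughly as follows. This Python sketch covers message fields
only (numbers starting at 1); `unaccounted_numbers` and its parameters are
hypothetical names.

```python
def unaccounted_numbers(used, reserved_ranges, next_number):
    """Return the numbers below `next_number` that are neither used nor reserved.

    `used` holds the field numbers in use, `reserved_ranges` holds inclusive
    (low, high) pairs other than the leading `reserved N to max;`, and
    `next_number` is that N.
    """
    too_large = sorted(n for n in used if n >= next_number)
    if too_large:
        raise ValueError(
            f"numbers {too_large} collide with 'reserved {next_number} to max'"
        )
    covered = set(used)
    for low, high in reserved_ranges:
        covered.update(range(low, high + 1))
    return [n for n in range(1, next_number) if n not in covered]
```

A message using fields 1, 2, and 5 with `reserved 3;` and `reserved 6 to max;`
would report `[4]` as the only gap.
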
### Disallow Implicit String Concatenation

Protobuf will implicitly concatenate two adjacent strings in any place it allows
quoted strings, e.g. `option foo = "bar " "baz";`. This has caused interesting
problems around `reserved` in the past, if a comma is omitted: `reserved "foo"
"bar";` is parsed as `reserved "foobar";`.

We would introduce a feature like `features.concatenate_adjacent_strings`, which
would switch from true to false.

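A lint for this only needs to look for two string tokens in a row. The Python
sketch below uses a deliberately simplified tokenizer (no comment handling, not
the real Protobuf lexer); `TOKEN` and `adjacent_strings` are hypothetical names.

```python
import re

# Simplified tokenizer: double- or single-quoted strings, identifiers, and
# any other single non-space character. Comments are not handled.
TOKEN = re.compile(
    r'''"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*'|[A-Za-z_][A-Za-z0-9_]*|\S'''
)


def adjacent_strings(source: str):
    """Return pairs of string literals that appear with nothing between them."""
    tokens = TOKEN.findall(source)

    def is_string(token):
        # A complete literal starts with a quote and has a closing quote.
        return len(token) > 1 and token[0] in "\"'"

    return [(a, b) for a, b in zip(tokens, tokens[1:]) if is_string(a) and is_string(b)]
```

Run on `reserved "foo" "bar";` this reports the pair `"foo"`, `"bar"`, while
`reserved "foo", "bar";` reports nothing because the comma separates the two
literals.
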
### Package Is First

The `package` declaration can appear anywhere in the file after `syntax` or
`edition`. We should take cues from Go and require it to be the first thing in
the file, after the edition.

We would introduce a feature like `features.package_anywhere`, which would
switch from true to false.

### Strict Boolean Options

Boolean options can use true, false, True, False, T, or F as a value: `option
my_bool = T;`. We should restrict to only `true` and `false`.

We would introduce a feature like `features.loose_bool_options`, which would
switch from true to false.

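The difference between the two modes is just which spellings are accepted. A
small Python sketch, where `parse_bool_option` is a hypothetical helper and its
`loose` argument stands in for the proposed feature:

```python
# Spellings accepted today, as listed above.
LOOSE_BOOLS = {
    "true": True, "True": True, "T": True,
    "false": False, "False": False, "F": False,
}


def parse_bool_option(token: str, loose: bool) -> bool:
    """Parse a boolean option value; `loose` mirrors the proposed feature."""
    if token in ("true", "false"):
        return token == "true"
    if loose and token in LOOSE_BOOLS:
        return LOOSE_BOOLS[token]
    raise ValueError(f"invalid boolean option value {token!r}")
```

With `loose=False`, `option my_bool = T;` becomes an error, while `true` and
`false` parse the same way in both modes.
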
### Decimal Field Numbers

We permit non-decimal integer literals for field numbers, e.g. `optional int32
x = 0x01;`. Thankfully(?) we do not already permit a leading + or -. We should
require decimal literals, since there is very little reason to allow other
literals and doing so makes the Protobuf language harder to parse.

We would introduce a feature like `features.non_decimal_field_numbers`, which
would switch from true to false.
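
A sketch of the stricter parse, in Python. It assumes the non-decimal forms in
question are C-style hexadecimal (`0x01`) and octal (`017`) literals;
`parse_field_number` and its `allow_non_decimal` flag are hypothetical and stand
in for the proposed feature. Range checks on the resulting number are out of
scope here.

```python
import re


def parse_field_number(literal: str, allow_non_decimal: bool) -> int:
    """Parse a field-number literal, optionally accepting hex and octal."""
    if re.fullmatch(r"[1-9][0-9]*", literal):
        return int(literal)
    if allow_non_decimal:
        if re.fullmatch(r"0[xX][0-9A-Fa-f]+", literal):
            return int(literal, 16)
        if re.fullmatch(r"0[0-7]+", literal):
            return int(literal, 8)
    raise ValueError(f"invalid field number literal {literal!r}")
```

In strict mode only plain decimal literals such as `12` pass; `0x01` is accepted
only when `allow_non_decimal` is true, and a signed literal such as `+1` is
rejected in both modes.
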