Adds "Editions: Feature Extension Layout" to the GitHub code repository.

PiperOrigin-RevId: 578819110
1 year ago · 725b477032
parent 5f146f8dfe
commit 725b477032
2 changed files with 151 additions and 0 deletions
--- a/docs/design/editions/README.md
+++ b/docs/design/editions/README.md
@@ -36,3 +36,4 @@ The following topics are in this repository:
 *   [Edition Naming](edition-naming.md)
 *   [Editions Feature Visibility](editions-feature-visibility.md)
 *   [Legacy Syntax Editions](legacy-syntax-editions.md)
+*   [Editions: Feature Extension Layout](editions-feature-extension-layout.md)
--- a/docs/design/editions/editions-feature-extension-layout.md
+++ b/docs/design/editions/editions-feature-extension-layout.md
@@ -0,0 +1,150 @@
+# Editions: Feature Extension Layout
+
+**Author:** [@mkruskal-google](https://github.com/mkruskal-google),
+[@zhangskz](https://github.com/zhangskz)
+
+**Approved:** 2023-08-23
+
+## Background
+
+"[What are Protobuf Editions](what-are-protobuf-editions.md)" lays out a plan
+for supporting more targeted features that are not owned by the protobuf team,
+implemented as extensions of the global features proto. One thing it left
+ambiguous was *who* should own these extensions. Language, code generator, and
+runtime implementation are similar but not identical ways to draw that
+boundary.
+
+"Editions Zero Feature: utf8_validation" (not available externally; a later
+version,
+"[Editions Zero: utf8_validation Without Problematic Options](editions-zero-utf8_validation.md)",
+is available) is a recent plan to add a new set of generator features for utf8
+validation. The only feature we had created before it (`legacy_closed_enum` in
+Java and C++) had no ambiguity about ownership, but this one does. In Python
+specifically, the current behaviors across proto2/proto3 are distinct for each
+of the three implementations: pure Python, Python/C++, and Python/upb.
+
+## Overview
+
+In meetings, we've discussed various alternatives, captured below. The original
+plan was to make feature extensions runtime implementation-specific (e.g. C++,
+Java, Python, upb). Some notable complications came up, though:
+
+1.  **Polyglot** - it's not clear how upb or C++ runtimes should behave in
+    multi-language situations. Which feature sets do they consider for runtime
+    behaviors? *Note: this is already a serious issue today, where all proto2
+    strings and many proto3 strings are completely unsafe across languages.*
+
+2.  **Shared Implementations** - Runtimes like upb and C++ are used as backing
+    implementations of multiple other languages (e.g. Python, Rust, Ruby, PHP).
+    If we have a single set of `upb` or `cpp` features, migrating to those
+    shared implementations would be more difficult (since there's no independent
+    switches per-language). *Note: this is already the situation we're in today,
+    where switching the runtime implementation can cause subtle and dangerous
+    behavior changes.*
+
+Given that we only have two behaviors, and one of them is unambiguous, it seems
+reasonable to punt on this decision until we have more information. We may
+encounter more edge cases that require feature extensions (and give us more
+information) during the rollout of edition zero. We also have a lot of freedom
+to re-model features in later editions, so keeping the initial implementation as
+simple as possible seems best (i.e. Alternative 2).
+
+## Alternatives
+
+### Alternative 1: Runtime Implementation Features
+
+Features would be per-runtime implementation as originally described in
+"Editions Zero Feature: utf8_validation." For example, Protobuf Python users
+would set different features depending on the backing implementation (e.g.
+`features.(pb.cpp).<feature>`, `features.(pb.upb).<feature>`).
+
+#### Pros
+
+*   Most consistent with the range of behaviors expressible pre-Editions
+
+#### Cons
+
+*   Implementation may not be (and should not need to be) obvious to users.
+*   Lack of levers specifically for language / implementation combos. For
+    example, there is no way to set Python-C++ behavior independently of C++
+    behavior, which may make migration from other Python implementations harder.
+
+### Alternative 2: Generator Features
+
+Features would be per-generator only (i.e. each protoc plugin would own one set
+of features). This was the second decision we made in later discussions, and
+while very similar to the above alternative, it's more in line with our goal of
+making features primarily for codegen.
+
+For example, all Python implementations would share the same set of features
+(e.g. `features.(pb.python).<feature>`). However, certain features could be
+targeted to specific implementations (e.g.
+`features.(pb.python).upb_utf8_validation` would only be used by Python/upb).
+
+#### Pros
+
+*   Allows independent controls of shared implementations in different target
+    languages (e.g. Python's upb feature won't affect PHP).
+
+#### Cons
+
+*   Possible complexity in upb to understand which language's features to
+    respect. upb is not currently aware of which language it is being used for.
+*   Limits in-process sharing across languages with shared implementations (e.g.
+    Python upb, PHP upb) in the case of conflicting behaviors.
+    *   Additional checks may be needed.
+
+### Alternative 3: Migrate to bytes
+
+Since this whole discussion revolves around the utf8 validation feature, one
+option would be to just remove it from edition zero. Instead of adding a new
+toggle for UTF8 behavior, we could simply migrate everyone who doesn't enforce
+utf8 today to `bytes`. This would likely need another new *codegen* feature for
+generating byte getters/setters as strings, but that wouldn't have any of the
+ambiguity we're seeing today.
+
+Unfortunately, this doesn't seem feasible because of all the different behaviors
+laid out in "Editions Zero Feature: utf8_validation." UTF8 validation isn't
+really a binary on/off decision, and it can vary widely between languages. There
+are many cases where UTF8 is validated in **some** languages but not others, and
+there's also the C++ "hint" behavior that logs errors but allows invalid UTF8.
+
+**Note:** This could still be partially done in a follow-up LSC by targeting
+specific combinations of the new feature that disable validation in all relevant
+languages.
+
+#### Pros
+
+*   Punts on the issue: we wouldn't need any upb features, and C++ features
+    would all be codegen-only
+*   Simplifies the situation and avoids adding a very complicated feature in
+    edition zero
+
+#### Cons
+
+*   Not really possible given the current complexity
+*   There are O(10M) proto2 string fields that would be blindly changed to bytes
+
+### Alternative 4: Nested Features
+
+Another option is to allow for shared feature set messages. For example, upb
+would define a feature message, but *not* make it an extension of the global
+`FeatureSet`. Instead, languages with upb implementations would have a field of
+this type to allow for finer-grained controls. C++ would both extend the global
+`FeatureSet` and also be allowed as a field in other languages.
+
+For example, python utf8 validation could be specified as:
+
+We could have checks during feature validation that enforce that impossible
+combinations aren't specified. For example, with our current implementation
+`features.(pb.python).cpp` should always be identical to `features.(pb.cpp)`,
+since we don't have any mechanism for distinguishing them.
+
+#### Pros
+
+*   Much more explicit than Alternatives 1 and 2
+
+#### Cons
+
+*   Maybe too explicit? Proto owners would be forced to duplicate a lot of
+    features
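
As an illustrative aside on Alternative 4: the feature-validation check it describes (that `features.(pb.python).cpp` must stay identical to `features.(pb.cpp)`) could be sketched roughly as below. This is a minimal Python model, not protoc code. It assumes resolved features are available as plain dictionaries, and the function name, the dictionary keys, and the `VERIFY` / `HINT` values are all hypothetical, chosen only to mirror the extension names used in the document.

```python
from typing import Dict, List

# Hypothetical model: each feature set maps a feature name to its value.
FeatureMap = Dict[str, str]


def check_nested_cpp_features(features: Dict[str, dict]) -> List[str]:
    """Return one error per feature whose nested and top-level values differ.

    `features["pb.cpp"]` stands in for features.(pb.cpp), and
    `features["pb.python"]["cpp"]` stands in for features.(pb.python).cpp.
    Both keys are assumptions made for this sketch.
    """
    top_level: FeatureMap = features.get("pb.cpp", {})
    nested: FeatureMap = features.get("pb.python", {}).get("cpp", {})

    errors: List[str] = []
    # Compare every feature named on either side, so that a value set in
    # only one of the two places is also reported as a mismatch.
    for name in sorted(set(top_level) | set(nested)):
        if top_level.get(name) != nested.get(name):
            errors.append(
                f"features.(pb.python).cpp.{name}={nested.get(name)!r} "
                f"conflicts with features.(pb.cpp).{name}={top_level.get(name)!r}"
            )
    return errors


# Hypothetical inputs: one consistent configuration and one conflicting one.
consistent = {
    "pb.cpp": {"utf8_validation": "VERIFY"},
    "pb.python": {"cpp": {"utf8_validation": "VERIFY"}},
}
conflicting = {
    "pb.cpp": {"utf8_validation": "VERIFY"},
    "pb.python": {"cpp": {"utf8_validation": "HINT"}},
}

print(check_nested_cpp_features(consistent))
print(check_nested_cpp_features(conflicting))
```

The sketch treats a feature set on only one side as a conflict, which is the strictest reading of "should always be identical"; a real check might instead compare values only after defaults have been resolved.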