diff --git a/docs/wrapping-upb.md b/docs/wrapping-upb.md
index 10a2c6d6d1..68392d5d8f 100644
--- a/docs/wrapping-upb.md
+++ b/docs/wrapping-upb.md
@@ -10,160 +10,232 @@ in Visual Studio Code:
https://marketplace.visualstudio.com/items?itemName=shd101wyy.markdown-preview-enhanced
--->
-# Wrapping upb in other languages
+# Building a protobuf library on upb
-upb is a C kernel that is designed to be wrapped in other languages. This is a
-guide for creating a new protobuf implementation based on upb.
+This is a guide for creating a new protobuf implementation based on upb. It
+starts from the beginning and walks you through the process, highlighting
+some important design choices you will need to make.
-## What you will need
+## Overview
-There are certain things that the language runtime must provide in order to be
-wrapped by upb.
+A protobuf implementation consists of two main pieces:
-1. **Finalizers, Destructors, or Cleaners**: This is one unavoidable
- requirement: the language *must* provide finalizers or destructors of some sort.
- There must be a way of calling a C function when the language GCs or otherwise
- destroys an object. We don't care much whether it is a finalizer, a destructor,
- or a cleaner, as long as it gets called eventually when the object is destroyed.
- Without finalizers, we would have no way of cleaning up upb data and everything
- would leak.
-2. **HashMap with weak values**: This is not an absolute requirement, but in
- languages with automatic memory management, we generally end up wanting a
- hash map with weak values to act as a `upb_msg* -> wrapper` object cache.
- We want the values to be weak (not the keys).
+1. a code generator, run at compile time, to turn `.proto` files into source
+ files in your language (we will call this "zlang", assuming an extension of ".z").
+2. a runtime component, which implements the wire format and provides the data
+ structures for representing protobuf data and metadata.
-## Reflection vs. Direct Access
+
-Each language wrapping upb gets to decide whether it will access messages
-through *reflection* or through *direct access*. This decision has some deep
-implications that will affect the design, features, and performance of your
-library.
+```dot {align="center"}
+digraph {
+ rankdir=LR;
+ newrank=true;
+ node [style="rounded,filled" shape=box]
+ "foo.proto" -> protoc;
+ "foo.proto" [shape=folder];
+ protoc [fillcolor=lightgrey];
+ protoc -> "protoc-gen-zlang";
+ "protoc-gen-zlang" -> "foo.z";
+ "protoc-gen-zlang" [fillcolor=palegreen3];
+ "foo.z" [shape=folder];
+ labelloc="b";
+ label="Compile Time";
+}
+```
+
+
+
+```dot {align="center"}
+digraph {
+ newrank=true;
+ node [style="rounded,filled" shape=box fillcolor=lightgrey]
+ "foo.z" -> "zlang/upb glue (FFI)";
+ "zlang/upb glue (FFI)" -> "upb (C)";
+ "zlang/upb glue (FFI)" [fillcolor=palegreen3];
+ labelloc="b";
+ label="Runtime";
+}
+```
+
+The parts in green are what you will need to implement.
+
+Note that your code generator (`protoc-gen-zlang`) does *not* need to generate
+any C code (eg. `foo.c`). While upb itself is written in C, upb's parsers and
+serializers are fully table-driven, which means there is never any need or even
+benefit to generating C code for each proto. upb is capable of full-speed
+parsing even when schema data is loaded at runtime from strings embedded into
+`foo.z`. This is a key benefit of upb compared with C++ protos, which have
+traditionally relied on generated parsers in `foo.pb.cc` files to achieve full
+parsing speed, and suffered a ~10x speed penalty in the parser when the schema
+data was loaded at runtime.
+
+## Prerequisites
+
+There are a few things that the language runtime must provide in order to wrap
+upb.
+
+1. **FFI**: To wrap upb, your language must be able to call into a C API
+ through a Foreign Function Interface (FFI). Most languages support FFI in
+ some form, either through "native extensions" (in which you write some C
+ code to implement new methods in your language) or through a direct FFI (in
+ which you can call into regular C functions directly from your language
+ using a special library).
+2. **Finalizers, Destructors, or Cleaners**: The runtime must provide
+ finalizers or destructors of some sort. There must be a way of triggering a
+ call to a C function when the language garbage collects or otherwise
+ destroys an object. We don't care much whether it is a finalizer, a
+ destructor, or a cleaner, as long as it gets called eventually when the
+ object is destroyed. upb allocates memory in C space, and a finalizer is our
+ only way of making sure that memory is freed and does not leak.
+3. **HashMap with weak values**: (optional) This is not a strong requirement,
+ but it is sometimes helpful to have a global hashmap with weak values to act
+ as a `upb_msg* -> wrapper` object cache. We want the values to be weak (not
+ the keys). There is some question about whether we want to continue to use
+ this pattern going forward.
+
+## Reflection vs. MiniTables
+
+The first key design decision you will need to make is whether your generated
+code will access message data via reflection or minitables. Generally more
+dynamic languages will want to use reflection and more static languages will
+want to use minitables.
### Reflection
-The simplest option is to load full reflection data into the upb library at
-runtime. You can load reflection data using serialized descriptors, which are a
-stable and widely supported format across all protobuf tooling.
-
-```c
- // A upb_symtab is a dynamic container that we can load reflection data into.
- upb_symtab* symtab = upb_symtab_new();
-
- // We load reflection data via a serialized descriptor. The code generator
- // for your language should embed serialized descriptors into your generated
- // files. For each generated file loaded by your library, you can add the
- // serialized descriptor to the symtab as shown.
- upb_arena *tmp = upb_arena_new();
- google_protobuf_FileDescriptorProto* file =
- google_protobuf_FileDescriptorProto_parse(desc_data, desc_size, tmp);
- if (!file || !upb_symtab_addfile(symtab, file, NULL)) {
- // Handle error.
- }
- upb_arena_free(tmp);
+Reflection-based data access makes the most sense in highly dynamic language
+interpreters, where method dispatch is generally resolved via strings and hash
+table lookups.
+
+In such languages, you can often implement a special method like `__getattr__`
+(Python) or `method_missing` (Ruby) that receives the method name as a string.
+Using upb's reflection, you can look up a field name using the method name,
+thereby using a hash table belonging to upb instead of one provided by the
+language.
+
+```python
+class FooMessage:
+ # Written in Python for illustration, but in practice we will want to
+ # implement this in C for speed.
+ def __getattr__(self, name):
+ field = FooMessage.descriptor.fields_by_name[name]
+ return field.get_value(self)
+```
- // At application exit, we free the symtab.
- upb_symtab_free(symtab);
+Using this design, we only need to attach a single `__getattr__` method to each
+message class, instead of defining a getter/setter for each field. In this way
+we can avoid duplicating hash tables between upb and the language interpreter,
+reducing memory usage.
+
+Reflection-based access requires loading full reflection at runtime. Your
+generated code will need to embed serialized descriptors (ie. a serialized
+message of `descriptor.proto`), which has some amount of size overhead and
+exposes all message/field names to the binary. It also forces a hash table
+lookup in the critical path of field access. If method calls in your language
+already have this overhead, then this is no added burden, but for statically
+dispatched languages it would cause extra overhead.
+
+If we take this path to its logical conclusion, all class creation can be
+performed fully dynamically, using only a binary descriptor as input. The
+"generated code" becomes little more than an embedded descriptor plus a
+library call to load it. Python has recently gone down this path. Generated
+code now looks something like this:
+
+```python
+# main_pb2.py
+from google3.net.proto2.python.internal import builder as _builder
+from google3.net.proto2.python.public import descriptor_pool as _descriptor_pool
+
+DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile("<...>")
+_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
+_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'google3.main_pb2', globals())
```
-The `upb_symtab` will give you full access to all data from the `.proto` file,
-including convenient APIs like looking up a field by name. It will allow you to
-use JSON and text format. The APIs for accessing a message through reflection
-are simple and well-supported. These APIs cleanly encapsulate upb's internal
-implementation details.
+This is all the runtime needs to create all of the classes for messages defined
+in that serialized descriptor. This code has no pretense of readability, but
+a separate `.pyi` stub file provides a fully expanded and readable list of all
+methods a user can expect to be available:
-```c
- upb_symtab* symtab = BuildSymtab();
+```python
+# main_pb2.pyi
+from google3.net.proto2.python.public import descriptor as _descriptor
+from google3.net.proto2.python.public import message as _message
+from typing import ClassVar as _ClassVar, Optional as _Optional
- // Look up a message type in the symtab.
- const upb_msgdef* m = upb_symtab_lookupmsg(symtab, "FooMessage");
+DESCRIPTOR: _descriptor.FileDescriptor
- // Construct a new message of this type, via reflection.
- upb_arena *arena = upb_arena_new();
- upb_msg *msg = upb_msg_new(m, arena);
+class MyMessage(_message.Message):
+ __slots__ = ["my_field"]
+ MY_FIELD_FIELD_NUMBER: _ClassVar[int]
+ my_field: str
+ def __init__(self, my_field: _Optional[str] = ...) -> None: ...
+```
- // Set a message field using reflection.
- const upb_fielddef* f = upb_msgdef_ntof("bar_field");
- upb_msgval val = {.int32_val = 123};
- upb_msg_set(m, f, val, arena);
+To use reflection-based access:
- // Free the message and symtab.
- upb_arena_free(arena);
- upb_symtab_free(symtab);
-```
+1. Load and access descriptor data using the interfaces in google3/third_party/upb/upb/def.h.
+2. Access message data using the interfaces in google3/third_party/upb/upb/reflection.h.
-Using reflection is a natural choice in heavily reflective, dynamic runtimes
-like Python, Ruby, PHP, or Lua. These languages generally perform method
-dispatch through a dictionary/hash table anyway, so we are not adding any extra
-overhead by using upb's hash table to lookup fields by name at field access
-time.
-
-### Direct Access
-
-Using reflection has some downsides. Reflection data is relatively large, both
-in your binary (at rest) and in RAM (at runtime). It contains names of
-everything, and these names will be exposed in your binary. Reflection APIs for
-accessing a message will have more overhead than you might want, especially if
-crossing the FFI boundary for your language runtime imposes significant
-overhead.
-
-We can reduce these overheads by using *direct access*. upb's parser and
-serializer do not actually require full reflection data, they use a more compact
-data structure known as **mini tables**. Mini tables will take up less space
-than reflection, both in the binary and in RAM, and they will not leak field
-names. Mini tables will let us parse and serialize binary wire format data
-without reflection.
-
-```c
- // TODO: demonstrate upb API for loading mini table data at runtime.
- // This API does not exist yet.
-```
+### MiniTables
-To access messages themselves without the reflection API, we will be using
-different, lower-level APIs that will require you to supply precise data such as
-the offset of a given field. This is information that will come from the upb
-compiler framework, and the correctness (and even memory safety!) of the program
-will rely on you passing these values through from the upb compiler libraries to
-the upb runtime correctly.
+MiniTables are a "lite" schema representation that are much smaller that
+reflection. MiniTables omit names, options, and almost everything else from the
+`.proto` file, retaining only enough information to parse and serialize binary
+format.
-```c
- // TODO: demonstrate using low-level APIs for direct field access.
- // These APIs do not exist yet.
-```
+MiniTables can be loaded into upb through *MiniDescriptors*. MiniDescriptors are
+a byte-oriented format that can be embedded into your generated code and passed
+to upb to construct MiniTables. MiniDescriptors only use printable characters,
+and therefore do not require escaping when embedding them into generated code
+strings. Overall the size savings of MiniDescriptors are ~60x compared with
+regular descriptors.
-It can even be possible in certain circumstances to bypass the upb API completely
-and access raw field data directly at a given offset, using unsafe APIs like
-`sun.misc.unsafe`. This can theoretically allow for field access that is no
-more expensive than referencing a struct/class field.
+MiniTables and MiniDescriptors are a natural choice for compiled languages that
+resolve method calls at compile time. For languages that are sometimes compiled
+and sometimes interpreted, there might not be an obvious choice. When a method
+call is statically bound, we want to remove as much overhead as possible,
+especially from accessors. In the extreme case, we can use unsafe APIs to read
+raw memory at a known offset:
```java
-import sun.misc.Unsafe;
+// Example of a maximally-optimized generated accessor.
+class FooMessage {
+ public long getBarField() {
+ // Using Unsafe should give us performance that is comparable to a
+ // native member access.
+ //
+ // The constant "24" is obtained from upb at compile time.
+ sun.misc.Unsafe.getLong(this.ptr, 24);
+ }
+}
+```
-class FooProto {
- private final long addr;
- private final Arena arena;
+This design is very low-level, and tightly couples the generated code to one
+specific version of the schema and compiler. A slower but safer version would
+look up a field by field number:
- // Accessor that a Java library built on upb could conceivably generate.
- long getFoo() {
- // The offset 1234 came from the upb compiler library, and was injected by the
- // Java+upb code generator.
- return Unsafe.getLong(self.addr + 1234);
- }
+```java
+// Example of a more loosely-coupled accessor.
+class FooMessage {
+ public long getBarField() {
+ // The constant "2" is the field number. Internally this will look
+ // up the number "2" in the MiniTable and use that to read the value
+ // from the message.
+ upb.glue.getLong(this.ptr, 2);
+ }
}
```
-It is always possible to load reflection data as desired, even if your library
-is designed primarily around direct access. Users who want to use JSON, text
-format, or reflection could potentially load reflection data from separate
-generated modules, for cases where they do not mind the size overhead or the
-leaking of field names. You do not give up any of these possibilities by using
-direct access.
-
-However, using direct access does have some noticeable downsides. It requires
-tighter coupling with upb's implementation details, as the mini table format is
-upb-specific and requires building your code generator against upb's compiler
-libraries. Any direct access of memory is especially tightly coupled, and would
-need to be changed if upb's in-memory format ever changes. It also is more
-prone to hard-to-debug memory errors if you make any mistakes.
+One downside of MiniTables is that they cannot support parsing or serializing
+to JSON or TextFormat, because they do now know the field names. It should be
+possible to generate reflection data "on the side", into separate generated
+code files, so that reflection is only pulled in if it is being used. However
+APIs to do this do not exist yet.
+
+To use MiniTable-based access:
+
+1. Load and access MiniDescriptors data using the interfaces in google3/third_party/upb/upb/mini_table.h.
+2. Access message data using the interfaces in google3/third_party/upb/upb/mini_table_accessors.h.
## Memory Management