You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
255 lines
13 KiB
255 lines
13 KiB
2 years ago
|
|
||
|
# upb vs. C++ Protobuf Design
|
||
|
|
||
|
[upb](https://github.com/protocolbuffers/upb) is a small C protobuf library.
|
||
|
While some of the design follows in the footsteps of the C++ Protobuf Library,
|
||
|
upb departs from C++'s design in several key ways. This document compares
|
||
|
and contrasts the two libraries on several design points.
|
||
|
|
||
|
## Design Goals
|
||
|
|
||
|
Before we begin, it is worth calling out that upb and C++ have different design
|
||
|
goals, and this motivates some of the differences we will see.
|
||
|
|
||
|
C++ protobuf is a user-level library: it is designed to be used directly by C++
|
||
|
applications. These applications will expect a full-featured C++ API surface
|
||
|
that uses C++ idioms. The C++ library is also willing to add features to
|
||
|
increase server performance, even if these features would add size or complexity
|
||
|
to the library. Because C++ protobuf is a user-level library, API stability is
|
||
|
of utmost importance: breaking API changes are rare and carefully managed when
|
||
|
they do occur. The focus on C++ also means that ABI compatibility with C is not
|
||
|
a priority.
|
||
|
|
||
|
upb, on the other hand, is designed primarily to be wrapped by other languages.
|
||
|
It is a C protobuf kernel that forms the basis on which a user-level protobuf
|
||
|
library can be built. This means we prefer to keep the API surface as small and
|
||
|
orthogonal as possible. While upb supports all protobuf features required for
|
||
|
full conformance, upb prioritizes simplicity and small code size, and avoids
|
||
|
adding features like lazy fields that can accelerate some use cases but at great
|
||
|
cost in terms of complexity. As upb is not aimed directly at users, there is
|
||
|
much more freedom to make API-breaking changes when necessary, which helps the
|
||
|
core to stay small and simple. We want to be compatible with all FFI
|
||
|
interfaces, so C ABI compatibility is a must.
|
||
|
|
||
|
Despite these differences, C++ protos and upb offer [roughly the same core set
|
||
|
of features](https://github.com/protocolbuffers/upb#features).
|
||
|
|
||
|
## Arenas
|
||
|
|
||
|
upb and C++ protos both offer arena allocation, but there are some key
|
||
|
differences.
|
||
|
|
||
|
### C++
|
||
|
|
||
|
As a matter of history, when C++ protos were open-sourced in 2008, they did not
|
||
|
support arenas. Originally there was only unique ownership, whereby each
|
||
|
message uniquely owns all child messages and will free them when the parent is
|
||
|
freed.
|
||
|
|
||
|
Arena allocation was added as a feature in 2014 as a way of dramatically
|
||
|
reducing allocation and (especially) deallocation costs. But the library was
|
||
|
not at liberty to remove the unique ownership model, because it would break far
|
||
|
too many users. As a result, C++ has supported a **hybrid allocation model**
|
||
|
ever since, allowing users to allocate messages either directly from the
|
||
|
stack/heap or from an arena. The library attempts to ensure that there are
|
||
|
no dangling pointers by performing automatic copies in some cases (for example
|
||
|
`a->set_allocated_b(b)`, where `a` and `b` are on different arenas).
|
||
|
|
||
|
C++'s arena object itself `google::protobuf::Arena` is **thread-safe** by
|
||
|
design, which allows users to allocate from multiple threads simultaneously
|
||
|
without external synchronization. The user can supply an initial block of
|
||
|
memory to the arena, and can choose some parameters to control the arena block
|
||
|
size. The user can also supply block alloc/dealloc functions, but the alloc
|
||
|
function is expected to always return some memory. The C++ library in general
|
||
|
does not attempt to handle out of memory conditions.
|
||
|
|
||
|
### upb
|
||
|
|
||
|
upb uses **arena allocation exclusively**. All messages must be allocated from
|
||
|
an arena, and can only be freed by freeing the arena. It is entirely the user's
|
||
|
responsibility to ensure that there are no dangling pointers: when a user sets a
|
||
|
message field, this will always trivially overwrite the pointer and will never
|
||
|
perform an implicit copy.
|
||
|
|
||
|
upb's `upb::Arena` is **thread-compatible**, which means it cannot be used
|
||
|
concurrently without synchronization. The arena can be seeded with an initial
|
||
|
block of memory, but it does not explicitly support any parameters for choosing
|
||
|
block size. It support a custom alloc/dealloc function, and this function is
|
||
|
allowed to return `NULL` if no dynamic memory is available. This allows upb
|
||
|
arenas to have a max/fixed size, and makes it possible in theory to write code
|
||
|
that is tolerant to out-of-memory errors.
|
||
|
|
||
|
upb's arena also supports a novel operation known as **fuse**, which joins two
|
||
|
arenas together into a single lifetime. Though both arenas must still be freed
|
||
|
separately, none of the memory will actually be freed until *both* arenas have
|
||
|
been freed. This is useful for avoiding dangling pointers when reparenting a
|
||
|
message with one that may be on a different arena.
|
||
|
|
||
|
### Comparison
|
||
|
|
||
|
**hybrid allocation vs. arena-only**:
|
||
|
* The C++ hybrid allocation model introduces a great deal of complexity and
|
||
|
unpredictability into the library. upb benefits from having a much simpler
|
||
|
and more predictable design.
|
||
|
* Some of the complexity in C++'s hybrid model arises from the fact that arenas
|
||
|
were added after the fact. Designing for a hybrid model from the outset
|
||
|
would likely yield a simpler result.
|
||
|
* Unique ownership does support some usage patterns that arenas cannot directly
|
||
|
accommodate. For example, you can reparent a message and the child will precisely
|
||
|
follow the lifetime of its new parent. An arena would require you to either
|
||
|
perform a deep copy or extend the lifetime.
|
||
|
|
||
|
**thread-compatible vs. thread-safe arena**
|
||
|
* A thread-safe arena (as in C++) is safer and easier to use. A thread-compatible
|
||
|
arena requires that the user prove that the arena cannot be used concurrently.
|
||
|
* [Thread Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
|
||
|
is far more accessible than it was in 2014 (when C++ introduced a thread-safe
|
||
|
arena). We now have more tools at our disposal to ensure that we do not trigger
|
||
|
data races in a thread-compatible arena like upb.
|
||
|
* Thread-compatible arenas are more performant.
|
||
|
* Thread-compatible arenas have a far simpler implementation. The C++ thread-safe
|
||
|
arena relies on thread-local variables, which introduce complications on some
|
||
|
platforms. It also requires far more subtle reasoning for correctness and
|
||
|
performance.
|
||
|
|
||
|
**fuse vs. no fuse**
|
||
|
* The `upb_Arena_Fuse()` operation is a key part of how upb supports reparenting
|
||
|
of messages when the parent may be on a different arena. Without this, upb has
|
||
|
no way of supporting `foo.bar = bar` in dynamic languages without performing a
|
||
|
deep copy.
|
||
|
* A downside of `upb_Arena_Fuse()` is that passing an arena to a function can allow
|
||
|
that function to extend the lifetime of the arena in potentially
|
||
|
unpredictable ways. This can be prevented if necessary, as fuse can fail, eg. if
|
||
|
one arena has an initial block. But this adds some complexity by requiring callers
|
||
|
to handle the case where fuse fails.
|
||
|
|
||
|
## Code Generation vs. Tables
|
||
|
|
||
|
The C++ protobuf library has always been built around code generation, while upb
|
||
|
generates only tables. In other words, `foo.pb.cc` files contain functions,
|
||
|
whereas `foo.upb.c` files emit only data structures.
|
||
|
|
||
|
### C++
|
||
|
|
||
|
C++ generated code emits a large number of functions into `foo.pb.cc` files.
|
||
|
An incomplete list:
|
||
|
|
||
|
* `FooMsg::FooMsg()` (constructor): initializes all fields to their default value.
|
||
|
* `FooMsg::~FooMsg()` (destructor): frees any present child messages.
|
||
|
* `FooMsg::Clear()`: clears all fields back to their default/empty value.
|
||
|
* `FooMsg::_InternalParse()`: generated code for parsing a message.
|
||
|
* `FooMsg::_InternalSerialize()`: generated code for serializing a message.
|
||
|
* `FooMsg::ByteSizeLong()`: calculates serialized size, as a first pass before serializing.
|
||
|
* `FooMsg::MergeFrom()`: copies/appends present fields from another message.
|
||
|
* `FooMsg::IsInitialized()`: checks whether required fields are set.
|
||
|
|
||
|
This code lives in the `.text` section and contains function calls to the generated
|
||
|
classes for child messages.
|
||
|
|
||
|
### upb
|
||
|
|
||
|
upb does not generate any code into `foo.upb.c` files, only data structures. upb uses a
|
||
|
compact data table known as a *mini table* to represent the schema and all fields.
|
||
|
|
||
|
upb uses mini tables to perform all of the operations that would traditionally be done
|
||
|
with generated code. Revisiting the list from the previous section:
|
||
|
|
||
|
* `FooMsg::FooMsg()` (constructor): upb instead initializes all messages with `memset(msg, 0, size)`.
|
||
|
Non-zero defaults are injected in the accessors.
|
||
|
* `FooMsg::~FooMsg()` (destructor): upb messages are freed by freeing the arena.
|
||
|
* `FooMsg::Clear()`: can be performed with `memset(msg, 0, size)`.
|
||
|
* `FooMsg::_InternalParse()`: upb's parser uses mini tables as data, instead of generating code.
|
||
|
* `FooMsg::_InternalSerialize()`: upb's serializer also uses mini-tables instead of generated code.
|
||
|
* `FooMsg::ByteSizeLong()`: upb performs serialization in reverse so that an initial pass is not required.
|
||
|
* `FooMsg::MergeFrom()`: upb supports this via serialize+parse from the other message.
|
||
|
* `FooMsg::IsInitialized()`: upb's encoder and decoder have special flags to check for required fields.
|
||
|
A util library `upb/util/required_fields.h` handles the corner cases.
|
||
|
|
||
|
### Comparison
|
||
|
|
||
|
If we compare compiled code size, upb is far smaller. Here is a comparison of the code
|
||
|
size of a trivial binary that does nothing but a parse and serialize of `descriptor.proto`.
|
||
|
This means we are seeing both the overhead of the core library itself as well as the
|
||
|
generated code (or table) for `descriptor.proto`. (For extra clarity we should break this
|
||
|
down by generated code vs core library in the future).
|
||
|
|
||
|
|
||
|
| Library | `.text` | `.data` | `.bss` |
|
||
|
|------------ |---------|---------|--------|
|
||
|
| upb | 26Ki | 0.6Ki | 0.01Ki |
|
||
|
| C++ (lite) | 187Ki | 2.8Ki | 1.25Ki |
|
||
|
| C++ (code size) | 904Ki | 6.1Ki | 1.88Ki |
|
||
|
| C++ (full) | 983Ki | 6.1Ki | 1.88Ki |
|
||
|
|
||
|
"C++ (code size)" refers to protos compiled with `optimize_for = CODE_SIZE`, a mode
|
||
|
in which generated code contains reflection only, in an attempt to make the
|
||
|
generated code size smaller (however it requires the full runtime instead
|
||
|
of the lite runtime).
|
||
|
|
||
|
## Bifurcated vs. Optional Reflection
|
||
|
|
||
|
upb and C++ protos both offer reflection without making it mandatory. However
|
||
|
the models for enabling/disabling reflection are very different.
|
||
|
|
||
|
### C++
|
||
|
|
||
|
C++ messages offer full reflection by default. Messages in C++ generally
|
||
|
derive from `Message`, and the base class provides a member function
|
||
|
`Reflection* Message::GetReflection()` which returns the reflection object.
|
||
|
|
||
|
It follows that any message deriving from `Message` will always have reflection
|
||
|
linked into the binary, whether or not the reflection object is ever used.
|
||
|
Because `GetReflection()` is a function on the base class, it is not possible
|
||
|
to statically determine if a given message's reflection is used:
|
||
|
|
||
|
```c++
|
||
|
Reflection* GetReflection(const Message& message) {
|
||
|
// Can refer to any message in the whole binary.
|
||
|
return message.GetReflection();
|
||
|
}
|
||
|
```
|
||
|
|
||
|
The C++ library does provide a way of omitting reflection: `MessageLite`. We can
|
||
|
cause a message to be lite in two different ways:
|
||
|
|
||
|
* `optimize_for = LITE_RUNTIME` in a `.proto` file will cause all messages in that
|
||
|
file to be lite.
|
||
|
* `lite` as a codegen param: this will force all messages to lite, even if the
|
||
|
`.proto` file does not have `optimize_for = LITE_RUNTIME`.
|
||
|
|
||
|
A lite message will derive from `MessageLite` instead of `Message`. Since
|
||
|
`MessageLite` has no `GetReflection()` function, this means no reflection is
|
||
|
available, so we can avoid taking the code size hit.
|
||
|
|
||
|
### upb
|
||
|
|
||
|
upb does not have the `Message` vs. `MessageLite` bifurcation. There is only one
|
||
|
kind of message type `upb_Message`, which means there is no need to configure in
|
||
|
a `.proto` file which messages will need reflection and which will not.
|
||
|
Every message has the *option* to link in reflection from a separate `foo.upbdefs.o`
|
||
|
file, without needing to change the message itself in any way.
|
||
|
|
||
|
upb does not provide the equivalent of `Message::GetReflection()`: there is no
|
||
|
facility for retrieving the reflection of a message whose type is not known statically.
|
||
|
It would be possible to layer such a facility on top of the upb core, though this
|
||
|
would probably require some kind of code generation.
|
||
|
|
||
|
### Comparison
|
||
|
|
||
|
* Most messages in C++ will not bother to declare themselves as "lite". This means
|
||
|
that many C++ messages will link in reflection even when it is never used, bloating
|
||
|
binaries unnecessarily.
|
||
|
* `optimize_for = LITE_RUNTIME` is difficult to use in practice, because it prevents
|
||
|
any non-lite protos from `import`ing that file.
|
||
|
* Forcing all protos to lite via a codegen parameter (for example, when building for
|
||
|
mobile) is more practical than `optimize_for = LITE_RUNTIME`. But this will break
|
||
|
the compile for any code that tries to upcast to `Message`, or tries to use a
|
||
|
non-lite method.
|
||
|
* The one major advantage of the C++ model is that it can support `msg.DebugString()`
|
||
|
on a type-erased proto. For upb you have to explicitly pass the `upb_MessageDef*`
|
||
|
separately if you want to perform an operation like printing a proto to text format.
|
||
|
|
||
|
## Explicit Registration vs. Globals
|
||
|
|
||
|
TODO
|