Protocol Buffers - Google's data interchange format (gRPC dependency)
https://developers.google.com/protocol-buffers/
# upb Design

[TOC]

upb is a protobuf kernel written in C. It is a fast and conformant implementation
of protobuf, with a low-level C API that is designed to be wrapped in other
languages.

upb is not designed to be used by applications directly. The C API is very
low-level, unsafe, and changes frequently. It is important that upb is able to
make breaking API changes as necessary, to avoid taking on technical debt that
would compromise upb's goals of small code size and fast performance.

## Design goals

Goals:

- Full protobuf conformance
- Small code size
- Fast performance (without compromising code size)
- Easy to wrap in language runtimes
- Easy to adapt to different memory management schemes (refcounting, GC, etc.)

Non-Goals:

- Stable API
- Safe API
- Ergonomic API for applications

Parameters:

- C99
- 32-bit or 64-bit CPU (assumes 4-byte or 8-byte pointers)
- Uses pointer tagging, but avoids other implementation-defined behavior
- Aims to never invoke undefined behavior (tests with ASAN, UBSAN, etc.)
- No global state, fully re-entrant

## Arenas

All memory management in upb uses arenas, using the type `upb_Arena`. Arenas
are an alternative to `malloc()` and `free()` that significantly reduces the
costs of memory allocation.

Arenas obtain blocks of memory using some underlying allocator (likely
`malloc()` and `free()`), and satisfy allocations using a simple bump allocator
that walks through each block in linear order. Allocations cannot be freed
individually: it is only possible to free the arena as a whole, which frees all
of the underlying blocks.

Here is an example of using the `upb_Arena` type:

```c
upb_Arena* arena = upb_Arena_New();

// Perform some allocations.
int* x = upb_Arena_Malloc(arena, sizeof(*x));
int* y = upb_Arena_Malloc(arena, sizeof(*y));

// We cannot free `x` and `y` separately; we can only free the arena
// as a whole.
upb_Arena_Free(arena);
```

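To make the block-and-bump behavior described above concrete, here is a minimal sketch of a toy arena. It is not upb's implementation: `ToyArena`, `ToyBlock`, the 256-byte starting block, the doubling growth policy, and the 8-byte alignment are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy arena for illustration only; not upb's implementation. */
#define TOY_ALIGN 8 /* Assumed sufficient for the types used in this sketch. */
#define TOY_ROUND_UP(n) (((n) + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN)

typedef struct ToyBlock {
  struct ToyBlock* next; /* Previously obtained block, or NULL. */
  size_t size;           /* Usable bytes following the header. */
  size_t used;           /* Bytes already handed out. */
} ToyBlock;

#define TOY_HEADER TOY_ROUND_UP(sizeof(ToyBlock))

typedef struct {
  ToyBlock* head;     /* Most recent block; allocations bump through it. */
  size_t block_count; /* Number of blocks obtained from malloc(). */
} ToyArena;

/* Obtains a new block from malloc(). Doubling is an assumption of this sketch. */
static ToyBlock* ToyArena_AddBlock(ToyArena* a, size_t min_size) {
  size_t size = a->head ? a->head->size * 2 : 256;
  if (size < min_size) size = min_size;
  ToyBlock* b = malloc(TOY_HEADER + size);
  if (!b) return NULL;
  b->next = a->head;
  b->size = size;
  b->used = 0;
  a->head = b;
  a->block_count++;
  return b;
}

/* Bump allocation: hand out the next `n` bytes of the current block. */
void* ToyArena_Malloc(ToyArena* a, size_t n) {
  assert(a != NULL);
  n = TOY_ROUND_UP(n); /* Overflow for huge `n` is ignored in this sketch. */
  ToyBlock* b = a->head;
  if (!b || b->size - b->used < n) {
    b = ToyArena_AddBlock(a, n);
    if (!b) return NULL;
  }
  void* ret = (char*)b + TOY_HEADER + b->used;
  b->used += n;
  return ret;
}

/* Frees every block at once; individual allocations are never freed. */
void ToyArena_Free(ToyArena* a) {
  ToyBlock* b = a->head;
  while (b) {
    ToyBlock* next = b->next;
    free(b);
    b = next;
  }
  a->head = NULL;
  a->block_count = 0;
}
```

In this sketch an allocation is a bounds check plus a pointer bump, and freeing walks the list of blocks rather than the individual objects, which is why the cost of teardown depends on the number of blocks and not on the number of allocations.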
upb uses arenas for all memory management, and this fact is reflected in the API
for all upb data structures. All upb functions that allocate take a
`upb_Arena*` parameter and perform allocations using that arena rather than
calling `malloc()` or `free()`.

```c
// upb API to create a message.
UPB_API upb_Message* upb_Message_New(const upb_MiniTable* mini_table,
                                     upb_Arena* arena);

void MakeMessage(const upb_MiniTable* mini_table) {
  upb_Arena* arena = upb_Arena_New();

  // This message is allocated on our arena.
  upb_Message* msg = upb_Message_New(mini_table, arena);

  // We can free the arena whenever we want, but we cannot free the
  // message separately from the arena.
  upb_Arena_Free(arena);

  // `msg` is now freed and must not be used.
}
```

Arenas are a key part of upb's performance story. Parsing a large protobuf
payload usually involves rapidly creating a series of messages, arrays (repeated
fields), and maps. It is crucial for parsing performance that these allocations
are as fast as possible. Equally important, freeing the tree of messages should
be as fast as possible, and arenas can reduce this cost from `O(n)` to
`O(lg n)`.

### Avoiding Dangling Pointers

Objects allocated on an arena will frequently contain pointers to other
arena-allocated objects. For example, a `upb_Message` will have pointers to
sub-messages that are also arena-allocated.

Unlike unique ownership schemes (such as `unique_ptr<>`), arenas cannot provide
automatic safety from dangling pointers. Instead, upb provides tools to help
bridge between higher-level memory management schemes (GC, refcounting, RAII,
borrow checkers) and arenas.

If there is only one arena, dangling pointers within the arena are impossible,
because all objects are freed at the same time. This is the simplest case. The
user must still be careful not to keep dangling pointers that point at arena
memory after it has been freed, but dangling pointers *between* the arena
objects will be impossible.

But what if there are multiple arenas? If we have a pointer from one arena to
another, how do we ensure that this will not become a dangling pointer?

To help with the multiple arena case, upb provides a primitive called "fuse".

```c
// Fuses the lifetimes of `a` and `b`. None of the blocks from `a` or `b`
// will be freed until both arenas are freed.
UPB_API bool upb_Arena_Fuse(upb_Arena* a, upb_Arena* b);
```

When two arenas are fused together, their lifetimes are irreversibly joined,
such that none of the arena blocks in either arena will be freed until *both*
arenas are freed with `upb_Arena_Free()`. This means that dangling pointers
between the two arenas will no longer be possible.

Fuse is useful when joining two messages from separate arenas (making one a
sub-message of the other). Fuse is a relatively cheap operation, on the order
of 150ns, and is very nearly `O(1)` in the number of arenas being fused (the
true complexity is the inverse Ackermann function, which grows extremely
slowly).

Each arena does consume some memory, so repeatedly creating and fusing an
additional arena is not free, but the CPU cost of fusing two arenas together is
modest.

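The inverse Ackermann bound mentioned above is characteristic of a union-find (disjoint-set) structure. The following is a minimal sketch of how fused lifetimes could be modeled that way; it is an assumption for illustration, not upb's actual implementation. The `ToyArena` type, its fields, and the `ToyArena_*` functions are hypothetical, and the `released` flag stands in for actually freeing the group's blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of fused arena lifetimes; not upb's implementation. */
typedef struct ToyArena {
  struct ToyArena* parent; /* NULL when this node is the root of its group. */
  size_t size;             /* Root only: number of arenas in the group. */
  int refs;                /* Root only: arenas in the group not yet freed. */
  bool released;           /* Root only: set once the group's memory is released. */
} ToyArena;

void ToyArena_Init(ToyArena* a) {
  a->parent = NULL;
  a->size = 1;
  a->refs = 1;
  a->released = false;
}

/* Finds the group root, halving the path along the way. */
static ToyArena* ToyArena_Root(ToyArena* a) {
  while (a->parent) {
    if (a->parent->parent) a->parent = a->parent->parent;
    a = a->parent;
  }
  return a;
}

/* Joins the two groups (union by size). Always succeeds in this sketch. */
bool ToyArena_Fuse(ToyArena* a, ToyArena* b) {
  ToyArena* ra = ToyArena_Root(a);
  ToyArena* rb = ToyArena_Root(b);
  if (ra == rb) return true; /* Already fused. */
  if (ra->size < rb->size) {
    ToyArena* tmp = ra;
    ra = rb;
    rb = tmp;
  }
  rb->parent = ra;
  ra->size += rb->size;
  ra->refs += rb->refs;
  return true;
}

/* Drops one reference; the group is released only when all members are freed. */
void ToyArena_Free(ToyArena* a) {
  ToyArena* root = ToyArena_Root(a);
  assert(root->refs > 0);
  if (--root->refs == 0) root->released = true;
}

bool ToyArena_IsReleased(ToyArena* a) { return ToyArena_Root(a)->released; }
```

In this model, freeing one member of a fused group only decrements the shared count at the root, so memory reachable from the other members stays valid until the last member is freed.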
### Initial Block and Custom Allocators

`upb_Arena` normally uses `malloc()` and `free()` to allocate and return its
underlying blocks. But this default strategy can be customized to support
the needs of a particular language.

The lowest-level function for creating a `upb_Arena` is:

```c
// Creates an arena from the given initial block (if any -- n may be 0).
// Additional blocks will be allocated from |alloc|. If |alloc| is NULL,
// this is a fixed-size arena and cannot grow.
UPB_API upb_Arena* upb_Arena_Init(void* mem, size_t n, upb_alloc* alloc);
```

The buffer `[mem, n]` will be used as an "initial block", which is used to
satisfy allocations before calling any underlying allocation function. Note
that the `upb_Arena` itself will be allocated from the initial block if
possible, so the amount of memory available for allocation from the arena will
be less than `n`.

The `alloc` parameter specifies a custom memory allocation function that
will be used once the initial block is exhausted. The user can pass `NULL`
as the allocation function, in which case the initial block is the only memory
available in the arena. This can allow upb to be used even in situations where
there is no heap.

It follows that `upb_Arena_Malloc()` is a fallible operation, and all allocating
operations like `upb_Message_New()` should be checked for failure if there is
any possibility that a fixed-size arena is in use.
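
The two points above (the arena header consumes part of the initial block, and allocation can fail when there is no fallback allocator) can be illustrated with a small self-contained sketch. This is a toy model, not upb's code: `ToyFixedArena` and its functions are hypothetical names, the buffer is assumed to be suitably aligned by the caller, and allocations are rounded up to 8 bytes as a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size arena for illustration; not upb's implementation. */
typedef struct {
  char* ptr; /* Next free byte. */
  char* end; /* One past the last usable byte. */
} ToyFixedArena;

/* Places the arena header at the start of `mem` (assumed suitably aligned).
 * Returns NULL if the buffer cannot even hold the header. */
ToyFixedArena* ToyFixedArena_Init(void* mem, size_t n) {
  if (mem == NULL || n < sizeof(ToyFixedArena)) return NULL;
  ToyFixedArena* a = mem;
  a->ptr = (char*)mem + sizeof(ToyFixedArena);
  a->end = (char*)mem + n;
  return a;
}

/* Bump allocation with no fallback: returns NULL once the buffer is full. */
void* ToyFixedArena_Malloc(ToyFixedArena* a, size_t n) {
  assert(a != NULL);
  size_t avail = (size_t)(a->end - a->ptr);
  size_t rounded = (n + 7) & ~(size_t)7; /* Round up to 8 bytes. */
  if (rounded < n || rounded > avail) return NULL; /* Overflow or out of space. */
  void* ret = a->ptr;
  a->ptr += rounded;
  return ret;
}
```

With a 128-byte buffer, for example, fewer than 128 bytes are available to callers because the header lives in the same buffer, and every call to `ToyFixedArena_Malloc()` has to be checked for `NULL` since nothing can grow the arena.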