Protocol Buffers - Google's data interchange format (grpc依赖) https://developers.google.com/protocol-buffers/
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

168 lines
6.4 KiB

# upb Design
[TOC]
upb is a protobuf kernel written in C. It is a fast and conformant implementation
of protobuf, with a low-level C API that is designed to be wrapped in other
languages.
upb is not designed to be used by applications directly. The C API is very
low-level, unsafe, and changes frequently. It is important that upb is able to
make breaking API changes as necessary, to avoid taking on technical debt that
would compromise upb's goals of small code size and fast performance.
## Design goals
Goals:
- Full protobuf conformance
- Small code size
- Fast performance (without compromising code size)
- Easy to wrap in language runtimes
- Easy to adapt to different memory management schemes (refcounting, GC, etc)
Non-Goals:
- Stable API
- Safe API
- Ergonomic API for applications
Parameters:
- C99
- 32 or 64-bit CPU (assumes 4 or 8 byte pointers)
- Uses pointer tagging, but avoids other implementation-defined behavior
- Aims to never invoke undefined behavior (tests with ASAN, UBSAN, etc)
- No global state, fully re-entrant
## Arenas
All memory management in upb uses arenas, using the type `upb_Arena`. Arenas
are an alternative to `malloc()` and `free()` that significantly reduces the
costs of memory allocation.
Arenas obtain blocks of memory using some underlying allocator (likely
`malloc()` and `free()`), and satisfy allocations using a simple bump allocator
that walks through each block in linear order. Allocations cannot be freed
individually: it is only possible to free the arena as a whole, which frees all
of the underlying blocks.
Here is an example of using the `upb_Arena` type:
```c
upb_Arena* arena = upb_Arena_New();
// Perform some allocations.
int* x = upb_Arena_Malloc(arena, sizeof(*x));
int* y = upb_Arena_Malloc(arena, sizeof(*y));
// We cannot free `x` and `y` separately, we can only free the arena
// as a whole.
upb_Arena_Free(arena);
```
upb uses arenas for all memory management, and this fact is reflected in the API
for all upb data structures. All upb functions that allocate take a
`upb_Arena*` parameter and perform allocations using that arena rather than
calling `malloc()` or `free()`.
```c
// upb API to create a message.
UPB_API upb_Message* upb_Message_New(const upb_MiniTable* mini_table,
upb_Arena* arena);
void MakeMessage(const upb_MiniTable* mini_table) {
upb_Arena* arena = upb_Arena_New();
// This message is allocated on our arena.
upb_Message* msg = upb_Message_New(mini_table, arena);
// We can free the arena whenever we want, but we cannot free the
// message separately from the arena.
upb_Arena_Free(arena);
// msg is now deleted.
}
```
Arenas are a key part of upb's performance story. Parsing a large protobuf
payload usually involves rapidly creating a series of messages, arrays (repeated
fields), and maps. It is crucial for parsing performance that these allocations
are as fast as possible. Equally important, freeing the tree of messages should
be as fast as possible, and arenas can reduce this cost from `O(n)` to `O(lg
n)`.
### Avoiding Dangling Pointers
Objects allocated on an arena will frequently contain pointers to other
arena-allocated objects. For example, a `upb_Message` will have pointers to
sub-messages that are also arena-allocated.
Unlike unique ownership schemes (such as `unique_ptr<>`), arenas cannot provide
automatic safety from dangling pointers. Instead, upb provides tools to help
bridge between higher-level memory management schemes (GC, refcounting, RAII,
borrow checkers) and arenas.
If there is only one arena, dangling pointers within the arena are impossible,
because all objects are freed at the same time. This is the simplest case. The
user must still be careful not to keep dangling pointers that point at arena
memory after it has been freed, but dangling pointers *between* the arena
objects will be impossible.
But what if there are multiple arenas? If we have a pointer from one arena to
another, how do we ensure that this will not become a dangling pointer?
To help with the multiple arena case, upb provides a primitive called "fuse".
```c
// Fuses the lifetimes of `a` and `b`. None of the blocks from `a` or `b`
// will be freed until both arenas are freed.
UPB_API bool upb_Arena_Fuse(upb_Arena* a, upb_Arena* b);
```
When two arenas are fused together, their lifetimes are irreversibly joined,
such that none of the arena blocks in either arena will be freed until *both*
arenas are freed with `upb_Arena_Free()`. This means that dangling pointers
between the two arenas will no longer be possible.
Fuse is useful when joining two messages from separate arenas (making one a
sub-message of the other). Fuse is a relatively cheap operation, on the order
of 150ns, and is very nearly `O(1)` in the number of arenas being fused (the
true complexity is the inverse Ackermann function, which grows extremely
slowly).
Each arena does consume some memory, so repeatedly creating and fusing an
additional arena is not free, but the CPU cost of fusing two arenas together is
modest.
### Initial Block and Custom Allocators
`upb_Arena` normally uses `malloc()` and `free()` to allocate and return its
underlying blocks. But this default strategy can be customized to support
the needs of a particular language.
The lowest-level function for creating a `upb_Arena` is:
```c
// Creates an arena from the given initial block (if any -- n may be 0).
// Additional blocks will be allocated from |alloc|. If |alloc| is NULL,
// this is a fixed-size arena and cannot grow.
UPB_API upb_Arena* upb_Arena_Init(void* mem, size_t n, upb_alloc* alloc);
```
The buffer `[mem, n]` will be used as an "initial block", which is used to
satisfy allocations before calling any underlying allocation function. Note
that the `upb_Arena` itself will be allocated from the initial block if
possible, so the amount of memory available for allocation from the arena will
be less than `n`.
The `alloc` parameter specifies a custom memory allocation function which
will be used once the initial block is exhausted. The user can pass `NULL`
as the allocation function, in which case the initial block is the only memory
available in the arena. This can allow upb to be used even in situations where
there is no heap.
It follows that `upb_Arena_Malloc()` is a fallible operation, and all allocating
operations like `upb_Message_New()` should be checked for failure if there is
any possibility that a fixed size arena is in use.