# upb Design

[TOC]

upb is a protobuf kernel written in C. It is a fast and conformant implementation
of protobuf, with a low-level C API that is designed to be wrapped in other
languages.

upb is not designed to be used by applications directly. The C API is very
low-level, unsafe, and changes frequently. It is important that upb is able to
make breaking API changes as necessary, to avoid taking on technical debt that
would compromise upb's goals of small code size and fast performance.

## Design goals

Goals:

- Full protobuf conformance
- Small code size
- Fast performance (without compromising code size)
- Easy to wrap in language runtimes
- Easy to adapt to different memory management schemes (refcounting, GC, etc.)

Non-Goals:

- Stable API
- Safe API
- Ergonomic API for applications

Parameters:

- C99
- 32- or 64-bit CPU (assumes 4- or 8-byte pointers)
- Uses pointer tagging, but avoids other implementation-defined behavior
- Aims to never invoke undefined behavior (tests with ASAN, UBSAN, etc.)
- No global state, fully re-entrant

## Arenas

All memory management in upb is done with arenas, represented by the type
`upb_Arena`. Arenas are an alternative to `malloc()` and `free()` that
significantly reduce the cost of memory allocation.

Arenas obtain blocks of memory using some underlying allocator (likely
`malloc()` and `free()`), and satisfy allocations using a simple bump allocator
that walks through each block in linear order. Allocations cannot be freed
individually: it is only possible to free the arena as a whole, which frees all
of the underlying blocks.
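
The following standalone sketch makes the bump-allocation scheme concrete. It is not upb's implementation. The names (`BumpArena`, `bump_arena_malloc()`, and so on), the 16-byte alignment, the 256-byte first block, and the policy of doubling each new block are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of a bump-allocating arena; not upb's code. */
#define BUMP_ALIGN 16        /* assumed alignment for every allocation */
#define BUMP_FIRST_BLOCK 256 /* assumed size of the first block */

typedef struct BumpBlock {
  struct BumpBlock* next; /* blocks form a singly linked list */
  size_t size;            /* usable bytes after the header */
} BumpBlock;

typedef struct {
  BumpBlock* blocks; /* most recent block first; NULL if none yet */
  char* ptr;         /* next free byte in the current block */
  char* end;         /* one past the last byte of the current block */
  size_t num_blocks;
} BumpArena;

static size_t bump_align_up(size_t n) {
  return (n + BUMP_ALIGN - 1) & ~(size_t)(BUMP_ALIGN - 1);
}

void bump_arena_init(BumpArena* a) {
  a->blocks = NULL;
  a->ptr = NULL;
  a->end = NULL;
  a->num_blocks = 0;
}

/* Slow path: fetch a new block from malloc(), doubling the previous size. */
static int bump_arena_grow(BumpArena* a, size_t min_size) {
  size_t header = bump_align_up(sizeof(BumpBlock));
  size_t size = a->blocks ? a->blocks->size : BUMP_FIRST_BLOCK / 2;
  size = size > SIZE_MAX / 2 ? size : size * 2;
  if (size < min_size) size = min_size;
  if (size > SIZE_MAX - header) return 0;
  BumpBlock* b = malloc(header + size);
  if (b == NULL) return 0;
  b->next = a->blocks;
  b->size = size;
  a->blocks = b;
  a->ptr = (char*)b + header;
  a->end = a->ptr + size;
  a->num_blocks++;
  return 1;
}

/* Fast path: bump `ptr` forward. Returns NULL only if malloc() fails. */
void* bump_arena_malloc(BumpArena* a, size_t size) {
  if (size > SIZE_MAX - BUMP_ALIGN) return NULL;
  size = bump_align_up(size);
  if (a->blocks == NULL || (size_t)(a->end - a->ptr) < size) {
    if (!bump_arena_grow(a, size)) return NULL;
  }
  void* ret = a->ptr;
  a->ptr += size;
  assert(a->ptr <= a->end);
  return ret;
}

/* Frees every block at once; individual allocations are never freed. */
void bump_arena_free(BumpArena* a) {
  BumpBlock* b = a->blocks;
  while (b != NULL) {
    BumpBlock* next = b->next;
    free(b);
    b = next;
  }
  bump_arena_init(a);
}
```

In the common case an allocation costs one comparison and one pointer addition. Only the slow path and `bump_arena_free()` call into the underlying allocator, and they do so once per block rather than once per object.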

Here is an example of using the `upb_Arena` type:

```c
upb_Arena* arena = upb_Arena_New();

// Perform some allocations.
int* x = upb_Arena_Malloc(arena, sizeof(*x));
int* y = upb_Arena_Malloc(arena, sizeof(*y));

// We cannot free `x` and `y` separately, we can only free the arena
// as a whole.
upb_Arena_Free(arena);
```

upb uses arenas for all memory management, and this fact is reflected in the API
for all upb data structures. All upb functions that allocate take a
`upb_Arena*` parameter and perform allocations using that arena rather than
calling `malloc()` or `free()`.

```c
// upb API to create a message.
UPB_API upb_Message* upb_Message_New(const upb_MiniTable* mini_table,
                                     upb_Arena* arena);

void MakeMessage(const upb_MiniTable* mini_table) {
  upb_Arena* arena = upb_Arena_New();

  // This message is allocated on our arena.
  upb_Message* msg = upb_Message_New(mini_table, arena);

  // We can free the arena whenever we want, but we cannot free the
  // message separately from the arena.
  upb_Arena_Free(arena);

  // msg is now deleted.
}
```

Arenas are a key part of upb's performance story. Parsing a large protobuf
payload usually involves rapidly creating a series of messages, arrays (repeated
fields), and maps. It is crucial for parsing performance that these allocations
are as fast as possible. Equally important, freeing the tree of messages should
be as fast as possible, and arenas can reduce this cost from `O(n)` to `O(lg n)`.
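
Here is one way to see where a logarithmic freeing cost can come from. Suppose each new block is twice the size of the previous one. That is an assumption here, since a real arena may cap block sizes. Under that policy, the number of blocks grows with the logarithm of the total bytes allocated, and so does the number of `free()` calls needed to destroy the arena. The hypothetical helper below counts blocks under that policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many blocks a doubling arena needs to hold
   `total` bytes if the first block holds `first` bytes. Assumes no cap on
   block size, ignores space wasted at the end of each block, and assumes
   `total` is far below SIZE_MAX. */
size_t blocks_needed(size_t total, size_t first) {
  assert(first > 0);
  size_t blocks = 0;
  size_t capacity = 0;
  size_t next = first;
  while (capacity < total) {
    capacity += next; /* add one more block */
    next *= 2;        /* the following block is twice as large */
    blocks++;
  }
  return blocks;
}
```

For example, 1 MiB of allocations with a 256-byte first block needs 13 blocks, because 256 × (2^13 − 1) is the first such total to reach 2^20 bytes. Doubling the data to 2 MiB adds only one more block. Allocating the same 1 MiB as 32-byte objects with `malloc()` would instead require 32,768 separate calls to `free()`.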

### Avoiding Dangling Pointers

Objects allocated on an arena will frequently contain pointers to other
arena-allocated objects. For example, a `upb_Message` will have pointers to
sub-messages that are also arena-allocated.

Unlike unique ownership schemes (such as `unique_ptr<>`), arenas cannot provide
automatic safety from dangling pointers. Instead, upb provides tools to help
bridge between higher-level memory management schemes (GC, refcounting, RAII,
borrow checkers) and arenas.

If there is only one arena, dangling pointers within the arena are impossible,
because all objects are freed at the same time. This is the simplest case. The
user must still be careful not to keep dangling pointers that point at arena
memory after it has been freed, but dangling pointers *between* the arena
objects will be impossible.

But what if there are multiple arenas? If an object in one arena points to an
object in another arena, how do we ensure that this pointer will not become a
dangling pointer?

To help with the multiple arena case, upb provides a primitive called "fuse".

```c
// Fuses the lifetimes of `a` and `b`. None of the blocks from `a` or `b`
// will be freed until both arenas are freed.
UPB_API bool upb_Arena_Fuse(upb_Arena* a, upb_Arena* b);
```

When two arenas are fused together, their lifetimes are irreversibly joined,
such that none of the arena blocks in either arena will be freed until *both*
arenas are freed with `upb_Arena_Free()`. This means that dangling pointers
between the two arenas will no longer be possible.

Fuse is useful when joining two messages from separate arenas (making one a
sub-message of the other). Fuse is a relatively cheap operation, on the order
of 150ns, and is very nearly `O(1)` in the number of arenas being fused (the
true complexity is the inverse Ackermann function, which grows extremely
slowly).

Each arena does consume some memory, so repeatedly creating and fusing an
additional arena is not free, but the CPU cost of fusing two arenas together is
modest.
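
An inverse-Ackermann bound strongly suggests a union-find (disjoint-set) structure. The following is a minimal, single-threaded sketch of how fused lifetimes could be tracked with union-find, using path halving and union by size. The `FuseNode` type and its functions are hypothetical. upb's real implementation, which must also be thread-safe, differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded sketch; not upb's code. Each arena owns one
   node, and fused arenas end up in the same union-find tree. */
typedef struct FuseNode {
  struct FuseNode* parent; /* points to itself at the root of a group */
  size_t size;             /* arenas in the group (valid at the root) */
  size_t live;             /* arenas not yet freed (valid at the root) */
  bool released;           /* group's blocks returned (valid at the root) */
} FuseNode;

void fuse_node_init(FuseNode* n) {
  n->parent = n;
  n->size = 1;
  n->live = 1;
  n->released = false;
}

/* Find the group root, halving the path as we go. */
FuseNode* fuse_find(FuseNode* n) {
  while (n->parent != n) {
    n->parent = n->parent->parent;
    n = n->parent;
  }
  return n;
}

/* Join two lifetimes (union by size). Fails if either group was released. */
bool fuse_join(FuseNode* a, FuseNode* b) {
  FuseNode* ra = fuse_find(a);
  FuseNode* rb = fuse_find(b);
  if (ra->released || rb->released) return false;
  if (ra == rb) return true; /* already fused */
  if (ra->size < rb->size) {
    FuseNode* tmp = ra;
    ra = rb;
    rb = tmp;
  }
  rb->parent = ra; /* the smaller tree goes under the larger one */
  ra->size += rb->size;
  ra->live += rb->live;
  return true;
}

/* Marks one arena as freed. Returns true if this released the whole group,
   which is the point where real code would free every block in it. */
bool fuse_free(FuseNode* n) {
  FuseNode* r = fuse_find(n);
  assert(r->live > 0);
  if (--r->live == 0) {
    r->released = true;
    return true;
  }
  return false;
}
```

After `fuse_join(&a, &b)`, freeing `a` alone releases nothing, because `b` still shares its lifetime. The group's blocks are released only when its last member is freed.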

### Initial Block and Custom Allocators

`upb_Arena` normally uses `malloc()` and `free()` to allocate and return its
underlying blocks. But this default strategy can be customized to support
the needs of a particular language.
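
A common C pattern for a pluggable allocator is a struct holding a function pointer, which callers embed as the first member of a larger struct that carries their own state. The sketch below uses that pattern to count live blocks, the kind of bookkeeping a garbage-collected runtime might want. `SketchAlloc`, `CountingAlloc`, and the convention that `size == 0` means "free" are assumptions for illustration, not upb's `upb_alloc` API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pluggable allocator; not upb's upb_alloc API.
   Convention assumed here: ptr == NULL allocates `size` bytes,
   size == 0 frees `ptr`. */
typedef struct SketchAlloc SketchAlloc;
typedef void* SketchAllocFunc(SketchAlloc* alloc, void* ptr, size_t size);

struct SketchAlloc {
  SketchAllocFunc* func;
};

/* A runtime-specific allocator that tracks how many blocks are live,
   e.g. so a garbage collector can account for arena memory. */
typedef struct {
  SketchAlloc base; /* first member, so a SketchAlloc* can be cast back */
  size_t live_blocks;
} CountingAlloc;

static void* counting_alloc_func(SketchAlloc* alloc, void* ptr, size_t size) {
  CountingAlloc* c = (CountingAlloc*)alloc;
  if (size == 0) {
    if (ptr != NULL) {
      assert(c->live_blocks > 0);
      c->live_blocks--;
      free(ptr);
    }
    return NULL;
  }
  void* mem = malloc(size);
  if (mem != NULL) c->live_blocks++;
  return mem;
}

void counting_alloc_init(CountingAlloc* c) {
  c->base.func = counting_alloc_func;
  c->live_blocks = 0;
}

/* What an arena would call to obtain or return one of its blocks. */
void* sketch_alloc_block(SketchAlloc* alloc, size_t size) {
  return alloc->func(alloc, NULL, size);
}

void sketch_free_block(SketchAlloc* alloc, void* block) {
  alloc->func(alloc, block, 0);
}
```

Because the arena only ever sees a `SketchAlloc*`, a language runtime can swap in its own allocation policy without the arena code changing.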

The lowest-level function for creating a `upb_Arena` is:

```c
// Creates an arena from the given initial block (if any -- n may be 0).
// Additional blocks will be allocated from |alloc|. If |alloc| is NULL,
// this is a fixed-size arena and cannot grow.
UPB_API upb_Arena* upb_Arena_Init(void* mem, size_t n, upb_alloc* alloc);
```

The `n`-byte buffer starting at `mem` will be used as an "initial block", which
is used to satisfy allocations before calling any underlying allocation function.
Note that the `upb_Arena` itself will be allocated from the initial block if
possible, so the amount of memory available for allocation from the arena will
be less than `n`.

The `alloc` parameter specifies a custom memory allocation function which
will be used once the initial block is exhausted. The user can pass `NULL`
as the allocation function, in which case the initial block is the only memory
available in the arena. This can allow upb to be used even in situations where
there is no heap.

It follows that `upb_Arena_Malloc()` is a fallible operation, and all allocating
operations like `upb_Message_New()` should be checked for failure if there is
any possibility that a fixed-size arena is in use.
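
The sketch below models the fixed-size case described above. The arena's own header is carved out of the front of the caller's buffer, so fewer than `n` bytes remain usable. With no fallback allocator, an allocation that does not fit simply returns `NULL`. `FixedArena` and its functions are hypothetical, and the sketch assumes `mem` is aligned at least as strictly as a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size arena; its header lives inside the caller's buffer. */
typedef struct {
  char* ptr; /* next free byte */
  char* end; /* one past the last byte of the buffer */
} FixedArena;

#define FIXED_ALIGN 8 /* assumed allocation alignment */

static size_t fixed_align_up(size_t n) {
  return (n + FIXED_ALIGN - 1) & ~(size_t)(FIXED_ALIGN - 1);
}

/* Returns NULL if `n` bytes cannot even hold the header.
   Assumes `mem` is aligned at least as strictly as a pointer. */
FixedArena* fixed_arena_init(void* mem, size_t n) {
  size_t header = fixed_align_up(sizeof(FixedArena));
  if (mem == NULL || n < header) return NULL;
  FixedArena* a = mem;
  a->ptr = (char*)mem + header; /* usable space starts after the header */
  a->end = (char*)mem + n;
  return a;
}

size_t fixed_arena_space_left(const FixedArena* a) {
  return (size_t)(a->end - a->ptr);
}

/* No fallback allocator: once the buffer is used up, allocation fails. */
void* fixed_arena_malloc(FixedArena* a, size_t size) {
  size_t avail = fixed_arena_space_left(a);
  if (size > avail) return NULL;
  size = fixed_align_up(size); /* size <= avail, a real buffer size */
  if (size > avail) return NULL;
  void* ret = a->ptr;
  a->ptr += size;
  assert(a->ptr <= a->end);
  return ret;
}
```

Every caller of `fixed_arena_malloc()` has to handle `NULL`. This is the same discipline the text asks of code that calls `upb_Arena_Malloc()` or `upb_Message_New()` when a fixed-size arena may be in use.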