
upb Design


upb is a protobuf kernel written in C. It is a fast and conformant implementation of protobuf, with a low-level C API that is designed to be wrapped in other languages.

upb is not designed to be used by applications directly. The C API is very low-level, unsafe, and changes frequently. It is important that upb is able to make breaking API changes as necessary, to avoid taking on technical debt that would compromise upb's goals of small code size and fast performance.

Design goals

Goals:

  • Full protobuf conformance
  • Small code size
  • Fast performance (without compromising code size)
  • Easy to wrap in language runtimes
  • Easy to adapt to different memory management schemes (refcounting, GC, etc.)

Non-Goals:

  • Stable API
  • Safe API
  • Ergonomic API for applications

Parameters:

  • C99
  • 32 or 64-bit CPU (assumes 4 or 8 byte pointers)
  • Uses pointer tagging, but avoids other implementation-defined behavior
  • Aims to never invoke undefined behavior (tests with ASAN, UBSAN, etc.)
  • No global state, fully re-entrant

Arenas

All memory management in upb is done with arenas, represented by the type upb_Arena. Arenas are an alternative to malloc() and free() that significantly reduces the cost of memory allocation.

Arenas obtain blocks of memory using some underlying allocator (likely malloc() and free()), and satisfy allocations using a simple bump allocator that walks through each block in linear order. Allocations cannot be freed individually: it is only possible to free the arena as a whole, which frees all of the underlying blocks.

Here is an example of using the upb_Arena type:

  upb_Arena* arena = upb_Arena_New();

  // Perform some allocations.
  int* x = upb_Arena_Malloc(arena, sizeof(*x));
  int* y = upb_Arena_Malloc(arena, sizeof(*y));

  // We cannot free `x` and `y` separately, we can only free the arena
  // as a whole.
  upb_Arena_Free(arena);
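To make the bump-allocation scheme concrete, here is a minimal sketch of an arena built on malloc() and free(). It is a hypothetical toy, not upb's implementation: the names ToyArena, toy_arena_new(), toy_arena_malloc() and toy_arena_free() are invented for illustration, and the sketch assumes that malloc() returns memory aligned to at least 16 bytes and that each new block is twice the size of the previous one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy arena for illustration only; not upb's implementation. */
#define TOY_ALIGN 16 /* assumed large enough for the types allocated here */

typedef struct ToyBlock {
  struct ToyBlock* next; /* previously allocated block, or NULL */
} ToyBlock;

typedef struct {
  ToyBlock* blocks;   /* all blocks, newest first */
  char* ptr;          /* next free byte in the newest block */
  size_t avail;       /* bytes left in the newest block */
  size_t next_size;   /* payload size of the next block */
  size_t block_count; /* number of blocks obtained from malloc() */
} ToyArena;

static size_t toy_align_up(size_t n) {
  return (n + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
}

/* Gets a new block from malloc() with at least `min_size` payload bytes. */
static int toy_arena_grow(ToyArena* a, size_t min_size) {
  size_t header = toy_align_up(sizeof(ToyBlock));
  size_t size = a->next_size < min_size ? min_size : a->next_size;
  if (size > SIZE_MAX - header) return 0;
  ToyBlock* b = malloc(header + size);
  if (b == NULL) return 0;
  b->next = a->blocks; /* link the block so the arena can free it later */
  a->blocks = b;
  a->ptr = (char*)b + header;
  a->avail = size;
  a->next_size = size <= SIZE_MAX / 2 ? size * 2 : size; /* geometric growth */
  a->block_count++;
  return 1;
}

ToyArena* toy_arena_new(void) {
  ToyArena* a = malloc(sizeof(*a));
  if (a == NULL) return NULL;
  a->blocks = NULL;
  a->ptr = NULL;
  a->avail = 0;
  a->next_size = 256;
  a->block_count = 0;
  return a;
}

/* Bump allocation: hand out the next `size` bytes of the current block. */
void* toy_arena_malloc(ToyArena* a, size_t size) {
  assert(a != NULL);
  if (size > SIZE_MAX - TOY_ALIGN) return NULL;
  size = toy_align_up(size);
  if (a->avail < size && !toy_arena_grow(a, size)) return NULL;
  void* ret = a->ptr;
  a->ptr += size;
  a->avail -= size;
  return ret;
}

/* There is no per-object free: destroying the arena walks the block list. */
void toy_arena_free(ToyArena* a) {
  ToyBlock* b = a->blocks;
  while (b != NULL) {
    ToyBlock* next = b->next;
    free(b);
    b = next;
  }
  free(a);
}
```

Destroying a ToyArena costs one free() per block, regardless of how many objects were carved out of those blocks.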

upb uses arenas for all memory management, and this fact is reflected in the API for all upb data structures. All upb functions that allocate take a upb_Arena* parameter and perform allocations using that arena rather than calling malloc() or free().

  // upb API to create a message.
  UPB_API upb_Message* upb_Message_New(const upb_MiniTable* mini_table,
                                       upb_Arena* arena);

  void MakeMessage(const upb_MiniTable* mini_table) {
    upb_Arena* arena = upb_Arena_New();

    // This message is allocated on our arena.
    upb_Message* msg = upb_Message_New(mini_table, arena);

    // We can free the arena whenever we want, but we cannot free the
    // message separately from the arena.
    upb_Arena_Free(arena);

    // msg is now deleted.
  }

Arenas are a key part of upb's performance story. Parsing a large protobuf payload usually involves rapidly creating a series of messages, arrays (repeated fields), and maps, and it is crucial for parsing performance that these allocations are as fast as possible. Equally important, freeing the tree of messages should be as fast as possible. Because an arena frees whole blocks rather than individual objects, and successive blocks grow in size, arenas can reduce this cost from O(n) to O(lg n).
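One way to see where the logarithm comes from is to count blocks. The sketch below assumes, hypothetically, that the first block holds `first` bytes and each later block is twice as large; upb's actual growth policy may differ, and per-block waste is ignored. The function name arena_blocks_needed() is invented for illustration.

```c
#include <assert.h>

/* Hypothetical model: block i holds first * 2^i bytes. Returns how many
   blocks are needed to hold `total` bytes, which is also how many free()
   calls the arena makes when it is destroyed. `total` must be well below
   ULLONG_MAX so that the running capacity does not overflow. */
unsigned arena_blocks_needed(unsigned long long total,
                             unsigned long long first) {
  assert(first > 0);
  unsigned blocks = 0;
  unsigned long long capacity = 0;
  unsigned long long next = first;
  while (capacity < total) {
    capacity += next; /* add the next, larger block */
    next *= 2;        /* geometric growth */
    blocks++;
  }
  return blocks;
}
```

Under this model, 1,000,000 objects of 32 bytes (32,000,000 bytes) fit in 17 blocks starting from a 256-byte block, and doubling the payload to 64,000,000 bytes adds only one more block, whereas freeing the objects individually would take 1,000,000 and 2,000,000 calls to free() respectively.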

Avoiding Dangling Pointers

Objects allocated on an arena will frequently contain pointers to other arena-allocated objects. For example, a upb_Message will have pointers to sub-messages that are also arena-allocated.

Unlike unique ownership schemes (such as unique_ptr<>), arenas cannot provide automatic safety from dangling pointers. Instead, upb provides tools to help bridge between higher-level memory management schemes (GC, refcounting, RAII, borrow checkers) and arenas.

If there is only one arena, dangling pointers within the arena are impossible, because all objects are freed at the same time. This is the simplest case. The user must still be careful not to keep dangling pointers that point at arena memory after it has been freed, but dangling pointers between the arena objects will be impossible.

But what if there are multiple arenas? If we have a pointer from one arena to another, how do we ensure that this will not become a dangling pointer?

To help with the multiple arena case, upb provides a primitive called "fuse".

  // Fuses the lifetimes of `a` and `b`.  None of the blocks from `a` or `b`
  // will be freed until both arenas are freed.
  UPB_API bool upb_Arena_Fuse(upb_Arena* a, upb_Arena* b);

When two arenas are fused together, their lifetimes are irreversibly joined, such that none of the arena blocks in either arena will be freed until both arenas are freed with upb_Arena_Free(). This means that dangling pointers between the two arenas will no longer be possible.

Fuse is useful when joining two messages from separate arenas (making one a sub-message of the other). Fuse is a relatively cheap operation, on the order of 150ns, and is very nearly O(1) in the number of arenas being fused (the true complexity is the inverse Ackermann function, which grows extremely slowly).

Each arena does consume some memory, so repeatedly creating and fusing an additional arena is not free, but the CPU cost of fusing two arenas together is modest.
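The inverse-Ackermann bound points to a union-find (disjoint-set) structure. The sketch below models only the lifetime bookkeeping under that assumption: each arena is a node, fusing unions two groups, and each group's root counts members that have not been freed yet, so memory is released only when that count reaches zero. The names FuseNode, fuse_init(), fuse_find(), fuse() and fuse_free() are hypothetical, and the sketch does not reproduce upb's actual data layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lifetime model of fuse; not upb's implementation. */
typedef struct FuseNode {
  struct FuseNode* parent; /* points to itself for a group root */
  size_t size;             /* arenas in the group (meaningful at roots) */
  size_t live;             /* arenas not yet freed (meaningful at roots) */
} FuseNode;

void fuse_init(FuseNode* n) {
  n->parent = n;
  n->size = 1;
  n->live = 1;
}

/* Finds the group root, halving the path as it goes. */
FuseNode* fuse_find(FuseNode* n) {
  while (n->parent != n) {
    n->parent = n->parent->parent;
    n = n->parent;
  }
  return n;
}

/* Joins the groups of `a` and `b` (union by size). */
void fuse(FuseNode* a, FuseNode* b) {
  FuseNode* ra = fuse_find(a);
  FuseNode* rb = fuse_find(b);
  if (ra == rb) return; /* already fused */
  if (ra->size < rb->size) {
    FuseNode* t = ra;
    ra = rb;
    rb = t;
  }
  rb->parent = ra; /* attach the smaller tree under the larger */
  ra->size += rb->size;
  ra->live += rb->live;
}

/* Marks one arena as freed. Returns 1 if this was the last live arena in
   its group, i.e. the moment the whole group's blocks may be released. */
int fuse_free(FuseNode* n) {
  FuseNode* r = fuse_find(n);
  assert(r->live > 0);
  r->live--;
  return r->live == 0;
}
```

Union by size combined with path halving is what keeps fuse_find(), and therefore fuse(), at inverse-Ackermann amortized cost. The sketch assumes that no arena is freed twice or fused after it has been freed.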

Initial Block and Custom Allocators

upb_Arena normally uses malloc() and free() to allocate and return its underlying blocks. But this default strategy can be customized to support the needs of a particular language.

The lowest-level function for creating a upb_Arena is:

  // Creates an arena from the given initial block (if any -- n may be 0).
  // Additional blocks will be allocated from |alloc|.  If |alloc| is NULL,
  // this is a fixed-size arena and cannot grow.
  UPB_API upb_Arena* upb_Arena_Init(void* mem, size_t n, upb_alloc* alloc);

The buffer [mem, mem + n) will be used as an "initial block", which satisfies allocations before any underlying allocation function is called. Note that the upb_Arena itself will be allocated from the initial block if possible, so the amount of memory available for allocation from the arena will be less than n.

The alloc parameter specifies a custom memory allocation function which will be used once the initial block is exhausted. The user can pass NULL as the allocation function, in which case the initial block is the only memory available in the arena. This can allow upb to be used even in situations where there is no heap.

It follows that upb_Arena_Malloc() is a fallible operation, and all allocating operations like upb_Message_New() should be checked for failure if there is any possibility that a fixed-size arena is in use.
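The following sketch shows both ideas from this section in miniature: the arena header is carved out of the caller's initial block, so fewer than n bytes remain usable, and with no fallback allocator the arena cannot grow, so every allocation must be checked. FixedArena, fixed_arena_init(), fixed_arena_malloc(), fixed_arena_space_left(), Point and make_point() are hypothetical names; the sketch implements only the NULL-allocator (fixed-size) case and assumes 8-byte alignment is sufficient for the types it allocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size arena; not upb's implementation. */
#define FA_ALIGN 8 /* assumed sufficient for the types allocated here */

typedef struct {
  char* ptr; /* next free byte */
  char* end; /* one past the last usable byte */
} FixedArena;

static size_t fa_round_up(size_t n) {
  return (n + FA_ALIGN - 1) & ~(size_t)(FA_ALIGN - 1);
}

/* Places the arena header inside [mem, mem + n). Returns NULL if the buffer
   cannot even hold the header. */
FixedArena* fixed_arena_init(void* mem, size_t n) {
  if (mem == NULL) return NULL;
  size_t pad = (FA_ALIGN - (size_t)((uintptr_t)mem % FA_ALIGN)) % FA_ALIGN;
  size_t header = fa_round_up(sizeof(FixedArena));
  if (n < pad || n - pad < header) return NULL;
  FixedArena* a = (FixedArena*)((char*)mem + pad);
  a->ptr = (char*)a + header; /* usable space starts after the header */
  a->end = (char*)mem + n;
  return a;
}

/* Bump allocation that returns NULL once the buffer is exhausted. */
void* fixed_arena_malloc(FixedArena* a, size_t size) {
  assert(a != NULL);
  if (size > SIZE_MAX - FA_ALIGN) return NULL;
  size = fa_round_up(size);
  if ((size_t)(a->end - a->ptr) < size) return NULL; /* cannot grow */
  void* ret = a->ptr;
  a->ptr += size;
  return ret;
}

size_t fixed_arena_space_left(const FixedArena* a) {
  return (size_t)(a->end - a->ptr);
}

typedef struct {
  int x, y;
} Point;

/* Hypothetical caller: checks the allocation and propagates failure. */
Point* make_point(FixedArena* a, int x, int y) {
  Point* p = fixed_arena_malloc(a, sizeof(*p));
  if (p == NULL) return NULL;
  p->x = x;
  p->y = y;
  return p;
}
```

A caller that builds a larger structure would check each make_point() result the same way and report failure upward, since a fixed-size arena can run out of memory at any allocation.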