From a73852630f429daa22dfa22fe224813589640594 Mon Sep 17 00:00:00 2001
From: Joshua Haberman
Date: Tue, 28 Mar 2023 15:01:35 -0700
Subject: [PATCH] Changed Arena representation so that fusing links arenas
 together instead of blocks.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Previously when fusing, we would concatenate all blocks into a single list
that lived in the arena root. From then on, all arenas would add their blocks
to this single unified list.

After this CL, arenas keep their own distinct lists of blocks even after being
fused. Instead of unifying the block lists, fuse now puts the arenas
themselves into a list, so that all arenas in a fused group can be iterated
over at any time.

This design makes it easier to keep each individual arena thread-compatible,
because fuse and free are now the only mutating operations that touch state
shared with the entire group. Read-only operations like `SpaceAllocated()`
also iterate the list of arenas, but in a read-only fashion. (Note: we still
need tests for `SpaceAllocated()`, both single-threaded for correctness and
multi-threaded for resilience to crashes and data races.)

Performance of fuse regresses by 5-20%. This is expected, as we now perform
more atomic operations during each fuse.
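The new representation described above can be sketched with a minimal, single-threaded model (illustrative names, not the upb internals): each arena keeps its own block list, fuse links the arenas themselves, and a space-allocated query walks arenas and then blocks.

```c
// Standalone sketch of the fused-arena-list design, single-threaded for
// clarity (the real code uses atomics). All names here are illustrative.
#include <stddef.h>
#include <stdlib.h>

typedef struct Block {
  struct Block* next;
  size_t size;
} Block;

typedef struct Arena {
  Block* blocks;       // this arena's own blocks, kept even after fusing
  struct Arena* next;  // next arena in the fused group (NULL at end)
  struct Arena* tail;  // last arena in the group; meaningful only on the root
} Arena;

static Arena* arena_new(void) {
  Arena* a = calloc(1, sizeof(Arena));
  a->tail = a;  // a group of one: the arena is its own tail
  return a;
}

static void arena_add_block(Arena* a, size_t size) {
  Block* b = malloc(sizeof(Block));
  b->size = size;
  b->next = a->blocks;  // push onto this arena's own list
  a->blocks = b;
}

// Fuse: append b's arena list to a's, instead of splicing block lists.
static void arena_fuse(Arena* a, Arena* b) {
  a->tail->next = b;
  a->tail = b->tail;
}

// Read-only query: iterate every arena in the group, then every block
// belonging to each arena.
static size_t space_allocated(Arena* root) {
  size_t total = 0;
  for (Arena* a = root; a != NULL; a = a->next)
    for (Block* b = a->blocks; b != NULL; b = b->next)
      total += b->size;
  return total;
}
```

The sketch omits refcounting, cleanup entries, and the atomic loads/stores that make the real traversal safe against concurrent fuses.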
```
name                           old cpu/op   new cpu/op   delta
BM_ArenaOneAlloc               18.4ns ± 6%  18.7ns ± 4%   +2.00%  (p=0.016 n=18+18)
BM_ArenaInitialBlockOneAlloc   5.50ns ± 4%  6.57ns ± 4%  +19.42%  (p=0.000 n=16+17)
BM_ArenaFuseUnbalanced/2       59.3ns ±10%  68.7ns ± 4%  +15.85%  (p=0.000 n=19+19)
BM_ArenaFuseUnbalanced/8        479ns ± 5%   540ns ± 8%  +12.57%  (p=0.000 n=18+19)
BM_ArenaFuseUnbalanced/64      4.50µs ± 4%  4.93µs ± 8%   +9.59%  (p=0.000 n=17+17)
BM_ArenaFuseUnbalanced/128     9.24µs ± 3%  9.96µs ± 3%   +7.81%  (p=0.000 n=17+17)
BM_ArenaFuseBalanced/2         63.3ns ±18%  71.0ns ± 4%  +12.14%  (p=0.000 n=19+18)
BM_ArenaFuseBalanced/8          484ns ± 9%   543ns ±10%  +12.11%  (p=0.000 n=17+16)
BM_ArenaFuseBalanced/64        4.50µs ± 6%  4.94µs ± 4%   +9.62%  (p=0.000 n=19+17)
BM_ArenaFuseBalanced/128       9.20µs ± 4%  9.95µs ± 4%   +8.12%  (p=0.000 n=16+19)
BM_LoadAdsDescriptor_Upb       5.50ms ± 8%  5.69ms ±17%        ~  (p=0.189 n=18+19)
BM_LoadAdsDescriptor_Upb       6.10ms ± 5%  6.05ms ± 4%        ~  (p=0.258 n=17+18)
BM_LoadAdsDescriptor_Proto2    11.9ms ±15%  11.6ms ± 5%        ~  (p=0.589 n=19+16)
BM_LoadAdsDescriptor_Proto2    11.8ms ± 5%  12.4ms ±17%        ~  (p=0.604 n=16+20)
BM_Parse_Upb_FileDesc          12.1µs ± 8%  12.1µs ± 4%        ~  (p=1.000 n=18+18)
BM_Parse_Upb_FileDesc          11.8µs ±17%  11.1µs ± 4%        ~  (p=0.104 n=20+17)
BM_Parse_Upb_FileDesc          12.0µs ± 5%  11.9µs ± 4%        ~  (p=0.134 n=18+19)
BM_Parse_Upb_FileDesc          10.9µs ± 7%  11.0µs ± 4%        ~  (p=0.195 n=17+18)
BM_Parse_Proto2                24.2µs ± 4%  24.4µs ± 7%        ~  (p=0.767 n=18+18)
BM_Parse_Proto2                11.6µs ± 5%  11.6µs ± 4%        ~  (p=0.621 n=18+16)
BM_Parse_Proto2                11.3µs ± 3%  11.3µs ± 3%        ~  (p=0.743 n=18+18)
BM_Parse_Proto2                12.7µs ± 8%  12.7µs ± 4%        ~  (p=0.988 n=18+19)
BM_SerializeDescriptor_Proto2  5.77µs ± 5%  5.71µs ± 5%        ~  (p=0.433 n=17+17)
BM_SerializeDescriptor_Upb     10.0µs ± 5%  10.1µs ± 7%        ~  (p=0.102 n=19+16)

name                           old time/op  new time/op  delta
BM_ArenaOneAlloc               18.4ns ± 6%  18.8ns ± 4%   +1.98%  (p=0.019 n=18+18)
BM_ArenaInitialBlockOneAlloc   5.51ns ± 4%  6.58ns ± 4%  +19.42%  (p=0.000 n=16+17)
BM_ArenaFuseUnbalanced/2       59.5ns ±10%  68.9ns ± 4%  +15.83%  (p=0.000 n=19+19)
BM_ArenaFuseUnbalanced/8        481ns ± 5%   541ns ± 8%  +12.54%  (p=0.000 n=18+19)
BM_ArenaFuseUnbalanced/64      4.51µs ± 4%  4.94µs ± 8%   +9.53%  (p=0.000 n=17+17)
BM_ArenaFuseUnbalanced/128     9.26µs ± 3%  9.98µs ± 3%   +7.79%  (p=0.000 n=17+17)
BM_ArenaFuseBalanced/2         63.5ns ±19%  71.1ns ± 3%  +12.07%  (p=0.000 n=19+18)
BM_ArenaFuseBalanced/8          485ns ± 9%   551ns ±20%  +13.47%  (p=0.000 n=17+17)
BM_ArenaFuseBalanced/64        4.51µs ± 6%  4.95µs ± 4%   +9.62%  (p=0.000 n=19+17)
BM_ArenaFuseBalanced/128       9.22µs ± 4%  9.97µs ± 4%   +8.12%  (p=0.000 n=16+19)
BM_LoadAdsDescriptor_Upb       5.52ms ± 8%  5.72ms ±18%        ~  (p=0.199 n=18+19)
BM_LoadAdsDescriptor_Upb       6.12ms ± 5%  6.07ms ± 4%        ~  (p=0.273 n=17+18)
BM_LoadAdsDescriptor_Proto2    11.9ms ±15%  11.6ms ± 5%        ~  (p=0.589 n=19+16)
BM_LoadAdsDescriptor_Proto2    11.9ms ± 5%  12.5ms ±18%        ~  (p=0.582 n=16+20)
BM_Parse_Upb_FileDesc          12.2µs ± 8%  12.1µs ± 3%        ~  (p=0.963 n=18+18)
BM_Parse_Upb_FileDesc          11.8µs ±17%  11.1µs ± 4%        ~  (p=0.104 n=20+17)
BM_Parse_Upb_FileDesc          12.0µs ± 5%  11.9µs ± 4%        ~  (p=0.126 n=18+19)
BM_Parse_Upb_FileDesc          11.0µs ± 6%  11.1µs ± 4%        ~  (p=0.195 n=17+18)
BM_Parse_Proto2                24.3µs ± 4%  24.5µs ± 6%        ~  (p=0.743 n=18+18)
BM_Parse_Proto2                11.7µs ± 5%  11.6µs ± 4%        ~  (p=0.574 n=18+16)
BM_Parse_Proto2                11.3µs ± 3%  11.3µs ± 3%        ~  (p=0.743 n=18+18)
BM_Parse_Proto2                12.7µs ± 8%  12.7µs ± 4%        ~  (p=0.988 n=18+19)
BM_SerializeDescriptor_Proto2  5.78µs ± 5%  5.73µs ± 5%        ~  (p=0.357 n=17+17)
BM_SerializeDescriptor_Upb     10.0µs ± 5%  10.1µs ± 7%        ~  (p=0.117 n=19+16)

name                           old allocs/op  new allocs/op  delta
BM_ArenaOneAlloc               1.00 ± 0%      1.00 ± 0%      ~  (all samples are equal)
BM_ArenaFuseUnbalanced/2       2.00 ± 0%      2.00 ± 0%      ~  (all samples are equal)
BM_ArenaFuseUnbalanced/8       8.00 ± 0%      8.00 ± 0%      ~  (all samples are equal)
BM_ArenaFuseUnbalanced/64      64.0 ± 0%      64.0 ± 0%      ~  (all samples are equal)
BM_ArenaFuseUnbalanced/128      128 ± 0%       128 ± 0%      ~  (all samples are equal)
BM_ArenaFuseBalanced/2         2.00 ± 0%      2.00 ± 0%      ~  (all samples are equal)
BM_ArenaFuseBalanced/8         8.00 ± 0%      8.00 ± 0%      ~  (all samples are equal)
BM_ArenaFuseBalanced/64        64.0 ± 0%      64.0 ± 0%      ~  (all samples are equal)
BM_ArenaFuseBalanced/128        128 ± 0%       128 ± 0%      ~  (all samples are equal)
BM_LoadAdsDescriptor_Upb      6.08k ± 0%     6.05k ± 0%  -0.54%  (p=0.000 n=20+20)
BM_LoadAdsDescriptor_Upb      6.39k ± 0%     6.36k ± 0%  -0.55%  (p=0.000 n=20+20)
BM_LoadAdsDescriptor_Proto2   83.4k ± 0%     83.4k ± 0%      ~  (p=0.800 n=20+20)
BM_LoadAdsDescriptor_Proto2   84.4k ± 0%     84.4k ± 0%      ~  (p=0.752 n=20+20)
BM_Parse_Upb_FileDesc          7.00 ± 0%      7.00 ± 0%      ~  (all samples are equal)
BM_Parse_Upb_FileDesc          7.00 ± 0%      7.00 ± 0%      ~  (all samples are equal)
BM_Parse_Proto2                 765 ± 0%       765 ± 0%      ~  (all samples are equal)
BM_Parse_Proto2                8.00 ± 0%      8.00 ± 0%      ~  (all samples are equal)

name                           old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_ArenaOneAlloc                 336 ± 0%     336 ± 0%      ~  (all samples are equal)
BM_ArenaFuseUnbalanced/2         672 ± 0%     672 ± 0%      ~  (all samples are equal)
BM_ArenaFuseUnbalanced/8       2.69k ± 0%   2.69k ± 0%      ~  (all samples are equal)
BM_ArenaFuseUnbalanced/64      21.5k ± 0%   21.5k ± 0%      ~  (all samples are equal)
BM_ArenaFuseUnbalanced/128     43.0k ± 0%   43.0k ± 0%      ~  (all samples are equal)
BM_ArenaFuseBalanced/2           672 ± 0%     672 ± 0%      ~  (all samples are equal)
BM_ArenaFuseBalanced/8         2.69k ± 0%   2.69k ± 0%      ~  (all samples are equal)
BM_ArenaFuseBalanced/64        21.5k ± 0%   21.5k ± 0%      ~  (all samples are equal)
BM_ArenaFuseBalanced/128       43.0k ± 0%   43.0k ± 0%      ~  (all samples are equal)
BM_LoadAdsDescriptor_Upb       9.89M ± 0%   9.95M ± 0%  +0.65%  (p=0.000 n=20+20)
BM_LoadAdsDescriptor_Upb       9.95M ± 0%  10.02M ± 0%  +0.70%  (p=0.000 n=20+20)
BM_LoadAdsDescriptor_Proto2    6.62M ± 0%   6.62M ± 0%      ~  (p=0.800 n=20+20)
BM_LoadAdsDescriptor_Proto2    6.66M ± 0%   6.66M ± 0%      ~  (p=0.752 n=20+20)
BM_Parse_Upb_FileDesc          36.5k ± 0%   36.5k ± 0%      ~  (all samples are equal)
BM_Parse_Upb_FileDesc          36.5k ± 0%   36.5k ± 0%      ~  (all samples are equal)
BM_Parse_Proto2                35.8k ± 0%   35.8k ± 0%      ~  (all samples are equal)
BM_Parse_Proto2                65.3k ± 0%   65.3k ± 0%      ~  (all samples are equal)

name                           old speed       new speed       delta
BM_LoadAdsDescriptor_Upb        138MB/s ± 7%   132MB/s ±15%        ~  (p=0.126 n=18+20)
BM_LoadAdsDescriptor_Upb        124MB/s ± 5%   125MB/s ± 4%        ~  (p=0.258 n=17+18)
BM_LoadAdsDescriptor_Proto2    63.9MB/s ±13%  65.2MB/s ± 5%        ~  (p=0.589 n=19+16)
BM_LoadAdsDescriptor_Proto2    64.0MB/s ± 5%  61.3MB/s ±15%        ~  (p=0.604 n=16+20)
BM_Parse_Upb_FileDesc           620MB/s ± 8%   622MB/s ± 4%        ~  (p=1.000 n=18+18)
BM_Parse_Upb_FileDesc           644MB/s ±15%   679MB/s ± 4%        ~  (p=0.104 n=20+17)
BM_Parse_Upb_FileDesc           627MB/s ± 4%   633MB/s ± 4%        ~  (p=0.134 n=18+19)
BM_Parse_Upb_FileDesc           688MB/s ± 6%   682MB/s ± 4%        ~  (p=0.195 n=17+18)
BM_Parse_Proto2                 310MB/s ± 4%   309MB/s ± 6%        ~  (p=0.767 n=18+18)
BM_Parse_Proto2                 646MB/s ± 4%   649MB/s ± 4%        ~  (p=0.621 n=18+16)
BM_Parse_Proto2                 666MB/s ± 3%   666MB/s ± 3%        ~  (p=0.743 n=18+18)
BM_Parse_Proto2                 592MB/s ± 7%   593MB/s ± 4%        ~  (p=0.988 n=18+19)
BM_SerializeDescriptor_Proto2  1.30GB/s ± 5%  1.32GB/s ± 5%        ~  (p=0.433 n=17+17)
BM_SerializeDescriptor_Upb      756MB/s ± 5%   745MB/s ± 6%        ~  (p=0.102 n=19+16)
```

PiperOrigin-RevId: 520144430
---
 upb/mem/arena.c          | 121 +++++++++++++++++++++++----------------
 upb/mem/arena_internal.h |  16 ++++--
 upb/port/atomic.h        |  25 ++++----
 upb/port/def.inc         |   4 +-
 upb/wire/decode.c        |  44 ++++++++------
 5 files changed, 123 insertions(+), 87 deletions(-)

diff --git a/upb/mem/arena.c b/upb/mem/arena.c
index 2fc814c22d..4b0bd457c4 100644
--- a/upb/mem/arena.c
+++ b/upb/mem/arena.c
@@ -45,7 +45,8 @@ static uintptr_t upb_cleanup_metadata(uint32_t* cleanup,
 }
 
 struct _upb_MemBlock {
-  struct _upb_MemBlock* next;
+  // Atomic only for the benefit of SpaceAllocated().
+  UPB_ATOMIC(_upb_MemBlock*) next;
   uint32_t size;
   uint32_t cleanups;
   // Data follows.
@@ -98,11 +99,13 @@ size_t upb_Arena_SpaceAllocated(upb_Arena* arena) {
   arena = _upb_Arena_FindRoot(arena);
   size_t memsize = 0;
 
-  _upb_MemBlock* block = arena->freelist;
-
-  while (block) {
-    memsize += sizeof(_upb_MemBlock) + block->size;
-    block = block->next;
+  while (arena != NULL) {
+    _upb_MemBlock* block = arena->blocks;
+    while (block != NULL) {
+      memsize += sizeof(_upb_MemBlock) + block->size;
+      block = upb_Atomic_Load(&block->next, memory_order_relaxed);
+    }
+    arena = upb_Atomic_Load(&arena->next, memory_order_relaxed);
   }
 
   return memsize;
@@ -119,17 +122,14 @@ uint32_t upb_Arena_DebugRefCount(upb_Arena* a) {
   return _upb_Arena_RefCountFromTagged(poc);
 }
 
-static void upb_Arena_addblock(upb_Arena* a, upb_Arena* root, void* ptr,
-                               size_t size) {
+static void upb_Arena_AddBlock(upb_Arena* a, void* ptr, size_t size) {
   _upb_MemBlock* block = ptr;
 
-  /* The block is for arena |a|, but should appear in the freelist of |root|. */
-  block->next = root->freelist;
+  // Insert into linked list.
+  upb_Atomic_Init(&block->next, a->blocks);
   block->size = (uint32_t)size;
   block->cleanups = 0;
-  root->freelist = block;
-  a->last_size = block->size;
-  if (!root->freelist_tail) root->freelist_tail = block;
+  upb_Atomic_Store(&a->blocks, block, memory_order_relaxed);
 
   a->head.ptr = UPB_PTR_AT(block, memblock_reserve, char);
   a->head.end = UPB_PTR_AT(block, size, char);
@@ -139,18 +139,20 @@ static void upb_Arena_addblock(upb_Arena* a, upb_Arena* root, void* ptr,
 
   UPB_POISON_MEMORY_REGION(a->head.ptr, a->head.end - a->head.ptr);
 }
 
-static bool upb_Arena_Allocblock(upb_Arena* a, size_t size) {
-  upb_Arena* root = _upb_Arena_FindRoot(a);
-  size_t block_size = UPB_MAX(size, a->last_size * 2) + memblock_reserve;
-  _upb_MemBlock* block = upb_malloc(root->block_alloc, block_size);
+static bool upb_Arena_AllocBlock(upb_Arena* a, size_t size) {
+  if (!a->block_alloc) return false;
+  _upb_MemBlock* last_block = upb_Atomic_Load(&a->blocks, memory_order_relaxed);
+  size_t last_size = last_block != NULL ? last_block->size : 128;
+  size_t block_size = UPB_MAX(size, last_size * 2) + memblock_reserve;
+  _upb_MemBlock* block = upb_malloc(a->block_alloc, block_size);
   if (!block) return false;
 
-  upb_Arena_addblock(a, root, block, block_size);
+  upb_Arena_AddBlock(a, block, block_size);
 
   return true;
 }
 
 void* _upb_Arena_SlowMalloc(upb_Arena* a, size_t size) {
-  if (!upb_Arena_Allocblock(a, size)) return NULL; /* Out of memory. */
+  if (!upb_Arena_AllocBlock(a, size)) return NULL; /* Out of memory. */
   UPB_ASSERT(_upb_ArenaHas(a) >= size);
   return upb_Arena_Malloc(a, size);
 }
@@ -172,11 +174,12 @@ static upb_Arena* arena_initslow(void* mem, size_t n, upb_alloc* alloc) {
 
   a->block_alloc = alloc;
   upb_Atomic_Init(&a->parent_or_count, _upb_Arena_TaggedFromRefcount(1));
-  a->freelist = NULL;
-  a->freelist_tail = NULL;
+  upb_Atomic_Init(&a->next, NULL);
+  upb_Atomic_Init(&a->tail, a);
+  upb_Atomic_Init(&a->blocks, NULL);
   a->cleanup_metadata = upb_cleanup_metadata(NULL, false);
 
-  upb_Arena_addblock(a, a, mem, n);
+  upb_Arena_AddBlock(a, mem, n);
 
   return a;
 }
@@ -202,37 +205,44 @@ upb_Arena* upb_Arena_Init(void* mem, size_t n, upb_alloc* alloc) {
 
   a = UPB_PTR_AT(mem, n - sizeof(*a), upb_Arena);
 
-  a->block_alloc = alloc;
   upb_Atomic_Init(&a->parent_or_count, _upb_Arena_TaggedFromRefcount(1));
-  a->last_size = UPB_MAX(128, n);
+  upb_Atomic_Init(&a->next, NULL);
+  upb_Atomic_Init(&a->tail, a);
+  upb_Atomic_Init(&a->blocks, NULL);
+  a->block_alloc = alloc;
   a->head.ptr = mem;
   a->head.end = UPB_PTR_AT(mem, n - sizeof(*a), char);
-  a->freelist = NULL;
-  a->freelist_tail = NULL;
   a->cleanup_metadata = upb_cleanup_metadata(NULL, true);
 
   return a;
 }
 
 static void arena_dofree(upb_Arena* a) {
-  _upb_MemBlock* block = a->freelist;
   UPB_ASSERT(_upb_Arena_RefCountFromTagged(a->parent_or_count) == 1);
 
-  while (block) {
-    /* Load first since we are deleting block. */
-    _upb_MemBlock* next = block->next;
-
-    if (block->cleanups > 0) {
-      cleanup_ent* end = UPB_PTR_AT(block, block->size, void);
-      cleanup_ent* ptr = end - block->cleanups;
-
-      for (; ptr < end; ptr++) {
-        ptr->cleanup(ptr->ud);
+  while (a != NULL) {
+    // Load first since arena itself is likely from one of its blocks.
+    upb_Arena* next_arena =
+        (upb_Arena*)upb_Atomic_Load(&a->next, memory_order_acquire);
+    _upb_MemBlock* block = upb_Atomic_Load(&a->blocks, memory_order_relaxed);
+    while (block != NULL) {
+      /* Load first since we are deleting block. */
+      _upb_MemBlock* next_block =
+          upb_Atomic_Load(&block->next, memory_order_relaxed);
+
+      if (block->cleanups > 0) {
+        cleanup_ent* end = UPB_PTR_AT(block, block->size, void);
+        cleanup_ent* ptr = end - block->cleanups;
+
+        for (; ptr < end; ptr++) {
+          ptr->cleanup(ptr->ud);
+        }
       }
-    }
 
-    upb_free(a->block_alloc, block);
-    block = next;
+      upb_free(a->block_alloc, block);
+      block = next_block;
+    }
+    a = next_arena;
   }
 }
@@ -270,7 +280,7 @@ bool upb_Arena_AddCleanup(upb_Arena* a, void* ud, upb_CleanupFunc* func) {
   uint32_t* cleanups = upb_cleanup_pointer(a->cleanup_metadata);
 
   if (!cleanups || _upb_ArenaHas(a) < sizeof(cleanup_ent)) {
-    if (!upb_Arena_Allocblock(a, 128)) return false; /* Out of memory. */
+    if (!upb_Arena_AllocBlock(a, 128)) return false; /* Out of memory. */
     UPB_ASSERT(_upb_ArenaHas(a) >= sizeof(cleanup_ent));
     cleanups = upb_cleanup_pointer(a->cleanup_metadata);
   }
@@ -339,14 +349,6 @@ bool upb_Arena_Fuse(upb_Arena* a1, upb_Arena* a2) {
     r2_poc = tmp_poc;
   }
 
-  // r1 takes over r2's freelist, this must happen before we update
-  // refcounts since the refcount carriers the memory dependencies.
-  if (r2->freelist_tail) {
-    UPB_ASSERT(r2->freelist_tail->next == NULL);
-    r2->freelist_tail->next = r1->freelist;
-    r1->freelist = r2->freelist;
-  }
-
   // The moment we install `r1` as the parent for `r2` all racing frees may
   // immediately begin decrementing `r1`'s refcount. So we must install all the
   // refcounts that we know about first to prevent a premature unref to zero.
@@ -366,5 +368,26 @@ bool upb_Arena_Fuse(upb_Arena* a1, upb_Arena* a2) {
     upb_Atomic_Sub(&r1->parent_or_count, ((uintptr_t)delta_refcount) << 1,
                    memory_order_release);
   }
+
+  // Now append r2's linked list of arenas to r1's.
+  upb_Arena* r2_tail = upb_Atomic_Load(&r2->tail, memory_order_relaxed);
+  upb_Arena* r1_tail = upb_Atomic_Load(&r1->tail, memory_order_relaxed);
+  upb_Arena* r1_next = upb_Atomic_Load(&r1_tail->next, memory_order_relaxed);
+  while (r1_next != NULL) {
+    // r1->tail was stale. This can happen, but tail should always converge on
+    // the true tail.
+    r1_tail = r1_next;
+    r1_next = upb_Atomic_Load(&r1_tail->next, memory_order_relaxed);
+  }
+
+  upb_Arena* old_next =
+      upb_Atomic_Exchange(&r1_tail->next, r2, memory_order_relaxed);
+
+  // Once fuse/fuse races are allowed, it will need to be a CAS instead that
+  // handles this mismatch gracefully.
+  UPB_ASSERT(old_next == NULL);
+
+  upb_Atomic_Store(&r1->tail, r2_tail, memory_order_relaxed);
+
   return true;
 }

diff --git a/upb/mem/arena_internal.h b/upb/mem/arena_internal.h
index f0ea8f3489..9b940dcfc8 100644
--- a/upb/mem/arena_internal.h
+++ b/upb/mem/arena_internal.h
@@ -45,7 +45,6 @@ struct upb_Arena {
   /* Allocator to allocate arena blocks. We are responsible for freeing these
    * when we are destroyed. */
   upb_alloc* block_alloc;
-  uint32_t last_size;
 
   /* When multiple arenas are fused together, each arena points to a parent
    * arena (root points to itself). The root tracks how many live arenas
@@ -55,10 +54,19 @@ struct upb_Arena {
    *   0: pointer to parent
    *   1: count, left shifted by one */
-  UPB_ATOMIC uintptr_t parent_or_count;
+  UPB_ATOMIC(uintptr_t) parent_or_count;
 
-  /* Linked list of blocks to free/cleanup. */
-  _upb_MemBlock *freelist, *freelist_tail;
+  // All nodes that are fused together are in a singly-linked list.
+  UPB_ATOMIC(upb_Arena*) next;  // NULL at end of list.
+
+  // The last element of the linked list. This is present only as an
+  // optimization, so that we do not have to iterate over all members for every
+  // fuse. Only significant for an arena root. In other cases it is ignored.
+  UPB_ATOMIC(upb_Arena*) tail;  // == self when no other list members.
+
+  // Linked list of blocks to free/cleanup. Atomic only for the benefit of
+  // upb_Arena_SpaceAllocated().
+  UPB_ATOMIC(_upb_MemBlock*) blocks;
 };
 
 UPB_INLINE bool _upb_Arena_IsTaggedRefcount(uintptr_t parent_or_count) {

diff --git a/upb/port/atomic.h b/upb/port/atomic.h
index 670bf776d9..450297ae50 100644
--- a/upb/port/atomic.h
+++ b/upb/port/atomic.h
@@ -66,9 +66,7 @@ UPB_INLINE uintptr_t _upb_NonAtomic_ExchangeU(uintptr_t* addr, uintptr_t val) {
   return ret;
 }
 
-// `addr` should logically be `void**`, but `void*` allows for more convenient
-// implicit conversions.
-UPB_INLINE void* _upb_NonAtomic_ExchangeP(void* addr, void* val) {
+UPB_INLINE void* _upb_NonAtomic_ExchangeP(upb_Arena** addr, upb_Arena* val) {
   void* ret;
   memcpy(&ret, addr, sizeof(val));
   memcpy(addr, &val, sizeof(val));
@@ -78,7 +76,7 @@ UPB_INLINE void* _upb_NonAtomic_ExchangeP(void* addr, void* val) {
 #define upb_Atomic_Exchange(addr, val, order) \
   _Generic((val),                             \
       uintptr_t: _upb_NonAtomic_ExchangeU,    \
-      void*: _upb_NonAtomic_ExchangeP)(addr, val)
+      upb_Arena *: _upb_NonAtomic_ExchangeP)(addr, val)
 
 UPB_INLINE bool _upb_NonAtomic_CompareExchangeStrongU(uintptr_t* addr,
                                                       uintptr_t* expected,
@@ -92,11 +90,9 @@ UPB_INLINE bool _upb_NonAtomic_CompareExchangeStrongU(uintptr_t* addr,
   }
 }
 
-// `addr` and `expected` should logically be `void**`, but `void*` allows for
-// more convenient implicit conversions.
-UPB_INLINE bool _upb_NonAtomic_CompareExchangeStrongP(void* addr,
-                                                      void* expected,
-                                                      void* desired) {
+UPB_INLINE bool _upb_NonAtomic_CompareExchangeStrongP(upb_Arena** addr,
+                                                      upb_Arena** expected,
+                                                      upb_Arena* desired) {
   if (memcmp(addr, expected, sizeof(desired)) == 0) {
     memcpy(addr, &desired, sizeof(desired));
     return true;
@@ -106,11 +102,12 @@ UPB_INLINE bool _upb_NonAtomic_CompareExchangeStrongP(void* addr,
   }
 }
 
-#define upb_Atomic_CompareExchangeStrong(addr, expected, desired, \
-                                         success_order, failure_order) \
-  _Generic((desired), \
-      uintptr_t: _upb_NonAtomic_CompareExchangeStrongU, \
-      void*: _upb_NonAtomic_CompareExchangeStrongP)(addr, expected, desired)
+#define upb_Atomic_CompareExchangeStrong(addr, expected, desired,         \
+                                         success_order, failure_order)    \
+  _Generic((desired),                                                     \
+      uintptr_t: _upb_NonAtomic_CompareExchangeStrongU,                   \
+      upb_Arena *: _upb_NonAtomic_CompareExchangeStrongP)(addr, expected, \
+                                                          desired)
 
 #endif

diff --git a/upb/port/def.inc b/upb/port/def.inc
index 3baa94b199..7c36a03fd4 100644
--- a/upb/port/def.inc
+++ b/upb/port/def.inc
@@ -192,9 +192,9 @@
 
 #ifdef __GNUC__
 #define UPB_USE_C11_ATOMICS
-#define UPB_ATOMIC _Atomic
+#define UPB_ATOMIC(T) _Atomic(T)
 #else
-#define UPB_ATOMIC
+#define UPB_ATOMIC(T) T
 #endif
 
 /* UPB_PTRADD(ptr, ofs): add pointer while avoiding "NULL + 0" UB */

diff --git a/upb/wire/decode.c b/upb/wire/decode.c
index 0ee00ec5a0..e1697fa428 100644
--- a/upb/wire/decode.c
+++ b/upb/wire/decode.c
@@ -1282,9 +1282,11 @@ static upb_DecodeStatus upb_Decoder_Decode(upb_Decoder* const decoder,
     UPB_ASSERT(decoder->status != kUpb_DecodeStatus_Ok);
   }
 
-  arena->head.ptr = decoder->arena.head.ptr;
-  arena->head.end = decoder->arena.head.end;
+  _upb_MemBlock* blocks =
+      upb_Atomic_Load(&decoder->arena.blocks, memory_order_relaxed);
+  arena->head = decoder->arena.head;
   arena->cleanup_metadata = decoder->arena.cleanup_metadata;
+  upb_Atomic_Store(&arena->blocks, blocks, memory_order_relaxed);
 
   return decoder->status;
 }
 
@@ -1292,26 +1294,32 @@ upb_DecodeStatus upb_Decode(const char* buf, size_t size, void* msg,
                             const upb_MiniTable* l,
                             const upb_ExtensionRegistry* extreg, int options,
                             upb_Arena* arena) {
-  upb_Decoder state;
+  upb_Decoder decoder;
   unsigned depth = (unsigned)options >> 16;
 
-  upb_EpsCopyInputStream_Init(&state.input, &buf, size,
+  upb_EpsCopyInputStream_Init(&decoder.input, &buf, size,
                               options & kUpb_DecodeOption_AliasString);
 
-  state.extreg = extreg;
-  state.unknown = NULL;
-  state.depth = depth ? depth : 64;
-  state.end_group = DECODE_NOGROUP;
-  state.options = (uint16_t)options;
-  state.missing_required = false;
-  state.arena.head = arena->head;
-  state.arena.last_size = arena->last_size;
-  state.arena.cleanup_metadata = arena->cleanup_metadata;
-  upb_Atomic_Init(&state.arena.parent_or_count,
-                  _upb_Arena_TaggedFromPointer(arena));
-  state.status = kUpb_DecodeStatus_Ok;
-
-  return upb_Decoder_Decode(&state, buf, msg, l, arena);
+  decoder.extreg = extreg;
+  decoder.unknown = NULL;
+  decoder.depth = depth ? depth : 64;
+  decoder.end_group = DECODE_NOGROUP;
+  decoder.options = (uint16_t)options;
+  decoder.missing_required = false;
+  decoder.status = kUpb_DecodeStatus_Ok;
+
+  // Violating the encapsulation of the arena for performance reasons.
+  // This is a temporary arena that we swap into and swap out of when we are
+  // done. The temporary arena only needs to be able to handle allocation,
+  // not fuse or free, so it does not need many of the members to be initialized
+  // (particularly parent_or_count).
+  _upb_MemBlock* blocks = upb_Atomic_Load(&arena->blocks, memory_order_relaxed);
+  decoder.arena.head = arena->head;
+  decoder.arena.block_alloc = arena->block_alloc;
+  decoder.arena.cleanup_metadata = arena->cleanup_metadata;
+  upb_Atomic_Init(&decoder.arena.blocks, blocks);
+
+  return upb_Decoder_Decode(&decoder, buf, msg, l, arena);
 }
 
 #undef OP_FIXPCK_LG2
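The tail-append step in `upb_Arena_Fuse` (chase a possibly stale tail hint, then link the second list on) can be sketched in isolation with C11 atomics. This is a simplified model on an illustrative node type, not the upb internals, and like the patch it assumes no concurrent fuses race on the same group.

```c
// Sketch of the fuse tail-append: `tail` is only a hint, so we walk `next`
// until we find the true tail before linking the second list on.
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct Node {
  _Atomic(struct Node*) next;  // NULL at end of list
  _Atomic(struct Node*) tail;  // hint; may lag behind the true tail
} Node;

static void list_append(Node* r1, Node* r2) {
  Node* r2_tail = atomic_load_explicit(&r2->tail, memory_order_relaxed);
  Node* r1_tail = atomic_load_explicit(&r1->tail, memory_order_relaxed);
  Node* r1_next = atomic_load_explicit(&r1_tail->next, memory_order_relaxed);
  while (r1_next != NULL) {
    // The tail hint was stale; walk forward until it converges.
    r1_tail = r1_next;
    r1_next = atomic_load_explicit(&r1_tail->next, memory_order_relaxed);
  }
  Node* old_next =
      atomic_exchange_explicit(&r1_tail->next, r2, memory_order_relaxed);
  assert(old_next == NULL);  // no racing append in this sketch
  atomic_store_explicit(&r1->tail, r2_tail, memory_order_relaxed);
}
```

As in the patch, a CAS would be needed in place of the exchange once racing fuses on the same tail are allowed.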