Protocol Buffers - Google's data interchange format (grpc依赖) https://developers.google.com/protocol-buffers/
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

371 lines
13 KiB

// Protocol Buffers - Google's data interchange format
// Copyright 2023 Google LLC. All rights reserved.
//
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file or at
// https://developers.google.com/open-source/licenses/bsd
#include "upb/mem/internal/arena.h"
#include "upb/port/atomic.h"
// Must be last.
#include "upb/port/def.inc"
struct _upb_MemBlock {
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
// Atomic only for the benefit of SpaceAllocated().
UPB_ATOMIC(_upb_MemBlock*) next;
uint32_t size;
// Data follows.
};
static const size_t memblock_reserve =
UPB_ALIGN_UP(sizeof(_upb_MemBlock), UPB_MALLOC_ALIGN);
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
typedef struct _upb_ArenaRoot {
upb_Arena* root;
uintptr_t tagged_count;
} _upb_ArenaRoot;
static _upb_ArenaRoot _upb_Arena_FindRoot(upb_Arena* a) {
uintptr_t poc = upb_Atomic_Load(&a->parent_or_count, memory_order_acquire);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
while (_upb_Arena_IsTaggedPointer(poc)) {
upb_Arena* next = _upb_Arena_PointerFromTagged(poc);
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
UPB_ASSERT(a != next);
uintptr_t next_poc =
upb_Atomic_Load(&next->parent_or_count, memory_order_acquire);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
if (_upb_Arena_IsTaggedPointer(next_poc)) {
// To keep complexity down, we lazily collapse levels of the tree. This
// keeps it flat in the final case, but doesn't cost much incrementally.
//
// Path splitting keeps time complexity down, see:
// https://en.wikipedia.org/wiki/Disjoint-set_data_structure
//
// We can safely use a relaxed atomic here because all threads doing this
// will converge on the same value and we don't need memory orderings to
// be visible.
//
// This is true because:
// - If no fuses occur, this will eventually become the root.
// - If fuses are actively occurring, the root may change, but the
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
// invariant is that `parent_or_count` merely points to *a* parent.
//
// In other words, it is moving towards "the" root, and that root may move
// further away over time, but the path towards that root will continue to
// be valid and the creation of the path carries all the memory orderings
// required.
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
UPB_ASSERT(a != _upb_Arena_PointerFromTagged(next_poc));
upb_Atomic_Store(&a->parent_or_count, next_poc, memory_order_relaxed);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
}
a = next;
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
poc = next_poc;
}
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
return (_upb_ArenaRoot){.root = a, .tagged_count = poc};
}
size_t upb_Arena_SpaceAllocated(upb_Arena* arena) {
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
arena = _upb_Arena_FindRoot(arena).root;
size_t memsize = 0;
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
while (arena != NULL) {
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
_upb_MemBlock* block =
upb_Atomic_Load(&arena->blocks, memory_order_relaxed);
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
while (block != NULL) {
memsize += sizeof(_upb_MemBlock) + block->size;
block = upb_Atomic_Load(&block->next, memory_order_relaxed);
}
arena = upb_Atomic_Load(&arena->next, memory_order_relaxed);
}
return memsize;
}
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
uint32_t upb_Arena_DebugRefCount(upb_Arena* a) {
// These loads could probably be relaxed, but given that this is debug-only,
// it's not worth introducing a new variant for it.
uintptr_t poc = upb_Atomic_Load(&a->parent_or_count, memory_order_acquire);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
while (_upb_Arena_IsTaggedPointer(poc)) {
a = _upb_Arena_PointerFromTagged(poc);
poc = upb_Atomic_Load(&a->parent_or_count, memory_order_acquire);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
}
return _upb_Arena_RefCountFromTagged(poc);
}
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
static void upb_Arena_AddBlock(upb_Arena* a, void* ptr, size_t size) {
_upb_MemBlock* block = ptr;
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
// Insert into linked list.
block->size = (uint32_t)size;
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
upb_Atomic_Init(&block->next, a->blocks);
upb_Atomic_Store(&a->blocks, block, memory_order_release);
a->head.ptr = UPB_PTR_AT(block, memblock_reserve, char);
a->head.end = UPB_PTR_AT(block, size, char);
UPB_POISON_MEMORY_REGION(a->head.ptr, a->head.end - a->head.ptr);
}
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
static bool upb_Arena_AllocBlock(upb_Arena* a, size_t size) {
if (!a->block_alloc) return false;
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
_upb_MemBlock* last_block = upb_Atomic_Load(&a->blocks, memory_order_acquire);
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
size_t last_size = last_block != NULL ? last_block->size : 128;
size_t block_size = UPB_MAX(size, last_size * 2) + memblock_reserve;
_upb_MemBlock* block = upb_malloc(upb_Arena_BlockAlloc(a), block_size);
if (!block) return false;
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
upb_Arena_AddBlock(a, block, block_size);
return true;
}
void* _upb_Arena_SlowMalloc(upb_Arena* a, size_t size) {
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
if (!upb_Arena_AllocBlock(a, size)) return NULL; /* Out of memory. */
UPB_ASSERT(_upb_ArenaHas(a) >= size);
return upb_Arena_Malloc(a, size);
}
/* Public Arena API ***********************************************************/
static upb_Arena* upb_Arena_InitSlow(upb_alloc* alloc) {
const size_t first_block_overhead = sizeof(upb_Arena) + memblock_reserve;
upb_Arena* a;
/* We need to malloc the initial block. */
char* mem;
size_t n = first_block_overhead + 256;
if (!alloc || !(mem = upb_malloc(alloc, n))) {
return NULL;
}
a = UPB_PTR_AT(mem, n - sizeof(*a), upb_Arena);
n -= sizeof(*a);
a->block_alloc = upb_Arena_MakeBlockAlloc(alloc, 0);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
upb_Atomic_Init(&a->parent_or_count, _upb_Arena_TaggedFromRefcount(1));
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
upb_Atomic_Init(&a->next, NULL);
upb_Atomic_Init(&a->tail, a);
upb_Atomic_Init(&a->blocks, NULL);
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
upb_Arena_AddBlock(a, mem, n);
return a;
}
upb_Arena* upb_Arena_Init(void* mem, size_t n, upb_alloc* alloc) {
upb_Arena* a;
if (n) {
/* Align initial pointer up so that we return properly-aligned pointers. */
void* aligned = (void*)UPB_ALIGN_UP((uintptr_t)mem, UPB_MALLOC_ALIGN);
size_t delta = (uintptr_t)aligned - (uintptr_t)mem;
n = delta <= n ? n - delta : 0;
mem = aligned;
}
/* Round block size down to alignof(*a) since we will allocate the arena
* itself at the end. */
n = UPB_ALIGN_DOWN(n, UPB_ALIGN_OF(upb_Arena));
if (UPB_UNLIKELY(n < sizeof(upb_Arena))) {
return upb_Arena_InitSlow(alloc);
}
a = UPB_PTR_AT(mem, n - sizeof(*a), upb_Arena);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
upb_Atomic_Init(&a->parent_or_count, _upb_Arena_TaggedFromRefcount(1));
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
upb_Atomic_Init(&a->next, NULL);
upb_Atomic_Init(&a->tail, a);
upb_Atomic_Init(&a->blocks, NULL);
a->block_alloc = upb_Arena_MakeBlockAlloc(alloc, 1);
a->head.ptr = mem;
a->head.end = UPB_PTR_AT(mem, n - sizeof(*a), char);
return a;
}
static void arena_dofree(upb_Arena* a) {
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
UPB_ASSERT(_upb_Arena_RefCountFromTagged(a->parent_or_count) == 1);
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
while (a != NULL) {
// Load first since arena itself is likely from one of its blocks.
upb_Arena* next_arena =
(upb_Arena*)upb_Atomic_Load(&a->next, memory_order_acquire);
upb_alloc* block_alloc = upb_Arena_BlockAlloc(a);
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
_upb_MemBlock* block = upb_Atomic_Load(&a->blocks, memory_order_acquire);
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
while (block != NULL) {
// Load first since we are deleting block.
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
_upb_MemBlock* next_block =
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
upb_Atomic_Load(&block->next, memory_order_acquire);
upb_free(block_alloc, block);
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
block = next_block;
}
a = next_arena;
}
}
void upb_Arena_Free(upb_Arena* a) {
uintptr_t poc = upb_Atomic_Load(&a->parent_or_count, memory_order_acquire);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
retry:
while (_upb_Arena_IsTaggedPointer(poc)) {
a = _upb_Arena_PointerFromTagged(poc);
poc = upb_Atomic_Load(&a->parent_or_count, memory_order_acquire);
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
}
// compare_exchange or fetch_sub are RMW operations, which are more
// expensive then direct loads. As an optimization, we only do RMW ops
// when we need to update things for other threads to see.
if (poc == _upb_Arena_TaggedFromRefcount(1)) {
arena_dofree(a);
return;
}
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
if (upb_Atomic_CompareExchangeWeak(
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
&a->parent_or_count, &poc,
_upb_Arena_TaggedFromRefcount(_upb_Arena_RefCountFromTagged(poc) - 1),
memory_order_release, memory_order_acquire)) {
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
// We were >1 and we decremented it successfully, so we are done.
return;
}
// We failed our update, so someone has done something, retry the whole
// process, but the failed exchange reloaded `poc` for us.
goto retry;
}
Switch upb_Arena_Fuse from a CAS based list insertion to an exchange based one Second try with improved testing (Generated by http://go/benchy. Settings: --runs 20 --reference "srcfs" --perflab) ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.2ns ± 2% 18.1ns ± 1% -0.72% (p=0.002 n=18+17) BM_ArenaInitialBlockOneAlloc 5.31ns ± 0% 5.30ns ± 1% ~ (p=0.345 n=16+19) BM_ArenaFuseUnbalanced/2 67.8ns ± 1% 68.0ns ± 0% +0.35% (p=0.011 n=16+17) BM_ArenaFuseUnbalanced/8 526ns ± 2% 524ns ± 1% ~ (p=0.708 n=18+17) BM_ArenaFuseUnbalanced/64 4.82µs ± 1% 4.84µs ± 1% +0.31% (p=0.049 n=16+17) BM_ArenaFuseUnbalanced/128 9.78µs ± 1% 9.82µs ± 1% +0.46% (p=0.001 n=17+17) BM_ArenaFuseBalanced/2 66.9ns ± 1% 67.2ns ± 1% +0.36% (p=0.025 n=17+16) BM_ArenaFuseBalanced/8 527ns ± 2% 529ns ± 1% ~ (p=0.081 n=17+19) BM_ArenaFuseBalanced/64 4.92µs ± 4% 4.88µs ± 2% ~ (p=0.184 n=18+17) BM_ArenaFuseBalanced/128 9.92µs ± 1% 9.91µs ± 1% ~ (p=0.883 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.89ms ± 2% 5.94ms ± 1% +0.88% (p=0.005 n=18+17) BM_LoadAdsDescriptor_Upb<WithLayout> 6.55ms ± 2% 6.55ms ± 1% ~ (p=0.961 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.3ms ± 2% 12.4ms ± 1% ~ (p=0.226 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.5ms ± 1% 12.6ms ± 1% +0.61% (p=0.005 n=17+19) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.6µs ± 1% 12.7µs ± 2% ~ (p=0.219 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 2% 11.6µs ± 3% ~ (p=0.721 n=16+18) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.4µs ± 1% 12.5µs ± 1% ~ (p=0.118 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.3µs ± 2% 11.4µs ± 1% ~ (p=0.327 n=18+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.2µs ± 2% 25.3µs ± 1% ~ (p=0.301 n=16+19) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 3% 12.1µs ± 2% ~ (p=0.869 n=18+19) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.8µs ± 3% 11.8µs ± 3% ~ (p=0.462 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.333 n=16+19) BM_SerializeDescriptor_Proto2 5.83µs ± 3% 5.86µs ± 4% ~ (p=0.496 n=18+20) BM_SerializeDescriptor_Upb 10.5µs ± 2% 10.4µs ± 1% -1.20% (p=0.000 n=18+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.2ns ± 2% 18.1ns ± 0% -0.73% (p=0.010 n=18+17) BM_ArenaInitialBlockOneAlloc 5.32ns ± 0% 5.31ns ± 1% ~ (p=0.106 n=15+18) BM_ArenaFuseUnbalanced/2 67.9ns ± 1% 68.1ns ± 0% +0.31% (p=0.044 n=16+16) BM_ArenaFuseUnbalanced/8 527ns ± 2% 526ns ± 1% ~ (p=0.772 n=18+16) BM_ArenaFuseUnbalanced/64 4.83µs ± 1% 4.84µs ± 2% ~ (p=0.144 n=16+18) BM_ArenaFuseUnbalanced/128 9.79µs ± 1% 9.84µs ± 1% +0.52% (p=0.001 n=17+18) BM_ArenaFuseBalanced/2 67.0ns ± 1% 67.3ns ± 3% +0.41% (p=0.019 n=15+16) BM_ArenaFuseBalanced/8 528ns ± 2% 530ns ± 1% ~ (p=0.121 n=17+19) BM_ArenaFuseBalanced/64 4.93µs ± 4% 4.89µs ± 2% ~ (p=0.103 n=18+17) BM_ArenaFuseBalanced/128 9.93µs ± 1% 9.93µs ± 1% ~ (p=0.806 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.91ms ± 2% 5.96ms ± 1% +0.93% (p=0.002 n=18+16) BM_LoadAdsDescriptor_Upb<WithLayout> 6.57ms ± 2% 6.57ms ± 1% ~ (p=0.935 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.4ms ± 2% 12.4ms ± 1% ~ (p=0.239 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.5ms ± 2% 12.6ms ± 1% +0.43% (p=0.024 n=18+19) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 2% 12.7µs ± 2% ~ (p=0.245 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 2% 11.6µs ± 2% ~ (p=0.772 n=16+18) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.136 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 2% 11.4µs ± 1% ~ (p=0.391 n=18+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.3µs ± 2% 25.4µs ± 1% ~ (p=0.403 n=16+19) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 2% ~ (p=0.731 n=17+19) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.8µs ± 3% ~ (p=0.424 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 2% 13.3µs ± 1% ~ (p=0.683 n=16+19) BM_SerializeDescriptor_Proto2 5.84µs ± 3% 5.86µs ± 4% ~ (p=0.496 n=18+20) BM_SerializeDescriptor_Upb 10.5µs ± 2% 10.4µs ± 1% -1.27% (p=0.000 n=18+16) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 133MB/s ± 2% 132MB/s ± 1% -0.97% (p=0.002 n=18+16) BM_LoadAdsDescriptor_Upb<WithLayout> 120MB/s ± 2% 120MB/s ± 1% ~ (p=0.961 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.5MB/s ± 2% 63.3MB/s ± 1% ~ (p=0.226 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 62.7MB/s ± 1% 62.4MB/s ± 1% -0.60% (p=0.005 n=17+19) BM_Parse_Upb_FileDesc<UseArena, Copy> 596MB/s ± 1% 594MB/s ± 2% ~ (p=0.219 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 650MB/s ± 2% 649MB/s ± 3% ~ (p=0.721 n=16+18) BM_Parse_Upb_FileDesc<InitBlock, Copy> 605MB/s ± 1% 603MB/s ± 1% ~ (p=0.118 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Alias> 663MB/s ± 2% 661MB/s ± 1% ~ (p=0.327 n=18+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 298MB/s ± 2% 297MB/s ± 1% ~ (p=0.490 n=17+19) BM_Parse_Proto2<FileDesc, UseArena, Copy> 623MB/s ± 3% 624MB/s ± 2% ~ (p=0.869 n=18+19) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 636MB/s ± 3% 637MB/s ± 3% ~ (p=0.462 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 570MB/s ± 1% 568MB/s ± 1% ~ (p=0.333 n=16+19) BM_SerializeDescriptor_Proto2 1.29GB/s ± 3% 1.29GB/s ± 4% ~ (p=0.496 n=18+20) BM_SerializeDescriptor_Upb 716MB/s ± 2% 725MB/s ± 1% +1.20% (p=0.000 n=18+16) ``` PiperOrigin-RevId: 525132431
2 years ago
static void _upb_Arena_DoFuseArenaLists(upb_Arena* const parent,
upb_Arena* child) {
upb_Arena* parent_tail = upb_Atomic_Load(&parent->tail, memory_order_relaxed);
do {
// Our tail might be stale, but it will always converge to the true tail.
upb_Arena* parent_tail_next =
upb_Atomic_Load(&parent_tail->next, memory_order_relaxed);
while (parent_tail_next != NULL) {
parent_tail = parent_tail_next;
parent_tail_next =
upb_Atomic_Load(&parent_tail->next, memory_order_relaxed);
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
}
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
Switch upb_Arena_Fuse from a CAS based list insertion to an exchange based one Second try with improved testing (Generated by http://go/benchy. Settings: --runs 20 --reference "srcfs" --perflab) ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.2ns ± 2% 18.1ns ± 1% -0.72% (p=0.002 n=18+17) BM_ArenaInitialBlockOneAlloc 5.31ns ± 0% 5.30ns ± 1% ~ (p=0.345 n=16+19) BM_ArenaFuseUnbalanced/2 67.8ns ± 1% 68.0ns ± 0% +0.35% (p=0.011 n=16+17) BM_ArenaFuseUnbalanced/8 526ns ± 2% 524ns ± 1% ~ (p=0.708 n=18+17) BM_ArenaFuseUnbalanced/64 4.82µs ± 1% 4.84µs ± 1% +0.31% (p=0.049 n=16+17) BM_ArenaFuseUnbalanced/128 9.78µs ± 1% 9.82µs ± 1% +0.46% (p=0.001 n=17+17) BM_ArenaFuseBalanced/2 66.9ns ± 1% 67.2ns ± 1% +0.36% (p=0.025 n=17+16) BM_ArenaFuseBalanced/8 527ns ± 2% 529ns ± 1% ~ (p=0.081 n=17+19) BM_ArenaFuseBalanced/64 4.92µs ± 4% 4.88µs ± 2% ~ (p=0.184 n=18+17) BM_ArenaFuseBalanced/128 9.92µs ± 1% 9.91µs ± 1% ~ (p=0.883 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.89ms ± 2% 5.94ms ± 1% +0.88% (p=0.005 n=18+17) BM_LoadAdsDescriptor_Upb<WithLayout> 6.55ms ± 2% 6.55ms ± 1% ~ (p=0.961 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.3ms ± 2% 12.4ms ± 1% ~ (p=0.226 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.5ms ± 1% 12.6ms ± 1% +0.61% (p=0.005 n=17+19) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.6µs ± 1% 12.7µs ± 2% ~ (p=0.219 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 2% 11.6µs ± 3% ~ (p=0.721 n=16+18) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.4µs ± 1% 12.5µs ± 1% ~ (p=0.118 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.3µs ± 2% 11.4µs ± 1% ~ (p=0.327 n=18+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.2µs ± 2% 25.3µs ± 1% ~ (p=0.301 n=16+19) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 3% 12.1µs ± 2% ~ (p=0.869 n=18+19) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.8µs ± 3% 11.8µs ± 3% ~ (p=0.462 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.333 n=16+19) BM_SerializeDescriptor_Proto2 5.83µs ± 3% 5.86µs ± 4% ~ (p=0.496 n=18+20) BM_SerializeDescriptor_Upb 10.5µs ± 2% 10.4µs ± 1% -1.20% (p=0.000 n=18+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.2ns ± 2% 18.1ns ± 0% -0.73% (p=0.010 n=18+17) BM_ArenaInitialBlockOneAlloc 5.32ns ± 0% 5.31ns ± 1% ~ (p=0.106 n=15+18) BM_ArenaFuseUnbalanced/2 67.9ns ± 1% 68.1ns ± 0% +0.31% (p=0.044 n=16+16) BM_ArenaFuseUnbalanced/8 527ns ± 2% 526ns ± 1% ~ (p=0.772 n=18+16) BM_ArenaFuseUnbalanced/64 4.83µs ± 1% 4.84µs ± 2% ~ (p=0.144 n=16+18) BM_ArenaFuseUnbalanced/128 9.79µs ± 1% 9.84µs ± 1% +0.52% (p=0.001 n=17+18) BM_ArenaFuseBalanced/2 67.0ns ± 1% 67.3ns ± 3% +0.41% (p=0.019 n=15+16) BM_ArenaFuseBalanced/8 528ns ± 2% 530ns ± 1% ~ (p=0.121 n=17+19) BM_ArenaFuseBalanced/64 4.93µs ± 4% 4.89µs ± 2% ~ (p=0.103 n=18+17) BM_ArenaFuseBalanced/128 9.93µs ± 1% 9.93µs ± 1% ~ (p=0.806 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.91ms ± 2% 5.96ms ± 1% +0.93% (p=0.002 n=18+16) BM_LoadAdsDescriptor_Upb<WithLayout> 6.57ms ± 2% 6.57ms ± 1% ~ (p=0.935 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.4ms ± 2% 12.4ms ± 1% ~ (p=0.239 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.5ms ± 2% 12.6ms ± 1% +0.43% (p=0.024 n=18+19) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 2% 12.7µs ± 2% ~ (p=0.245 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 2% 11.6µs ± 2% ~ (p=0.772 n=16+18) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.136 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 2% 11.4µs ± 1% ~ (p=0.391 n=18+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.3µs ± 2% 25.4µs ± 1% ~ (p=0.403 n=16+19) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 2% ~ (p=0.731 n=17+19) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.8µs ± 3% ~ (p=0.424 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 2% 13.3µs ± 1% ~ (p=0.683 n=16+19) BM_SerializeDescriptor_Proto2 5.84µs ± 3% 5.86µs ± 4% ~ (p=0.496 n=18+20) BM_SerializeDescriptor_Upb 10.5µs ± 2% 10.4µs ± 1% -1.27% (p=0.000 n=18+16) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 133MB/s ± 2% 132MB/s ± 1% -0.97% (p=0.002 n=18+16) BM_LoadAdsDescriptor_Upb<WithLayout> 120MB/s ± 2% 120MB/s ± 1% ~ (p=0.961 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.5MB/s ± 2% 63.3MB/s ± 1% ~ (p=0.226 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 62.7MB/s ± 1% 62.4MB/s ± 1% -0.60% (p=0.005 n=17+19) BM_Parse_Upb_FileDesc<UseArena, Copy> 596MB/s ± 1% 594MB/s ± 2% ~ (p=0.219 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 650MB/s ± 2% 649MB/s ± 3% ~ (p=0.721 n=16+18) BM_Parse_Upb_FileDesc<InitBlock, Copy> 605MB/s ± 1% 603MB/s ± 1% ~ (p=0.118 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Alias> 663MB/s ± 2% 661MB/s ± 1% ~ (p=0.327 n=18+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 298MB/s ± 2% 297MB/s ± 1% ~ (p=0.490 n=17+19) BM_Parse_Proto2<FileDesc, UseArena, Copy> 623MB/s ± 3% 624MB/s ± 2% ~ (p=0.869 n=18+19) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 636MB/s ± 3% 637MB/s ± 3% ~ (p=0.462 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 570MB/s ± 1% 568MB/s ± 1% ~ (p=0.333 n=16+19) BM_SerializeDescriptor_Proto2 1.29GB/s ± 3% 1.29GB/s ± 4% ~ (p=0.496 n=18+20) BM_SerializeDescriptor_Upb 716MB/s ± 2% 725MB/s ± 1% +1.20% (p=0.000 n=18+16) ``` PiperOrigin-RevId: 525132431
2 years ago
upb_Arena* displaced =
upb_Atomic_Exchange(&parent_tail->next, child, memory_order_relaxed);
parent_tail = upb_Atomic_Load(&child->tail, memory_order_relaxed);
// If we displaced something that got installed racily, we can simply
// reinstall it on our new tail.
child = displaced;
} while (child != NULL);
upb_Atomic_Store(&parent->tail, parent_tail, memory_order_relaxed);
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
}
static upb_Arena* _upb_Arena_DoFuse(upb_Arena* a1, upb_Arena* a2,
uintptr_t* ref_delta) {
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
// `parent_or_count` has two disctint modes
// - parent pointer mode
// - refcount mode
//
// In parent pointer mode, it may change what pointer it refers to in the
// tree, but it will always approach a root. Any operation that walks the
// tree to the root may collapse levels of the tree concurrently.
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
_upb_ArenaRoot r1 = _upb_Arena_FindRoot(a1);
_upb_ArenaRoot r2 = _upb_Arena_FindRoot(a2);
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
if (r1.root == r2.root) return r1.root; // Already fused.
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
// Avoid cycles by always fusing into the root with the lower address.
if ((uintptr_t)r1.root > (uintptr_t)r2.root) {
_upb_ArenaRoot tmp = r1;
r1 = r2;
r2 = tmp;
}
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
// The moment we install `r1` as the parent for `r2` all racing frees may
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
// immediately begin decrementing `r1`'s refcount (including pending
// increments to that refcount and their frees!). We need to add `r2`'s refs
// now, so that `r1` can withstand any unrefs that come from r2.
//
// Note that while it is possible for `r2`'s refcount to increase
// asynchronously, we will not actually do the reparenting operation below
// unless `r2`'s refcount is unchanged from when we read it.
//
// Note that we may have done this previously, either to this node or a
// different node, during a previous and failed DoFuse() attempt. But we will
// not lose track of these refs because we always add them to our overall
// delta.
uintptr_t r2_untagged_count = r2.tagged_count & ~1;
uintptr_t with_r2_refs = r1.tagged_count + r2_untagged_count;
if (!upb_Atomic_CompareExchangeStrong(
&r1.root->parent_or_count, &r1.tagged_count, with_r2_refs,
memory_order_release, memory_order_acquire)) {
return NULL;
Allow for fuse/free races in `upb_Arena`. Implementation is by kfm@, I only added the portability code around it. `upb_Arena` was designed to be only thread-compatible. However, fusing of arenas muddies the waters somewhat, because two distinct `upb_Arena` objects will end up sharing state when fused. This causes a `upb_Arena_Free(a)` to interfere with `upb_Arena_Fuse(b, c)` if `a` and `b` were previously fused. It turns out that we can use atomics to fix this with about a 35% regression in fuse performance (see below). Arena create+free does not regress, thanks to special-case logic in Free(). `upb_Arena` is still a thread-compatible type, and it is still never safe to call `upb_Arena_xxx(a)` and `upb_Arena_yyy(a)` in parallel. However you can at least now call `upb_Arena_Free(a)` and `upb_Arena_Fuse(b, c)` in parallel, even if `a` and `b` were previously fused. Note that `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` is still not allowed if `b` and `c` were previously fused. In practice this means that fuses must still be single-threaded within a single fused group. Performance results: ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.6ns ± 1% 18.6ns ± 1% ~ (p=0.726 n=18+17) BM_ArenaInitialBlockOneAlloc 6.28ns ± 1% 5.73ns ± 1% -8.68% (p=0.000 n=17+20) BM_ArenaFuseUnbalanced/2 44.1ns ± 2% 60.4ns ± 1% +37.05% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/8 370ns ± 2% 500ns ± 1% +35.12% (p=0.000 n=19+20) BM_ArenaFuseUnbalanced/64 3.52µs ± 1% 4.71µs ± 1% +33.80% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.20µs ± 1% 9.72µs ± 2% +34.93% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.4ns ± 2% 61.4ns ± 1% +38.23% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 509ns ± 1% +36.57% (p=0.000 n=19+17) BM_ArenaFuseBalanced/64 3.55µs ± 2% 4.79µs ± 1% +34.80% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.26µs ± 1% 9.76µs ± 1% +34.45% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.66ms ± 1% 5.69ms ± 1% +0.57% (p=0.013 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.30ms ± 1% 6.36ms ± 1% +0.90% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.1ms ± 1% ~ (p=0.118 n=18+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.50% (p=0.006 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.194 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.192 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 0% ~ (p=0.750 n=18+14) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% -0.34% (p=0.046 n=19+19) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.7µs ± 2% +1.37% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.143 n=18+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.9µs ± 3% 11.9µs ± 1% ~ (p=0.076 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.053 n=19+19) BM_SerializeDescriptor_Proto2 5.97µs ± 4% 5.90µs ± 4% ~ (p=0.093 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.909 n=17+18) name old time/op new time/op delta BM_ArenaOneAlloc 18.7ns ± 2% 18.6ns ± 0% ~ (p=0.607 n=18+17) BM_ArenaInitialBlockOneAlloc 6.29ns ± 1% 5.74ns ± 1% -8.71% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/2 44.1ns ± 1% 60.6ns ± 1% +37.21% (p=0.000 n=17+19) BM_ArenaFuseUnbalanced/8 371ns ± 2% 500ns ± 1% +35.02% (p=0.000 n=19+16) BM_ArenaFuseUnbalanced/64 3.53µs ± 1% 4.72µs ± 1% +33.85% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/128 7.22µs ± 1% 9.73µs ± 2% +34.87% (p=0.000 n=16+19) BM_ArenaFuseBalanced/2 44.5ns ± 2% 61.5ns ± 1% +38.22% (p=0.000 n=20+17) BM_ArenaFuseBalanced/8 373ns ± 2% 510ns ± 1% +36.58% (p=0.000 n=19+16) BM_ArenaFuseBalanced/64 3.56µs ± 2% 4.80µs ± 1% +34.87% (p=0.000 n=19+19) BM_ArenaFuseBalanced/128 7.27µs ± 1% 9.77µs ± 1% +34.40% (p=0.000 n=17+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.67ms ± 1% 5.71ms ± 1% +0.60% (p=0.011 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.32ms ± 1% 6.37ms ± 1% +0.87% (p=0.000 n=19+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 12.1ms ± 1% 12.2ms ± 1% ~ (p=0.126 n=18+19) BM_LoadAdsDescriptor_Proto2<WithLayout> 12.2ms ± 1% 12.3ms ± 1% +0.51% (p=0.002 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.7µs ± 1% 12.7µs ± 1% ~ (p=0.149 n=20+19) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.6µs ± 1% 11.6µs ± 1% ~ (p=0.211 n=20+20) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.5µs ± 1% 12.5µs ± 1% ~ (p=0.986 n=18+15) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.4µs ± 1% 11.3µs ± 1% ~ (p=0.081 n=19+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 25.4µs ± 1% 25.8µs ± 2% +1.41% (p=0.000 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 12.1µs ± 2% 12.1µs ± 1% ~ (p=0.558 n=19+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 12.0µs ± 3% 11.9µs ± 1% ~ (p=0.165 n=17+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.070 n=19+19) BM_SerializeDescriptor_Proto2 5.98µs ± 4% 5.92µs ± 3% ~ (p=0.138 n=17+19) BM_SerializeDescriptor_Upb 10.4µs ± 1% 10.4µs ± 1% ~ (p=0.858 n=17+18) ``` PiperOrigin-RevId: 518573683
2 years ago
}
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
// Perform the actual fuse by removing the refs from `r2` and swapping in the
// parent pointer.
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
if (!upb_Atomic_CompareExchangeStrong(
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
&r2.root->parent_or_count, &r2.tagged_count,
_upb_Arena_TaggedFromPointer(r1.root), memory_order_release,
memory_order_acquire)) {
// We'll need to remove the excess refs we added to r1 previously.
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
*ref_delta += r2_untagged_count;
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
return NULL;
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
}
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
// Now that the fuse has been performed (and can no longer fail) we need to
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
// append `r2` to `r1`'s linked list.
_upb_Arena_DoFuseArenaLists(r1.root, r2.root);
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
return r1.root;
}
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
static bool _upb_Arena_FixupRefs(upb_Arena* new_root, uintptr_t ref_delta) {
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
if (ref_delta == 0) return true; // No fixup required.
uintptr_t poc =
upb_Atomic_Load(&new_root->parent_or_count, memory_order_relaxed);
if (_upb_Arena_IsTaggedPointer(poc)) return false;
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
uintptr_t with_refs = poc - ref_delta;
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
UPB_ASSERT(!_upb_Arena_IsTaggedPointer(with_refs));
return upb_Atomic_CompareExchangeStrong(&new_root->parent_or_count, &poc,
with_refs, memory_order_relaxed,
memory_order_relaxed);
}
Changed Arena representation so that fusing links arenas together instead of blocks. Previously when fusing, we would concatenate all blocks into a single list that lived in the arena root. From then on, all arenas would add their blocks to this single unified list. After this CL, arenas keep their distinct list of blocks even after being fused. Instead of unifying the block list, fuse now puts the arenas themselves into a list, so all arenas in the fused group can be iterated over at any time. This design makes it easier to keep each individual arena thread-compatible, because fuse and free are now the only mutating operations that touch state that is shared with the entire group. Read-only operations like `SpaceAllocated()` also iterate the list of arenas, but in a read-only fashion. (Note: we need tests for SpaceAllocated(), both single-threaded for correctness and multi-threaded for resilience to crashes and data races). Performance of fuse regresses by 5-20%. This is somewhat expected as we are performing more atomic operations during a fuse. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.7ns ± 4% +2.00% (p=0.016 n=18+18) BM_ArenaInitialBlockOneAlloc 5.50ns ± 4% 6.57ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.3ns ±10% 68.7ns ± 4% +15.85% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 479ns ± 5% 540ns ± 8% +12.57% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.50µs ± 4% 4.93µs ± 8% +9.59% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.24µs ± 3% 9.96µs ± 3% +7.81% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.3ns ±18% 71.0ns ± 4% +12.14% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 484ns ± 9% 543ns ±10% +12.11% (p=0.000 n=17+16) BM_ArenaFuseBalanced/64 4.50µs ± 6% 4.94µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.20µs ± 4% 9.95µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.50ms ± 8% 5.69ms ±17% ~ (p=0.189 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.10ms ± 5% 6.05ms ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.8ms ± 5% 12.4ms ±17% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.1µs ± 8% 12.1µs ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 10.9µs ± 7% 11.0µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 4% 24.4µs ± 7% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 5% 11.6µs ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.77µs ± 5% 5.71µs ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.102 n=19+16) name old time/op new time/op delta BM_ArenaOneAlloc 18.4ns ± 6% 18.8ns ± 4% +1.98% (p=0.019 n=18+18) BM_ArenaInitialBlockOneAlloc 5.51ns ± 4% 6.58ns ± 4% +19.42% (p=0.000 n=16+17) BM_ArenaFuseUnbalanced/2 59.5ns ±10% 68.9ns ± 4% +15.83% (p=0.000 n=19+19) BM_ArenaFuseUnbalanced/8 481ns ± 5% 541ns ± 8% +12.54% (p=0.000 n=18+19) BM_ArenaFuseUnbalanced/64 4.51µs ± 4% 4.94µs ± 8% +9.53% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 9.26µs ± 3% 9.98µs ± 3% +7.79% (p=0.000 n=17+17) BM_ArenaFuseBalanced/2 63.5ns ±19% 71.1ns ± 3% +12.07% (p=0.000 n=19+18) BM_ArenaFuseBalanced/8 485ns ± 9% 551ns ±20% +13.47% (p=0.000 n=17+17) BM_ArenaFuseBalanced/64 4.51µs ± 6% 4.95µs ± 4% +9.62% (p=0.000 n=19+17) BM_ArenaFuseBalanced/128 9.22µs ± 4% 9.97µs ± 4% +8.12% (p=0.000 n=16+19) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 8% 5.72ms ±18% ~ (p=0.199 n=18+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.12ms ± 5% 6.07ms ± 4% ~ (p=0.273 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.9ms ±15% 11.6ms ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 5% 12.5ms ±18% ~ (p=0.582 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 8% 12.1µs ± 3% ~ (p=0.963 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.8µs ±17% 11.1µs ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.0µs ± 5% 11.9µs ± 4% ~ (p=0.126 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.0µs ± 6% 11.1µs ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.3µs ± 4% 24.5µs ± 6% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.7µs ± 5% 11.6µs ± 4% ~ (p=0.574 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.3µs ± 3% 11.3µs ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.7µs ± 8% 12.7µs ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 5.78µs ± 5% 5.73µs ± 5% ~ (p=0.357 n=17+17) BM_SerializeDescriptor_Upb 10.0µs ± 5% 10.1µs ± 7% ~ (p=0.117 n=19+16) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.08k ± 0% 6.05k ± 0% -0.54% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 6.39k ± 0% 6.36k ± 0% -0.55% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 336 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 672 ± 0% 672 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.69k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.5k ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 43.0k ± 0% 43.0k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 9.89M ± 0% 9.95M ± 0% +0.65% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 9.95M ± 0% 10.02M ± 0% +0.70% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (p=0.800 n=20+20) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% ~ (p=0.752 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 138MB/s ± 7% 132MB/s ±15% ~ (p=0.126 n=18+20) BM_LoadAdsDescriptor_Upb<WithLayout> 124MB/s ± 5% 125MB/s ± 4% ~ (p=0.258 n=17+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 63.9MB/s ±13% 65.2MB/s ± 5% ~ (p=0.589 n=19+16) BM_LoadAdsDescriptor_Proto2<WithLayout> 64.0MB/s ± 5% 61.3MB/s ±15% ~ (p=0.604 n=16+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 620MB/s ± 8% 622MB/s ± 4% ~ (p=1.000 n=18+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 644MB/s ±15% 679MB/s ± 4% ~ (p=0.104 n=20+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 627MB/s ± 4% 633MB/s ± 4% ~ (p=0.134 n=18+19) BM_Parse_Upb_FileDesc<InitBlock, Alias> 688MB/s ± 6% 682MB/s ± 4% ~ (p=0.195 n=17+18) BM_Parse_Proto2<FileDesc, NoArena, Copy> 310MB/s ± 4% 309MB/s ± 6% ~ (p=0.767 n=18+18) BM_Parse_Proto2<FileDesc, UseArena, Copy> 646MB/s ± 4% 649MB/s ± 4% ~ (p=0.621 n=18+16) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 666MB/s ± 3% 666MB/s ± 3% ~ (p=0.743 n=18+18) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 592MB/s ± 7% 593MB/s ± 4% ~ (p=0.988 n=18+19) BM_SerializeDescriptor_Proto2 1.30GB/s ± 5% 1.32GB/s ± 5% ~ (p=0.433 n=17+17) BM_SerializeDescriptor_Upb 756MB/s ± 5% 745MB/s ± 6% ~ (p=0.102 n=19+16) ``` PiperOrigin-RevId: 520144430
2 years ago
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
bool upb_Arena_Fuse(upb_Arena* a1, upb_Arena* a2) {
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
if (a1 == a2) return true; // trivial fuse
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
// Do not fuse initial blocks since we cannot lifetime extend them.
// Any other fuse scenario is allowed.
if (upb_Arena_HasInitialBlock(a1) || upb_Arena_HasInitialBlock(a2)) {
return false;
}
// The number of refs we ultimately need to transfer to the new root.
uintptr_t ref_delta = 0;
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
while (true) {
Restructure fuse a tiny bit for aesthetics * make internal functions start with `_` * make internal functions static * pull invariant tests to the very top name old cpu/op new cpu/op delta BM_ArenaOneAlloc 17.7ns ± 9% 17.9ns ±10% ~ (p=0.790 n=16+17) BM_ArenaInitialBlockOneAlloc 5.22ns ± 3% 5.63ns ±13% ~ (p=0.068 n=16+20) BM_ArenaFuseUnbalanced/2 68.4ns ± 3% 66.4ns ± 3% -3.05% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 515ns ± 3% 539ns ±16% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.73µs ± 4% 4.71µs ± 4% ~ (p=0.444 n=16+17) BM_ArenaFuseUnbalanced/128 9.67µs ± 5% 9.59µs ± 6% ~ (p=0.217 n=17+16) BM_ArenaFuseBalanced/2 68.3ns ± 4% 66.9ns ± 5% -2.00% (p=0.020 n=18+18) BM_ArenaFuseBalanced/8 525ns ± 4% 519ns ± 3% ~ (p=0.128 n=16+16) BM_ArenaFuseBalanced/64 5.03µs ±17% 4.77µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±17% 9.6µs ± 3% ~ (p=0.058 n=20+16) name old time/op new time/op delta BM_ArenaOneAlloc 17.7ns ± 8% 17.9ns ±10% ~ (p=0.736 n=16+17) BM_ArenaInitialBlockOneAlloc 5.23ns ± 3% 5.66ns ±15% ~ (p=0.077 n=16+20) BM_ArenaFuseUnbalanced/2 68.6ns ± 3% 66.5ns ± 3% -3.07% (p=0.000 n=16+16) BM_ArenaFuseUnbalanced/8 516ns ± 3% 542ns ±18% ~ (p=0.496 n=18+20) BM_ArenaFuseUnbalanced/64 4.74µs ± 4% 4.72µs ± 4% ~ (p=0.423 n=16+17) BM_ArenaFuseUnbalanced/128 9.69µs ± 5% 9.61µs ± 7% ~ (p=0.191 n=17+16) BM_ArenaFuseBalanced/2 68.5ns ± 4% 67.1ns ± 5% -1.99% (p=0.022 n=18+18) BM_ArenaFuseBalanced/8 526ns ± 4% 520ns ± 3% ~ (p=0.157 n=16+16) BM_ArenaFuseBalanced/64 5.04µs ±18% 4.79µs ± 6% ~ (p=0.109 n=20+16) BM_ArenaFuseBalanced/128 10.2µs ±18% 9.7µs ± 3% ~ (p=0.062 n=20+16) PiperOrigin-RevId: 520541220
2 years ago
upb_Arena* new_root = _upb_Arena_DoFuse(a1, a2, &ref_delta);
if (new_root != NULL && _upb_Arena_FixupRefs(new_root, ref_delta)) {
return true;
Allow fuse/fuse races, so that upb_Arena is fully thread-compatible. Previously upb_Arena was not thread-compatible when `upb_Arena_Fuse(a, b)` and `upb_Arena_Fuse(c, d)` executed in parallel if `b` and `c` were previously fused. This CL fixed that by allowing `upb_Arena_Fuse()` to run in parallel without limitations. Details on the design of the algorithm are captured in comments. The CL slightly improves the performance of `upb_Arena_Fuse()`. ``` name old cpu/op new cpu/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.5ns ± 4% -12.30% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.65ns ± 4% 5.17ns ± 3% -22.23% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.1ns ± 7% 68.5ns ± 4% ~ (p=0.327 n=18+19) BM_ArenaFuseUnbalanced/8 542ns ± 3% 513ns ± 4% -5.25% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.04µs ± 8% 4.74µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.80% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 71.8ns ± 7% 68.4ns ± 6% -4.75% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 541ns ± 3% 519ns ± 3% -4.21% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.00µs ± 7% 4.86µs ± 4% -2.78% (p=0.003 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 4% 9.7µs ± 4% -2.68% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.52ms ± 2% 5.54ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.18ms ± 3% 6.15ms ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.8ms ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.2µs ± 4% 12.3µs ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.1µs ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.6µs ±16% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 4% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.782 n=18+17) BM_SerializeDescriptor_Proto2 5.69µs ± 5% 5.76µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old time/op new time/op delta BM_ArenaOneAlloc 20.0ns ±19% 17.6ns ± 4% -12.37% (p=0.000 n=19+17) BM_ArenaInitialBlockOneAlloc 6.66ns ± 4% 5.18ns ± 3% -22.24% (p=0.000 n=18+17) BM_ArenaFuseUnbalanced/2 69.2ns ± 7% 68.6ns ± 4% ~ (p=0.343 n=18+19) BM_ArenaFuseUnbalanced/8 543ns ± 3% 515ns ± 4% -5.21% (p=0.000 n=18+18) BM_ArenaFuseUnbalanced/64 5.05µs ± 8% 4.75µs ± 4% -5.93% (p=0.000 n=17+17) BM_ArenaFuseUnbalanced/128 10.1µs ± 4% 9.6µs ± 4% -4.78% (p=0.000 n=18+17) BM_ArenaFuseBalanced/2 72.0ns ± 7% 68.6ns ± 6% -4.73% (p=0.000 n=17+17) BM_ArenaFuseBalanced/8 543ns ± 3% 520ns ± 3% -4.20% (p=0.000 n=18+17) BM_ArenaFuseBalanced/64 5.01µs ± 7% 4.87µs ± 4% -2.78% (p=0.004 n=17+18) BM_ArenaFuseBalanced/128 10.0µs ± 3% 9.8µs ± 4% -2.67% (p=0.001 n=16+18) BM_LoadAdsDescriptor_Upb<NoLayout> 5.53ms ± 2% 5.56ms ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 6.20ms ± 3% 6.17ms ± 2% ~ (p=0.424 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 11.8ms ± 7% 11.7ms ± 5% ~ (p=0.297 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 11.9ms ± 3% 11.9ms ± 3% ~ (p=0.351 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 12.3µs ± 4% 12.3µs ± 4% ~ (p=1.000 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 11.3µs ± 6% 11.3µs ± 3% ~ (p=0.845 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 12.1µs ± 4% 12.1µs ± 3% ~ (p=0.542 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 11.1µs ± 4% 11.2µs ± 2% ~ (p=0.330 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 24.2µs ± 3% 25.7µs ±17% ~ (p=0.167 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 11.6µs ± 3% 11.7µs ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 11.5µs ± 7% 11.4µs ± 4% ~ (p=0.799 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 12.8µs ± 5% 13.0µs ±14% ~ (p=0.807 n=18+17) BM_SerializeDescriptor_Proto2 5.71µs ± 5% 5.78µs ± 6% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 10.2µs ± 4% 10.2µs ± 3% ~ (p=0.613 n=18+17) name old allocs/op new allocs/op delta BM_ArenaOneAlloc 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseUnbalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/2 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/8 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/64 64.0 ± 0% 64.0 ± 0% ~ (all samples are equal) BM_ArenaFuseBalanced/128 128 ± 0% 128 ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<NoLayout> 6.05k ± 0% 6.05k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Upb<WithLayout> 6.36k ± 0% 6.36k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<NoLayout> 83.4k ± 0% 83.4k ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 84.4k ± 0% 84.4k ± 0% -0.00% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Upb_FileDesc<UseArena, Alias> 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, NoArena, Copy> 765 ± 0% 765 ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 8.00 ± 0% 8.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_ArenaOneAlloc 336 ± 0% 328 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseUnbalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/2 672 ± 0% 656 ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/8 2.69k ± 0% 2.62k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/64 21.5k ± 0% 21.0k ± 0% -2.38% (p=0.000 n=20+20) BM_ArenaFuseBalanced/128 43.0k ± 0% 42.0k ± 0% -2.38% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<NoLayout> 10.0M ± 0% 9.9M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Upb<WithLayout> 10.0M ± 0% 10.0M ± 0% -0.05% (p=0.000 n=20+20) BM_LoadAdsDescriptor_Proto2<NoLayout> 6.62M ± 0% 6.62M ± 0% ~ (all samples are equal) BM_LoadAdsDescriptor_Proto2<WithLayout> 6.66M ± 0% 6.66M ± 0% -0.01% (p=0.013 n=19+20) BM_Parse_Upb_FileDesc<UseArena, Copy> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Upb_FileDesc<UseArena, Alias> 36.5k ± 0% 36.5k ± 0% -0.02% (p=0.000 n=20+20) BM_Parse_Proto2<FileDesc, NoArena, Copy> 35.8k ± 0% 35.8k ± 0% ~ (all samples are equal) BM_Parse_Proto2<FileDesc, UseArena, Copy> 65.3k ± 0% 65.3k ± 0% ~ (all samples are equal) name old speed new speed delta BM_LoadAdsDescriptor_Upb<NoLayout> 137MB/s ± 2% 137MB/s ± 4% ~ (p=0.707 n=16+19) BM_LoadAdsDescriptor_Upb<WithLayout> 122MB/s ± 3% 123MB/s ± 3% ~ (p=0.501 n=18+18) BM_LoadAdsDescriptor_Proto2<NoLayout> 64.2MB/s ± 7% 64.7MB/s ± 5% ~ (p=0.330 n=16+18) BM_LoadAdsDescriptor_Proto2<WithLayout> 63.6MB/s ± 3% 63.9MB/s ± 3% ~ (p=0.303 n=18+17) BM_Parse_Upb_FileDesc<UseArena, Copy> 614MB/s ± 4% 613MB/s ± 4% ~ (p=0.935 n=17+18) BM_Parse_Upb_FileDesc<UseArena, Alias> 665MB/s ± 6% 667MB/s ± 3% ~ (p=0.873 n=16+17) BM_Parse_Upb_FileDesc<InitBlock, Copy> 624MB/s ± 4% 622MB/s ± 3% ~ (p=0.501 n=18+18) BM_Parse_Upb_FileDesc<InitBlock, Alias> 681MB/s ± 4% 675MB/s ± 2% ~ (p=0.297 n=18+16) BM_Parse_Proto2<FileDesc, NoArena, Copy> 311MB/s ± 3% 296MB/s ±15% ~ (p=0.177 n=17+20) BM_Parse_Proto2<FileDesc, UseArena, Copy> 649MB/s ± 3% 644MB/s ± 3% ~ (p=0.232 n=17+18) BM_Parse_Proto2<FileDesc, InitBlock, Copy> 656MB/s ± 7% 659MB/s ± 4% ~ (p=0.707 n=18+19) BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 587MB/s ± 5% 576MB/s ±16% ~ (p=0.584 n=18+18) BM_SerializeDescriptor_Proto2 1.32GB/s ± 5% 1.31GB/s ± 7% ~ (p=0.143 n=18+18) BM_SerializeDescriptor_Upb 737MB/s ± 4% 737MB/s ± 7% ~ (p=0.839 n=18+18) ``` PiperOrigin-RevId: 520452349
2 years ago
}
}
}
bool upb_Arena_IncRefFor(upb_Arena* arena, const void* owner) {
_upb_ArenaRoot r;
if (upb_Arena_HasInitialBlock(arena)) return false;
retry:
r = _upb_Arena_FindRoot(arena);
if (upb_Atomic_CompareExchangeWeak(
&r.root->parent_or_count, &r.tagged_count,
_upb_Arena_TaggedFromRefcount(
_upb_Arena_RefCountFromTagged(r.tagged_count) + 1),
memory_order_release, memory_order_acquire)) {
// We incremented it successfully, so we are done.
return true;
}
// We failed update due to parent switching on the arena.
goto retry;
}
void upb_Arena_DecRefFor(upb_Arena* arena, const void* owner) {
upb_Arena_Free(arena);
}