There's already path compression, which guarantees amortized fast lookups (halving the cost of subsequent lookups, though not achieving the inverse-Ackermann bound), but there's still no need to redo work and perform acquire/release atomic operations the whole way along the path. This change also takes advantage of the fast-path relaxed-only read when querying the root of a node that is itself a root.
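As a rough sketch of the resulting lookup (simplified: a plain `parent` pointer stands in for upb's real tagged parent-or-refcount word, so names and details are illustrative, not the actual implementation):
```
#include <stdatomic.h>
#include <stddef.h>

// Hypothetical simplified node: parent == NULL means "I am a root".
typedef struct arena {
  _Atomic(struct arena*) parent;
} arena;

static arena* arena_root(arena* a) {
  // Fast path: a relaxed load is enough here because any answer can be
  // invalidated by a concurrent fuse anyway; callers must already
  // tolerate staleness.
  arena* parent = atomic_load_explicit(&a->parent, memory_order_relaxed);
  if (parent == NULL) return a;

  // Slow path: walk to the root. The acquire loads pair with the
  // release store performed by the fuse that published each parent.
  arena* root = parent;
  for (;;) {
    arena* next = atomic_load_explicit(&root->parent, memory_order_acquire);
    if (next == NULL) break;
    root = next;
  }

  // Path compression: repoint each node on the path at the root so
  // later lookups are short. Storing an ancestor is safe even under
  // racing fuses, and relaxed suffices because the root's contents
  // were already published with release semantics.
  while (a != root) {
    arena* next = atomic_load_explicit(&a->parent, memory_order_acquire);
    if (next == NULL) break;
    atomic_store_explicit(&a->parent, root, memory_order_relaxed);
    a = next;
  }
  return root;
}
```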
PiperOrigin-RevId: 712770023
* Add acquire/release where necessary for all atomic ops
* Add a sentinel member to ensure safe publication when tsan is active; tsan will not catch the previous errors without this member (see the sketch below)
* For all operations using relaxed memory order, comment why relaxed order is safe
* Add a test that exercises racy fuses and space allocated checks without mutexes or other memory barriers from the test harness. This test proved the existence of several races not caught by the existing tests, including one with a confident comment about why relaxed memory order was safe.
* Add a test that exercises racing allocation and destruction among fused arenas, which doesn't use locks and substitutes a custom allocator that verifies its memory blocks.
Test coverage and assert/tsan instrumentation are now sufficient to cause test failures if any call site is further relaxed.
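To illustrate the sentinel idea (a sketch with hypothetical names, not upb's actual layout): a plain, non-atomic member gives tsan a concrete location to race on, so weakened publication shows up as a reportable data race instead of a silent bug.
```
#include <stdatomic.h>

typedef struct {
  _Atomic(int) refcount;   // stand-in for the arena's real atomic state
  char tsan_sentinel;      // plain member that tsan can watch for races
} arena_internal;

static _Atomic(arena_internal*) g_published;

static void publish(arena_internal* a) {
  a->tsan_sentinel = 1;  // plain write; must happen-before publication
  atomic_store_explicit(&g_published, a, memory_order_release);
}

static char read_sentinel(void) {
  arena_internal* a =
      atomic_load_explicit(&g_published, memory_order_acquire);
  // With the release/acquire pair this read is race-free; if any hop
  // is weakened to relaxed, tsan reports a data race right here.
  return a ? a->tsan_sentinel : 0;
}
```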
PiperOrigin-RevId: 712751905
This fixes:
* MSVC with `/std:c11 /experimental:c11atomics` on recent versions now emits atomics
* Clang with `-std=c11 -fgnuc-version=0` now emits atomics
* Clang and GCC 14 when built with `-std=c99 -pedantic-errors` will now compile, and not emit atomics
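All three cases fall out of gating on the standard feature-test macros rather than on compiler identity - roughly (guard macro name hypothetical):
```
// Gate on the standard feature-test macros: the language version must
// be C11 or later AND the implementation must not opt out of atomics.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#define MY_USE_C11_ATOMICS 1
#include <stdatomic.h>
#endif
```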
PiperOrigin-RevId: 712538312
We no longer need to traverse the linked list of blocks to check allocated space, which means we also no longer need atomics in the linked list or even its head. This is especially beneficial because the previous implementation contained a race where we could dereference uninitialized memory: the stores to the `next` pointers did not use release semantics, and the loads in `SpaceAllocated` used relaxed order, so there was no guarantee that `size` had actually been initialized - but worse, *there was also no guarantee that `next` had been!* Simplified:
```
AddBlock:
1  ptr = malloc();
2  ptr->size = 123;
3  ptr->next = ai->blocks;
4  ai->blocks = ptr;      (release order)
```
```
SpaceAllocated:
5  block = ai->blocks     (relaxed order)
6  read block->size       (acquire, but probably by accident)
7  block = block->next    (relaxed order)
```
So I think a second thread calling SpaceAllocated could see the order 1, 4, 5, 6, 7, 2, 3 and read uninitialized memory - there is no data-dependency relationship or happens-before edge that this order violates, so it would be valid for a compiler and hardware to produce.
In reality, operation 4 will produce an `stlr` on ARM (forcing an ordering of 1, 2, 3 before 4), and `block->next` has a data dependency on `ai->blocks`, which forces an ordering in the hardware between 5->6 and 5->7 even for regular `ldr` instructions.
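The actual fix removes these atomics entirely by not traversing the block list at all, but for illustration, here is a minimal sketch (names hypothetical) of what correct release/acquire publication of such a list would look like:
```
#include <stdatomic.h>
#include <stdlib.h>

typedef struct block {
  size_t size;
  struct block* next;  // plain field, protected by the publication below
} block;

static _Atomic(block*) blocks;

static void add_block(size_t size) {
  block* b = malloc(sizeof(block));
  if (!b) return;
  b->size = size;
  // Acquire here chains happens-before through older nodes when blocks
  // are added from multiple threads.
  b->next = atomic_load_explicit(&blocks, memory_order_acquire);
  // Release guarantees steps 1-3 are visible before the new head is.
  atomic_store_explicit(&blocks, b, memory_order_release);
}

static size_t space_allocated(void) {
  size_t total = 0;
  // Acquire pairs with the release store above; a relaxed load here is
  // exactly what permits the 1, 4, 5, 6, 7, 2, 3 interleaving.
  for (block* b = atomic_load_explicit(&blocks, memory_order_acquire);
       b != NULL; b = b->next) {
    total += b->size;
  }
  return total;
}
```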
Delete the arena `Contains` helper; it's private and its only user is its own test.
PiperOrigin-RevId: 709918443
Previously, extensions and unknown fields were stored on opposite ends of a growing buffer:
```
|------unknown fields-------|---------unallocated space------|--extensions---|
```
Unknown fields were appended and extensions were prepended during parse. When either side ran into the other, the buffer was reallocated to fit, rounding up to the nearest power of 2. This meant that for a proto with 70,000 bytes of unknown fields, the total memory consumed could be up to 128+256+512+1024+2048+4096+8192+16384+32768+65536+131072=262016 bytes allocated in the arena. In the more common case of a single large, length-delimited field it'd be just 131072 bytes; but whether that's a 3.74x or a 1.87x multiple of the 70,000-byte payload, it's a lot of extra memory.
The new representation still does exponential reallocation, but only for pointers to normal arena allocations. We exploit the fact that arena allocations are aligned: the low bits of each stored pointer record whether it points to an extension or to a `upb_StringView` of unknown fields. This costs three pointers of overhead per unknown field and one pointer of overhead per extension, but that's a fixed overhead - we won't over-allocate large buffers for large unknown fields. If this overhead proves to be a problem, more compact representations could be implemented.
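A minimal sketch of the low-bit tagging trick (constants and helpers hypothetical, not upb's actual encoding):
```
#include <assert.h>
#include <stdint.h>

// Hypothetical one-bit discriminator stored in the low bit.
#define TAG_EXTENSION 0u
#define TAG_UNKNOWN 1u

static uintptr_t tag_ptr(void* p, unsigned tag) {
  assert(((uintptr_t)p & 1u) == 0);  // alignment guarantees a free bit
  return (uintptr_t)p | tag;
}

static unsigned ptr_tag(uintptr_t v) { return (unsigned)(v & 1u); }

static void* untag_ptr(uintptr_t v) { return (void*)(v & ~(uintptr_t)1); }
```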
In addition, because unknown field bytes are now in their own allocations, they are pointer-stable - in the future, this will allow us to exploit aliasing (when enabled during parse) for both unknown fields and lazy extensions (parsed from unknown fields), which can greatly reduce memory use for messages with many unknown, string, or bytes fields.
PiperOrigin-RevId: 708058272
Rename upb_Log2CeilingSize() to upb_RoundUpToPowerOfTwo() to minimize the chance of this confusion happening in the future.
In practice this condition could never be hit by descriptors generated by protoc, since FeatureSet is small and its fields are 1 byte on the wire.
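For reference, the rename distinguishes returning a size (70,000 -> 131,072) from returning an exponent (70,000 -> 17); a sketch of the classic bit-smearing implementation of the size version (not necessarily upb's exact code):
```
#include <stdint.h>

// Assumes 0 < v <= 2^31; exact powers of two map to themselves.
static uint32_t round_up_to_power_of_two(uint32_t v) {
  v--;           // so that exact powers of two are preserved
  v |= v >> 1;   // smear the highest set bit into every lower bit...
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;  // ...then carry into the next power of two
}
```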
PiperOrigin-RevId: 702381123
Some qsort implementations will allocate buffers rather than sorting purely in place; this new algorithm avoids that and also runs in O(n) time rather than O(n log n).
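For illustration only - the algorithm in this change may differ - one common way to sort in O(n) with no heap allocation, assuming small integer keys, is a counting sort:
```
#include <stdint.h>
#include <string.h>

// Counting sort: O(n) time, fixed-size on-stack table, no heap use.
static void counting_sort_u8(uint8_t* v, size_t n) {
  size_t count[256] = {0};
  for (size_t i = 0; i < n; i++) count[v[i]]++;
  size_t out = 0;
  for (size_t k = 0; k < 256; k++) {
    memset(v + out, (int)k, count[k]);  // emit count[k] copies of key k
    out += count[k];
  }
}
```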
PiperOrigin-RevId: 702081436
Empty repeated/map extensions are semantically equivalent to an extension that is not present at all. We had code paths that treated them differently, which led to incorrect results.
In particular, we were considering `{.repeated_ext = []}` to be different from `{}` when comparing with `upb_Message_IsEqual()`. This change fixes this bug so that they will be considered equivalent.
PiperOrigin-RevId: 702072912
Before, the upb_ExtensionRegistry_AddArray API would just return a boolean
indicating whether the operation succeeded or failed. This is not descriptive
enough in some cases and made the error output confusing.
Specifically, when trying to register an extension with a duplicate extension
number, AddArray first performs a map lookup before inserting the extension
entry into the registry. The code handled lookup failure (due to a duplicate)
the same way as insertion failure (due to OOM), and printed an error message
claiming OOM when the actual problem was a duplicate entry.
This was acknowledged in a TODO in the AddArray code comment -- which is now
fixed. :)
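A sketch of the more descriptive return shape (names hypothetical, not the actual upb API):
```
// Hypothetical status enum; the point is that callers can now
// distinguish a duplicate entry from an out-of-memory failure.
typedef enum {
  kExtRegistry_Ok,
  kExtRegistry_DuplicateEntry,  // map lookup found an existing number
  kExtRegistry_OutOfMemory,     // inserting into the registry failed
} ExtRegistryStatus;
```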
PiperOrigin-RevId: 700764584
We previously hit this for NAN, but it appears the latest version of MSVC used in Windows GitHub runners has the same issue for INFINITY. The same strategy can be applied here.
This will be backported to fix broken release branches.
PiperOrigin-RevId: 700042372
After using upb I noticed that field accessors were not getting inlined properly under Clang. The cause appears to be the UPB_FORCEINLINE macro falling back to plain `static` under Clang, despite Clang fully supporting the required attributes.
Closes #17433
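A sketch of the macro shape after the fix (the exact conditions in upb's port_def.inc may differ) - the point is that Clang takes the attribute branch instead of falling through to plain `static`:
```
#if defined(__GNUC__) || defined(__clang__)
#define UPB_FORCEINLINE __inline__ __attribute__((always_inline)) static
#elif defined(_MSC_VER)
#define UPB_FORCEINLINE static __forceinline
#else
#define UPB_FORCEINLINE static
#endif
```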
COPYBARA_INTEGRATE_REVIEW=https://github.com/protocolbuffers/protobuf/pull/17433 from Vadmeme:main 57204e78cd
PiperOrigin-RevId: 699195477
Before, we were trying to work around the fact that we don't know the default depth limit. The logic is simpler and more robust if we take the default into account.
PiperOrigin-RevId: 698856552
Prior to this CL, the fuzz tests only checked that the code does not crash; they did not check any correctness properties. This CL adds correctness checking, verifying that we can round-trip through the wire format without losing or corrupting data.
This highlighted a minor bug in the encoder where the depth limit check was off by one (too strict). This CL makes the encoder more accepting when checking the depth limit.
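A sketch of the round-trip property now being checked (harness names are illustrative, not the real fuzzer API):
```
#include <stdbool.h>
#include <stddef.h>

// Hypothetical harness API; the real test drives upb's decoder/encoder.
typedef struct Msg Msg;
Msg* decode(const char* buf, size_t len);
char* encode(const Msg* m, size_t* out_len);
bool msg_equal(const Msg* a, const Msg* b);

// Decoding, re-encoding, and decoding again must yield an equal
// message: nothing lost, nothing corrupted.
static bool round_trips(const char* in, size_t in_len) {
  Msg* m1 = decode(in, in_len);
  if (!m1) return true;  // input didn't parse; nothing to round-trip
  size_t out_len;
  char* out = encode(m1, &out_len);
  Msg* m2 = decode(out, out_len);
  return m2 != NULL && msg_equal(m1, m2);
}
```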
PiperOrigin-RevId: 698527429