_upb_Arena_FastMalloc() has been directly inlined
Arena::allocator() has been removed
mem_block is now _upb_MemBlock
some calls to upb_Arena_Alloc() (which is going away) have been changed to use upb_Arena_Malloc() instead
PiperOrigin-RevId: 489874377
Ref: https://github.com/protocolbuffers/protobuf/pull/10291
Ruby types defined through native extensions should register
a function that reports their memory footprint in bytes.
This feature is used by various memory profiling tools.
Internal array functions are now implemented in upb/internal/array.c and declared in
upb/internal/array.h, which only has local visibility.
PiperOrigin-RevId: 458260144
Lots of changes but it's all just moving things around.
Backward-compatible stub #include's have been provided for now.
upb_Arena/upb_Status have been split out from upb/upb.?
upb_Array/upb_Map/upb_MessageValue have been split out from upb/collections.?
upb_ExtensionRegistry has been split out from upb/msg.?
upb/decode_internal.h is now upb/internal/decode.h
upb/mini_table_accessors_internal.h is now upb/internal/mini_table_accessors.h
upb/table_internal.h is now upb/internal/table.h
upb/upb_internal.h is now upb/internal/upb.h
PiperOrigin-RevId: 456297617
upb has traditionally returned 16-byte-aligned pointers from arena allocation. This was out of an abundance of caution, since users could theoretically be using upb arenas to allocate memory that is then used for SSE/AVX values (e.g. [`__m128`](https://docs.microsoft.com/en-us/cpp/cpp/m128?view=msvc-170)), which require 16-byte alignment.
In practice, the protobuf C++ arena has used 8-byte alignment for 8 years with no significant problems I know of arising from SSE etc.
Reducing the alignment requirement to 8 will save memory. It will also help with compatibility on 32-bit architectures where `malloc()` only returns 8-byte aligned memory. The immediate motivation is to fix the win32 build for Python protobuf.
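To make the memory saving concrete, here is a minimal sketch of rounding an allocation size up to an alignment boundary. `align_up` and `padding_for` are hypothetical helpers written for illustration, not upb's actual macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not upb's real macro): round `size` up to the next
 * multiple of `align`. `align` must be a nonzero power of two. */
static size_t align_up(size_t size, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}

/* Bytes of padding wasted when a request of `size` bytes is rounded up. */
static size_t padding_for(size_t size, size_t align) {
  return align_up(size, align) - size;
}
```

Under these assumptions, a 24-byte request occupies 32 bytes at 16-byte alignment (8 bytes of padding) but exactly 24 bytes at 8-byte alignment.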
PiperOrigin-RevId: 448331777
If an initial block is provided, we should start our
block doubling at the size of the initial block, not 128.
This saves us from unnecessary overhead when we overflow
the initial block.
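A rough sketch of the sizing policy described above, under assumptions: `toy_arena`, `toy_arena_init`, and `next_block_size` are hypothetical names, and the real arena tracks more state than this. The only point illustrated is that doubling starts from the caller's initial block size rather than from a fixed 128.

```c
#include <assert.h>
#include <stddef.h>

enum { kMinBlockSize = 128 }; /* the fixed starting size mentioned above */

/* Hypothetical, stripped-down arena: only the field needed for sizing. */
typedef struct {
  size_t last_size; /* size of the most recent block */
} toy_arena;

/* Seed the doubling sequence from the initial block if one was provided
 * (initial_block_size == 0 means "no initial block"). */
static void toy_arena_init(toy_arena* a, size_t initial_block_size) {
  assert(a != NULL);
  a->last_size = initial_block_size > kMinBlockSize ? initial_block_size
                                                    : kMinBlockSize;
}

/* Size of the next block: double the last one, or the request if larger. */
static size_t next_block_size(toy_arena* a, size_t request) {
  size_t doubled = a->last_size * 2;
  size_t size = request > doubled ? request : doubled;
  a->last_size = size;
  return size;
}
```

With a 4096-byte initial block, the first overflow in this sketch allocates 8192 bytes. Seeding from 128 instead would allocate only 256 bytes, so a workload that has already outgrown 4 KiB would need several more small blocks to catch up.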
upb previously attempted to support C89 and pre-2015 versions
of Visual Studio. This was to support older compilers with
limited C99 support (particularly MSVC). But as of last August,
even gRPC has dropped support for MSVC prior to 2015
(c87276d058).
Therefore it seems safe for upb to no longer attempt C89 support
(we were already not truly C89 compliant, with our use of "bool").
We now explicitly require C99 or greater and MSVC 2015 or greater.
This cleaned up port_def.inc a fair bit. I took the chance to
also remove some obsolete macros.
* Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors.
* Disable -Wshorten-64-to-32 since Kokoro is missing Clang.
* Fixed -Wextra warnings for gcc.
* Reordered UPB_UNUSED() to come after declarations.
* Added another -pedantic fix and log CC version.
* Fix compile error and conditionally run use_bazel.sh.
* Moved set -e after use_bazel.sh.
* Fixed typo in conditional.
* WIP.
* WIP.
* Tests are passing.
* Recover some perf: LIKELY doesn't propagate through functions. :(
* Added some more benchmarks.
* Simplify & optimize upb_arena_realloc().
* Only add owned blocks to the freelist.
* More optimization/simplification.
* Re-fixed the bug.
* Revert unintentional changes to parser.rl.
* Revert Lua changes for now.
* Revert the arena fuse changes for now.
* Added last_size to the arena representation.
* Re-applied Lua changes.
* Implemented upb_arena_fuse().
* Fix the compile by re-ordering statements.
* Improve comments.
* Fixed compile errors.
* Fixed compile error and changed benchmarks to do one allocation.
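The note above about `LIKELY` not propagating through functions can be illustrated with a small sketch. All names here (`LIKELY`, `bump_arena`, `has_room`, `bump_malloc`) are hypothetical and simplified, not upb's real definitions. The assumption is that a branch hint inside a helper applies only to that expression, so the caller repeats the hint at its own branch.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

/* Hypothetical bump allocator over a fixed buffer. */
typedef struct {
  char* ptr; /* next free byte */
  char* end; /* one past the last usable byte */
} bump_arena;

/* The hint here annotates only this expression inside the helper. */
static int has_room(const bump_arena* a, size_t n) {
  assert(a->ptr <= a->end);
  return LIKELY((size_t)(a->end - a->ptr) >= n);
}

/* The caller restates the hint at the branch that actually matters. */
static void* bump_malloc(bump_arena* a, size_t n) {
  if (LIKELY(has_room(a, n))) {
    void* ret = a->ptr;
    a->ptr += n;
    return ret; /* fast path */
  }
  return NULL; /* slow path placeholder: a real arena would add a block */
}
```

Whether the compiler carries the hint across an inlined call depends on the compiler and optimization level, so repeating it at the call site is the conservative choice.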
New code is smaller (in both source size and compiled size) and faster.
# Speed
The decoder speeds up on all machines I tested, though the amount of speedup varies. I was only able to test Intel CPUs.
### Linux Desktop
```
CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
OS: Linux
name             old time/op  new time/op  delta
CreateArena      4.72ns ± 0%  4.93ns ± 0%   +4.47%  (p=0.000 n=11+11)
ParseDescriptor  12.4µs ± 1%   9.1µs ± 1%  -26.65%  (p=0.000 n=11+11)
```
### Mac Laptop
```
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
OS: macOS
name             old time/op  new time/op  delta
CreateArena      5.33ns ± 3%  5.58ns ± 2%   +4.69%  (p=0.000 n=12+12)
ParseDescriptor  15.0µs ± 2%  11.9µs ± 2%  -20.20%  (p=0.000 n=12+12)
```
### Linux Workstation
```
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Linux
name             old time/op  new time/op  delta
CreateArena      5.29ns ± 0%  5.52ns ± 0%   +4.37%  (p=0.000 n=10+12)
ParseDescriptor  18.6µs ± 0%  16.4µs ± 0%  -11.54%  (p=0.000 n=12+12)
```
# Size
A few source files grow because some arena functionality moved inline, but `upb/decode.c` shrinks by 30% on Linux:
```
    VM SIZE
 --------------
  +2.1%    +283    upb/json_decode.c
   +24%    +205    upb/msg.c
  +8.4%    +115    upb/upb.c
  +0.9%     +28    upb/reflection.c
  [ = ]       0    upb/def.c
  [ = ]       0    upb/encode.c
  [ = ]       0    upb/json_encode.c
  [ = ]       0    upb/table.c
 -30.3% -1.51Ki    upb/decode.c
  -0.7%    -738    TOTAL
```