Ref: https://github.com/protocolbuffers/protobuf/pull/10291
Ruby types defined though native extensions should register
a function that report their memory footprint in bytes.
This feature is used by various memory profiling tools.
Internal array functions are now implemented in upb/internal/array.c and declared in
upb/internal/array.h, which only has local visibility.
PiperOrigin-RevId: 458260144
Lots of changes but it's all just moving things around.
Backward-compatible stub #include's have been provided for now.
upb_Arena/upb_Status have been split out from upb/upb.?
upb_Array/upb_Map/upb_MessageValue have been split out from upb/collections.?
upb_ExtensionRegistry has been split out from upb/msg.?
upb/decode_internal.h is now upb/internal/decode.h
upb/mini_table_accessors_internal.h is now upb/internal/mini_table_accessors.h
upb/table_internal.h is now upb/internal/table.h
upb/upb_internal.h is now upb/internal/upb.h
PiperOrigin-RevId: 456297617
upb has traditionally returned 16-byte-aligned pointers from arena allocation. This was out of an abundance of caution, since users could theoretically be using upb arenas to allocate memory that is then used for SSE/AVX values (eg. [`__m128`](https://docs.microsoft.com/en-us/cpp/cpp/m128?view=msvc-170), which require 16-byte alignment.
In practice, the protobuf C++ arena has used 8-byte alignment for 8 years with no significant problems I know of arising from SSE etc.
Reducing the alignment requirement to 8 will save memory. It will also help with compatibility on 32-bit architectures where `malloc()` only returns 8-byte aligned memory. The immediate motivation is to fix the win32 build for Python protobuf.
PiperOrigin-RevId: 448331777
If an initial block is provided, we should start our
block doubling at the size of the initial block, not 128.
This saves us from unnecessary overhead when we overflow
the initial block.
upb previously attempted to support C89 and pre-2015 versions
of Visual Studio. This was to support older compilers with
limited C99 support (particularly MSVC). But as of last August,
even gRPC has dropped support for MSVC prior to 2015
c87276d058
Therefore it seems safe for upb to no longer attempt C89 support
(we were already not truly C89 compliant, with our use of "bool").
We now explicitly require C99 or greater and MSVC 2015 or greater.
This cleaned up port_def.inc a fair bit. I took the chance to
also remove some obsolete macros.
* Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors.
* Disable -Wshorten-32-to-64 since Kokoro is missing Clang.
* Fixed -Wextra warnings for gcc.
* Reordered UPB_UNUSED() to come after declarations.
* Added another -pedantic fix and log CC version.
* Fix compile error and conditionally run use_bazel.sh.
* Moved set -e after use_bazel.sh.
* Fixed typo in conditional.
* WIP.
* WIP.
* Tests are passing.
* Recover some perf: LIKELY doesn't propagate through functions. :(
* Added some more benchmarks.
* Simplify & optimize upb_arena_realloc().
* Only add owned blocks to the freelist.
* More optimization/simplification.
* Re-fixed the bug.
* Revert unintentional changes to parser.rl.
* Revert Lua changes for now.
* Revert the arena fuse changes for now.
* Added last_size to the arena representation.
* Re-applied Lua changes.
* Implemented upb_arena_fuse().
* Fix the compile by re-ordering statements.
* Improve comments.
* WIP.
* WIP.
* Tests are passing.
* Recover some perf: LIKELY doesn't propagate through functions. :(
* Added some more benchmarks.
* Simplify & optimize upb_arena_realloc().
* Only add owned blocks to the freelist.
* More optimization/simplification.
* Re-fixed the bug.
* Revert unintentional changes to parser.rl.
* Revert Lua changes for now.
* Revert the arena fuse changes for now.
* Added last_size to the arena representation.
* Fixed compile errors.
* Fixed compile error and changed benchmarks to do one allocation.
New code is smaller (in both source size and compiled size) and faster.
# Speed
The decoder speeds up on all machines I tested, though the amount of speedup varies. I was only able to test Intel CPUs.
### Linux Desktop
```
CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
OS: Linux
name old time/op new time/op delta
CreateArena 4.72ns ± 0% 4.93ns ± 0% +4.47% (p=0.000 n=11+11)
ParseDescriptor 12.4µs ± 1% 9.1µs ± 1% -26.65% (p=0.000 n=11+11)
```
### Mac Laptop
```
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
OS: macOS
name old time/op new time/op delta
CreateArena 5.33ns ± 3% 5.58ns ± 2% +4.69% (p=0.000 n=12+12)
ParseDescriptor 15.0µs ± 2% 11.9µs ± 2% -20.20% (p=0.000 n=12+12)
```
### Linux Workstation
```
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Linux
name old time/op new time/op delta
CreateArena 5.29ns ± 0% 5.52ns ± 0% +4.37% (p=0.000 n=10+12)
ParseDescriptor 18.6µs ± 0% 16.4µs ± 0% -11.54% (p=0.000 n=12+12)
```
# Size
A few source files grow marginally because of some arena functionality moved inline. But `upb/decode.c` shrinks by 30% on Linux:
```
VM SIZE
--------------
+2.1% +283 upb/json_decode.c
+24% +205 upb/msg.c
+8.4% +115 upb/upb.c
+0.9% +28 upb/reflection.c
[ = ] 0 upb/def.c
[ = ] 0 upb/encode.c
[ = ] 0 upb/json_encode.c
[ = ] 0 upb/table.c
-30.3% -1.51Ki upb/decode.c
-0.7% -738 TOTAL
```
* Split upb::Arena/upb::Allocator from upb::Environment.
This will allow arenas and allocators to be used
independently of environments, which will be important
for an upcoming change (a message representation).
Overall this design feels cleaner that the previous
Environment/SeededAllocator design.
As part of this change, moved all allocations in upb
to use a global allocator instead of hard-coding
malloc/free. This will allow injecting OOM faults
for more robust testing.
One place that doesn't use the global allocator is
the tracked ref code. Instead of its previous approach
of CHECK_OOM() after every malloc() or table insert, it
simply uses an allocator that does this automatically.
I moved Allocator/Arena/Environment into upb.h.
This seems principled since these are the only types
in upb whose size is directly exposed to users, since
they form the basis of memory allocation strategy.
* Cleaned up some header includes and fixed more malloc -> upb_gmalloc().
* Changes from PR review.
* Don't use UINTPTR_MAX or UINT64_MAX.
* Punt on adding line/file for now.
* We actually can't store (uint64_t)-1, update comment and test.
A large part of this change contains surface-level
porting, like moving variable declarations to the
top of the block.
However there are a few more substantial things too:
- moved internal-only struct definitions to a separate
file (structdefs.int.h), for greater encapsulation
and ABI compatibility.
- removed the UPB_UPCAST macro, since it requires access
to the internal-only struct definitions. Replaced uses
with calls to inline, type-safe casting functions.
- removed the UPB_DEFINE_CLASS/UPB_DEFINE_STRUCT macros.
Class and struct definitions are now more explicit -- you
get to see the actual class/struct keywords in the source.
The casting convenience functions have been moved into
UPB_DECLARE_DERIVED_TYPE() and UPB_DECLARE_DERIVED_TYPE2().
- the new way that we duplicate base methods in derived types
is also more convenient and requires less duplication.
It is also less greppable, but hopefully that is not
too big a problem.
Compiler flags (-std=c89 -pedantic) should help to rigorously
enforce that the code is free of C99-isms.
A few functions are not available in C89 (strtoll). There
are temporary, hacky solutions in place.