Unfortunately, a few of the GCC warnings did not have easy fixes:
../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘fastdecode_err’:
../../../../ext/google/protobuf_c/ruby-upb.c:353:13: warning: function might be candidate for attribute ‘noreturn’ [-Wsuggest-attribute=noreturn]
  353 | const char *fastdecode_err(upb_decstate *d) {
      |             ^~~~~~~~~~~~~~
../../../../ext/google/protobuf_c/ruby-upb.c: In function ‘_upb_decode’:
../../../../ext/google/protobuf_c/ruby-upb.c:867:30: warning: argument ‘buf’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
  867 | bool _upb_decode(const char *buf, size_t size, void *msg,
I even tried to suppress the first warning, but it still shows up.
There was a bug in our arena code where we assumed that
sizeof(upb_array) would be a multiple of 8. On i386 it was
not, and this was causing memory corruption on 32-bit builds.
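
To illustrate the kind of assumption involved: a bump allocator that hands out consecutive chunks keeps its pointers 8-byte aligned only if every requested size is rounded up to a multiple of 8. On a typical 32-bit ABI a struct of three 4-byte fields is 12 bytes, so relying on `sizeof` alone is not enough. The following is a minimal sketch, not the upb arena code; `align_up()`, `bump_arena`, and `bump_alloc()` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` up to the next multiple of `align`,
 * where `align` must be a power of two. */
static size_t align_up(size_t size, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}

/* Hypothetical bump allocator over a caller-provided buffer. */
typedef struct {
  char *ptr; /* next free byte */
  char *end; /* one past the last usable byte */
} bump_arena;

static void *bump_alloc(bump_arena *a, size_t size) {
  void *ret;
  /* Round every request up, instead of trusting sizeof(T) to be a
   * multiple of 8 on all platforms. */
  size = align_up(size, 8);
  if ((size_t)(a->end - a->ptr) < size) return NULL;
  ret = a->ptr;
  a->ptr += size;
  return ret;
}
```

With this rounding in place, two consecutive 12-byte requests land 16 bytes apart, so the second object stays aligned whenever the first one was.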
* Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors.
* Disable -Wshorten-64-to-32 since Kokoro is missing Clang.
* Fixed -Wextra warnings for gcc.
* Reordered UPB_UNUSED() to come after declarations.
* Added another -pedantic fix and log CC version.
* Fix compile error and conditionally run use_bazel.sh.
* Moved set -e after use_bazel.sh.
* Fixed typo in conditional.
* WIP.
* WIP.
* Tests are passing.
* Recover some perf: LIKELY doesn't propagate through functions. :(
* Added some more benchmarks.
* Simplify & optimize upb_arena_realloc().
* Only add owned blocks to the freelist.
* More optimization/simplification.
* Re-fixed the bug.
* Revert unintentional changes to parser.rl.
* Revert Lua changes for now.
* Revert the arena fuse changes for now.
* Added last_size to the arena representation.
* Re-applied Lua changes.
* Implemented upb_arena_fuse().
* Fix the compile by re-ordering statements.
* Improve comments.
* Fixed compile errors.
* Fixed compile error and changed benchmarks to do one allocation.
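
The note in the list above that LIKELY doesn't propagate through functions can be illustrated with a small sketch. One plausible reading is that a branch hint placed inside a helper is not guaranteed to influence the branch in the caller, so the hint is written at the `if` that consumes the result. The `LIKELY` macro below is a common wrapper around the GCC/Clang builtin `__builtin_expect`; `region`, `region_has_space()`, and `region_reserve()` are hypothetical names and are not taken from upb.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

/* Hypothetical region descriptor: `used` bytes out of `cap` are taken. */
typedef struct {
  size_t used;
  size_t cap;
} region;

/* The helper returns a plain int. A hint written inside it is not
 * guaranteed to reach the branch in the caller. */
static int region_has_space(const region *r, size_t n) {
  assert(r->used <= r->cap);
  return r->cap - r->used >= n;
}

/* Returns the offset of the reserved bytes, or (size_t)-1 when the
 * (stand-in) slow path would have to run. */
static size_t region_reserve(region *r, size_t n) {
  if (LIKELY(region_has_space(r, n))) { /* hint at the branch itself */
    size_t off = r->used;
    r->used += n;
    return off;
  }
  return (size_t)-1;
}
```

The hint only affects code layout and branch prediction heuristics; the function behaves the same with or without it, so any gain has to be confirmed with benchmarks.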
This makes both the C (.h) and C++ (.hpp) files read nicer
and keeps the core of upb C-only.
Existing users of the C++ wrappers will have to add manual
#includes of the .hpp files.
New code is smaller (in both source size and compiled size) and faster.
# Speed
The decoder speeds up on all machines I tested, though the amount of speedup varies. I was only able to test Intel CPUs.
### Linux Desktop
```
CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
OS: Linux
name             old time/op  new time/op  delta
CreateArena      4.72ns ± 0%  4.93ns ± 0%   +4.47%  (p=0.000 n=11+11)
ParseDescriptor  12.4µs ± 1%   9.1µs ± 1%  -26.65%  (p=0.000 n=11+11)
```
### Mac Laptop
```
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
OS: macOS
name             old time/op  new time/op  delta
CreateArena      5.33ns ± 3%  5.58ns ± 2%   +4.69%  (p=0.000 n=12+12)
ParseDescriptor  15.0µs ± 2%  11.9µs ± 2%  -20.20%  (p=0.000 n=12+12)
```
### Linux Workstation
```
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Linux
name             old time/op  new time/op  delta
CreateArena      5.29ns ± 0%  5.52ns ± 0%   +4.37%  (p=0.000 n=10+12)
ParseDescriptor  18.6µs ± 0%  16.4µs ± 0%  -11.54%  (p=0.000 n=12+12)
```
# Size
A few source files grow marginally because some arena functionality moved inline. But `upb/decode.c` shrinks by 30% on Linux:
```
    VM SIZE
 --------------
  +2.1%     +283    upb/json_decode.c
   +24%     +205    upb/msg.c
  +8.4%     +115    upb/upb.c
  +0.9%      +28    upb/reflection.c
  [ = ]        0    upb/def.c
  [ = ]        0    upb/encode.c
  [ = ]        0    upb/json_encode.c
  [ = ]        0    upb/table.c
 -30.3%  -1.51Ki    upb/decode.c
  -0.7%     -738    TOTAL
```
* Removed reflection and other extraneous things from the core library.
* Added missing files and ran buildifier.
* New CMakeLists.txt.
* Made table its own cc_library() for internal usage.
* Split upb::Arena/upb::Allocator from upb::Environment.
This will allow arenas and allocators to be used
independently of environments, which will be important
for an upcoming change (a message representation).
Overall this design feels cleaner than the previous
Environment/SeededAllocator design.
As part of this change, moved all allocations in upb
to use a global allocator instead of hard-coding
malloc/free. This will allow injecting OOM faults
for more robust testing.
One place that doesn't use the global allocator is
the tracked ref code. Instead of its previous approach
of CHECK_OOM() after every malloc() or table insert, it
simply uses an allocator that does this automatically.
I moved Allocator/Arena/Environment into upb.h.
This seems principled since these are the only types
in upb whose size is directly exposed to users, since
they form the basis of memory allocation strategy.
* Cleaned up some header includes and fixed more malloc -> upb_gmalloc().
* Changes from PR review.
* Don't use UINTPTR_MAX or UINT64_MAX.
* Punt on adding line/file for now.
* We actually can't store (uint64_t)-1, update comment and test.
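
The global-allocator change described above (routing allocations through one replaceable allocator so that OOM faults can be injected) can be sketched roughly as follows. This is an illustration under stated assumptions, not upb's actual interface: `my_alloc`, `g_alloc`, `my_gmalloc()`, `my_gfree()`, and `failing_alloc` are hypothetical names; only `malloc`/`free`-style behavior from the C standard library is relied on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator interface: one function covers allocation
 * (ptr == NULL), reallocation, and freeing (size == 0). */
typedef struct my_alloc my_alloc;
typedef void *my_alloc_func(my_alloc *alloc, void *ptr, size_t size);
struct my_alloc {
  my_alloc_func *func;
};

/* Default implementation backed by the C library. */
static void *libc_func(my_alloc *alloc, void *ptr, size_t size) {
  (void)alloc;
  if (size == 0) {
    free(ptr);
    return NULL;
  }
  return realloc(ptr, size);
}

static my_alloc libc_alloc = {libc_func};

/* Hypothetical process-wide allocator that library code goes through
 * instead of calling malloc()/free() directly. */
static my_alloc *g_alloc = &libc_alloc;

static void *my_gmalloc(size_t size) {
  assert(g_alloc != NULL);
  return g_alloc->func(g_alloc, NULL, size);
}

static void my_gfree(void *ptr) {
  if (ptr) g_alloc->func(g_alloc, ptr, 0);
}

/* Fault-injecting allocator for tests: allows `remaining` successful
 * allocations, then reports out-of-memory by returning NULL. */
typedef struct {
  my_alloc base; /* must be the first member so the cast below is valid */
  int remaining;
} failing_alloc;

static void *failing_func(my_alloc *alloc, void *ptr, size_t size) {
  failing_alloc *f = (failing_alloc *)alloc;
  if (size != 0) {
    if (f->remaining == 0) return NULL; /* injected OOM */
    f->remaining--;
  }
  return libc_func(alloc, ptr, size);
}
```

A test would point `g_alloc` at a `failing_alloc` with a small `remaining` count, run the code under test, and check that it handles the `NULL` return cleanly. The same shape would also fit an allocator that aborts on failure, which is one way to avoid a check after every call.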