Nothing reads or writes this data yet, but we do implement the
memory management that allows both unknown field data and extensions
to grow within the same pseudo-arena in a message. By making both
arrays grow towards each other, we avoid the need to reallocate them
separately.
The primary motivation for this change is to avoid referring to the
`upb_msglayout` object when we are trying to fetch the `upb_msglayout`
object for a sub-message. This will help pave the way for parsing
extensions. We also implement several optimizations so that we can
make this change without regressing performance.
Normally we compute the layout for a sub-message field like so:
```
const upb_msglayout *get_submsg_layout(
const upb_msglayout *layout,
const upb_msglayout_field *field) {
return layout->submsgs[field->submsg_index]
}
```
The reason for this indirection is to avoid storing a pointer directly
in `upb_msglayout_field`, as this would double its size (from 12 to 24
bytes on 64-bit architectures) which is wasteful as this pointer is
only needed for message typed fields.
However `get_submsg_layout` as written above does not work for
extensions, as they will not have entries in the message's
`layout->submsgs` array by nature, and we want to avoid creating
an entire fake `upb_msglayout` for each such extension since that
would also be wasteful.
This change removes the dependency on `upb_msglayout` by passing down
the `submsgs` array instead:
```
const upb_msglayout *get_submsg_layout(
const upb_msglayout *const *submsgs,
const upb_msglayout_field *field) {
return submsgs[field->submsg_index]
}
```
This will pave the way for parsing extensions, as we can more easily
create an alternative `submsgs` array for extension fields without
extra overhead or waste.
Along the way several optimizations presented themselves that allow
a nice increase in performance:
1. Passing the parsed `wireval` by address instead of by value ended
up avoiding an expensive and useless stack copy (this is on Clang,
which was used for all measurements).
2. When field numbers are densely packed, we can find a field by number
with a single indexed lookup instead of linear search. At codegen
time we can compute the maximum field number that will allow such
an indexed lookup.
3. For fields that do require linear search, we can start the linear
search at the location where we found the previous field, taking
advantage of the fact that field numbers are generally increasing.
4. When the hasbit index is less than 32 (the common case) we can use
a less expensive code sequence to set it.
5. We check for the hasbit case before the oneof case, as optional
fields are more common than oneof fields.
Benchmark results indicate a 20% improvement in parse speed with a
small code size increase:
```
name old time/op new time/op delta
ArenaOneAlloc 21.3ns ± 0% 21.5ns ± 0% +0.96% (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc 6.32ns ± 0% 6.32ns ± 0% +0.03% (p=0.000 n=12+10)
LoadDescriptor_Upb 53.5µs ± 1% 51.5µs ± 2% -3.70% (p=0.000 n=12+12)
LoadAdsDescriptor_Upb 2.78ms ± 2% 2.68ms ± 0% -3.57% (p=0.000 n=12+12)
LoadDescriptor_Proto2 240µs ± 0% 240µs ± 0% +0.12% (p=0.001 n=12+12)
LoadAdsDescriptor_Proto2 12.8ms ± 0% 12.7ms ± 0% -1.15% (p=0.000 n=12+10)
Parse_Upb_FileDesc<UseArena,Copy> 13.2µs ± 2% 10.7µs ± 0% -18.49% (p=0.000 n=10+12)
Parse_Upb_FileDesc<UseArena,Alias> 11.3µs ± 0% 9.6µs ± 0% -15.11% (p=0.000 n=12+11)
Parse_Upb_FileDesc<InitBlock,Copy> 12.7µs ± 0% 10.3µs ± 0% -19.00% (p=0.000 n=10+12)
Parse_Upb_FileDesc<InitBlock,Alias> 10.9µs ± 0% 9.2µs ± 0% -15.82% (p=0.000 n=12+12)
Parse_Proto2<FileDesc,NoArena,Copy> 29.4µs ± 0% 29.5µs ± 0% +0.61% (p=0.000 n=12+12)
Parse_Proto2<FileDesc,UseArena,Copy> 20.7µs ± 2% 20.6µs ± 2% ~ (p=0.260 n=12+11)
Parse_Proto2<FileDesc,InitBlock,Copy> 16.7µs ± 1% 16.7µs ± 0% -0.25% (p=0.036 n=12+10)
Parse_Proto2<FileDescSV,InitBlock,Alias> 16.5µs ± 0% 16.5µs ± 0% +0.20% (p=0.016 n=12+11)
SerializeDescriptor_Proto2 5.30µs ± 1% 5.36µs ± 1% +1.09% (p=0.000 n=12+11)
SerializeDescriptor_Upb 12.9µs ± 0% 13.0µs ± 0% +0.90% (p=0.000 n=12+11)
FILE SIZE VM SIZE
-------------- --------------
+1.5% +176 +1.6% +176 upb/decode.c
+1.8% +176 +1.9% +176 decode_msg
+0.4% +64 +0.4% +64 upb/def.c
+1.4% +64 +1.4% +64 _upb_symtab_addfile
+1.2% +48 +1.4% +48 upb/reflection.c
+15% +32 +18% +32 upb_msg_set
+2.9% +16 +3.1% +16 upb_msg_mutable
-9.3% -288 [ = ] 0 [Unmapped]
[ = ] 0 +0.2% +288 TOTAL
```
There was a bug in our arena code where we assumed that
sizeof(upb_array) would be a multiple of 8. On i386 it was
not, and this was causing memory corruption on 32-bit builds.
Until everyone can regenerate their code, we need to provide
compatible semantics with the old generated code.
Also fixed a bug where enums were allocated 8 bytes instead
of 4.
* Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors.
* Disable -Wshorten-32-to-64 since Kokoro is missing Clang.
* Fixed -Wextra warnings for gcc.
* Reordered UPB_UNUSED() to come after declarations.
* Added another -pedantic fix and log CC version.
* Fix compile error and conditionally run use_bazel.sh.
* Moved set -e after use_bazel.sh.
* Fixed typo in conditional.
- A new PHP-specific upb amalgamation. It contains everything related to upb_msg, but leaves out all of the old handlers-related interfaces and encoders/decoders.
# Schema/Defs Changes
- Changed `upb_fielddef_msgsubdef()` and `upb_fielddef_enumsubdef()` to return `NULL` instead of assert-failing if the field is not a message or enum.
- Added `upb_msgdef_iswrapper()`, to test whether this is a wrapper well-known type.
# Decoder
- Decoder bugfix: when we parse a submessage inside a oneof, we need to clear out any previous data, so we don't misinterpret it as a pointer to an existing submessage.
# JSON Decoder
- Allowed well-known types at the top level to have their special processing.
- Fixed a bug that could occur when parsing nested empty lists/objects, eg `[[]]`.
- Made the "ignore unknown" option also be permissive about unknown enumerators by setting them to 0.
# JSON Encoder
- Allowed well-known types at the top level to have their special processing.
- Removed all spaces after `:` and `,` characters, to match the old encoder and pass goldenfile tests.
# Message / Reflection
- Changed `upb_msg_hasoneof()` -> `upb_msg_whichoneof()`. The new function returns the `upb_fielddef*` of whichever oneof is set.
- Implemented `upb_msg_clearfield()` and added/implemented `upb_msg_clear()`.
- Added `upb_msg_discardunknown()`. Part of me thinks this should go in a util library instead of core reflection since it is a recursive algorithm.
# Compiler
- Always emit descriptors as an array instead of as a string, to avoid exceeding maximum string lengths. If this becomes a speed issue later we can go back to two separate paths.
* Created amalgamation with upb_msg but no handlers.
* Bugfix for upb_array_resize().
* Renamed "lite" amalgamation to "core", to avoid confusion.
Traditionally "lite" has meant "without reflection", but here we
mean it as "without handlers-based code."
* Build fixes from CI tests.
* Removed some more C++-style comments.
* Fix for out-of-order statements.
* WIP.
* Passes most tests.
* A few fixes.
* A few optimizations.
* Some more optimiation.
* Update Protobuf to v3.11.4 and Abseil to LTS 2020-02-25
* Use longjmp instead of explicit error checks at every level.
* Used macros for better documentation of ops.
* Fixed bug with map parsing. All tests are passing except a few conformance tests.
* Fixed remaining bugs, all conformance tests pass.
Also ported all of upb to a single UPB_PTR_AT() macro instead of
having multiple .c files define their own.
* Formatted with clang-format.
* Fixes to compile on Linux.
* A few more compile fixes.
* Script to benchmark changes.
* Fixed parenthesis bug in op calculation.
* Updated generated descriptor files.
* WIP.
* Removed trailing enum to fix the Linux build.
* Respect packed=false to fix conformance failures in new protobuf version.
* Small simplification.
* Fixes to decoder.
* Removed stray comment.
Co-authored-by: Yannic Bonenberger <contact@yannic-bonenberger.com>
New code is smaller (in both source size and compiled size) and faster.
# Speed
The decoder speeds up on all machines I tested, though the amount of speedup varies. I was only able to test Intel CPUs.
### Linux Desktop
```
CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
OS: Linux
name old time/op new time/op delta
CreateArena 4.72ns ± 0% 4.93ns ± 0% +4.47% (p=0.000 n=11+11)
ParseDescriptor 12.4µs ± 1% 9.1µs ± 1% -26.65% (p=0.000 n=11+11)
```
### Mac Laptop
```
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
OS: macOS
name old time/op new time/op delta
CreateArena 5.33ns ± 3% 5.58ns ± 2% +4.69% (p=0.000 n=12+12)
ParseDescriptor 15.0µs ± 2% 11.9µs ± 2% -20.20% (p=0.000 n=12+12)
```
### Linux Workstation
```
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Linux
name old time/op new time/op delta
CreateArena 5.29ns ± 0% 5.52ns ± 0% +4.37% (p=0.000 n=10+12)
ParseDescriptor 18.6µs ± 0% 16.4µs ± 0% -11.54% (p=0.000 n=12+12)
```
# Size
A few source files grow marginally because of some arena functionality moved inline. But `upb/decode.c` shrinks by 30% on Linux:
```
VM SIZE
--------------
+2.1% +283 upb/json_decode.c
+24% +205 upb/msg.c
+8.4% +115 upb/upb.c
+0.9% +28 upb/reflection.c
[ = ] 0 upb/def.c
[ = ] 0 upb/encode.c
[ = ] 0 upb/json_encode.c
[ = ] 0 upb/table.c
-30.3% -1.51Ki upb/decode.c
-0.7% -738 TOTAL
```