- A new PHP-specific upb amalgamation. It contains everything related to upb_msg, but leaves out all of the old handlers-related interfaces and encoders/decoders.
# Schema/Defs Changes
- Changed `upb_fielddef_msgsubdef()` and `upb_fielddef_enumsubdef()` to return `NULL` instead of assert-failing if the field is not a message or enum.
- Added `upb_msgdef_iswrapper()`, to test whether this is a wrapper well-known type.
# Decoder
- Decoder bugfix: when we parse a submessage inside a oneof, we need to clear out any previous data, so we don't misinterpret it as a pointer to an existing submessage.
# JSON Decoder
- Allowed well-known types at the top level to have their special processing.
- Fixed a bug that could occur when parsing nested empty lists/objects, eg `[[]]`.
- Made the "ignore unknown" option also be permissive about unknown enumerators by setting them to 0.
# JSON Encoder
- Allowed well-known types at the top level to have their special processing.
- Removed all spaces after `:` and `,` characters, to match the old encoder and pass goldenfile tests.
# Message / Reflection
- Changed `upb_msg_hasoneof()` -> `upb_msg_whichoneof()`. The new function returns the `upb_fielddef*` of whichever oneof is set.
- Implemented `upb_msg_clearfield()` and added/implemented `upb_msg_clear()`.
- Added `upb_msg_discardunknown()`. Part of me thinks this should go in a util library instead of core reflection since it is a recursive algorithm.
# Compiler
- Always emit descriptors as an array instead of as a string, to avoid exceeding maximum string lengths. If this becomes a speed issue later we can go back to two separate paths.
* Created amalgamation with upb_msg but no handlers.
* Bugfix for upb_array_resize().
* Renamed "lite" amalgamation to "core", to avoid confusion.
Traditionally "lite" has meant "without reflection", but here we
mean it as "without handlers-based code."
* Build fixes from CI tests.
* Removed some more C++-style comments.
* Fix for out-of-order statements.
New code is smaller (in both source size and compiled size) and faster.
# Speed
The decoder speeds up on all machines I tested, though the amount of speedup varies. I was only able to test Intel CPUs.
### Linux Desktop
```
CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
OS: Linux
name old time/op new time/op delta
CreateArena 4.72ns ± 0% 4.93ns ± 0% +4.47% (p=0.000 n=11+11)
ParseDescriptor 12.4µs ± 1% 9.1µs ± 1% -26.65% (p=0.000 n=11+11)
```
### Mac Laptop
```
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
OS: macOS
name old time/op new time/op delta
CreateArena 5.33ns ± 3% 5.58ns ± 2% +4.69% (p=0.000 n=12+12)
ParseDescriptor 15.0µs ± 2% 11.9µs ± 2% -20.20% (p=0.000 n=12+12)
```
### Linux Workstation
```
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Linux
name old time/op new time/op delta
CreateArena 5.29ns ± 0% 5.52ns ± 0% +4.37% (p=0.000 n=10+12)
ParseDescriptor 18.6µs ± 0% 16.4µs ± 0% -11.54% (p=0.000 n=12+12)
```
# Size
A few source files grow marginally because of some arena functionality moved inline. But `upb/decode.c` shrinks by 30% on Linux:
```
VM SIZE
--------------
+2.1% +283 upb/json_decode.c
+24% +205 upb/msg.c
+8.4% +115 upb/upb.c
+0.9% +28 upb/reflection.c
[ = ] 0 upb/def.c
[ = ] 0 upb/encode.c
[ = ] 0 upb/json_encode.c
[ = ] 0 upb/table.c
-30.3% -1.51Ki upb/decode.c
-0.7% -738 TOTAL
```
* WIP, first version of encoder.
* More progress on text encoder.
* A lot of progress on the text printer.
* Added textencode header file.
* Text encoder now passes conformance tests.
These aren't very stringent though, and more testing is needed.
* Print text into static buffer. Passes all conformance tests.
* Fixed kokoro errors.
* Fix for indent depth when printing map fields.
* Removed reflection and other extraneous things from the core library.
* Added missing files and ran buildifier.
* New CMakeLists.txt.
* Made table its own cc_library() for internal usage.
upb_msg was trying to be general enough that it could either live in
an arena or be allocated with malloc()/free(). This was too much
complexity for too little benefit. We should commit to just saying
that upb_msg is arena-only.
I also ripped out the code to glue upb_msg to the existing
handlers-based encoder/decoder. upb_msg has its own, small, simple
encoder/decoder. I'm trying to whittle down upb_msg to a small
and simple core.
I updated the Lua extension for these changes. Lua needs some more
work to properly create arenas per message. For now I just created
a single global arena.
Also includes an implementation of the conformance tests
to display what the API usage will be like.
There is still a lot to do, and things that are broken (oneofs,
repeated fields, etc), but it's a good start.
This involves:
- remove upb_msglayout -> upb_msgfactory dependency.
- remove upb_msglayout -> upb_msgdef dependency (in progress).
- make upb_msglayout use a representation that can be
statically initialized by generated code.
The goal here is that upb_msglayout becomes a kind of "descriptor
lite": it contains enough data to parser and serialize protobufs
and manipulate a upb_msg in memory, while being far smaller and
simpler than a full descriptor. It also does not include field
names, which can be a benefit for applications that do not want
to leak field names.
Generated code can then create a upb_msglayout, and do most things
without ever needing to construct full descriptors/defs if they
don't want to.