Before there was a single code generator that generated both
.upb and .upbdefs, even though they are generated by different
rules. This worked fine as long as the codegen steps were
sandboxed, but if not it led to build errors.
Fixes https://github.com/protocolbuffers/upb/issues/354.
1. For long tags we were putting table entries in the wrong slot.
2. For repeated strings, when the buffer flipped to no longer alias we
were failing to notice and kept aliasing anyway.
Until everyone can regenerate their code, we need to provide
compatible semantics with the old generated code.
Also fixed a bug where enums were allocated 8 bytes instead
of 4.
parsers, to a tail recursive function. Removed the varint jmp table for
a simple varint parse loop, that removes the stack frames. Also careful
with not losing information in repeated message tag check. When written
mindful the checks and loads that happen can be reused for tag dispatch
if not the expected tag.
* Added -Wextra and -Wshorten-64-to-32 and fixed resulting errors.
* Disable -Wshorten-32-to-64 since Kokoro is missing Clang.
* Fixed -Wextra warnings for gcc.
* Reordered UPB_UNUSED() to come after declarations.
* Added another -pedantic fix and log CC version.
* Fix compile error and conditionally run use_bazel.sh.
* Moved set -e after use_bazel.sh.
* Fixed typo in conditional.
- A new PHP-specific upb amalgamation. It contains everything related to upb_msg, but leaves out all of the old handlers-related interfaces and encoders/decoders.
# Schema/Defs Changes
- Changed `upb_fielddef_msgsubdef()` and `upb_fielddef_enumsubdef()` to return `NULL` instead of assert-failing if the field is not a message or enum.
- Added `upb_msgdef_iswrapper()`, to test whether this is a wrapper well-known type.
# Decoder
- Decoder bugfix: when we parse a submessage inside a oneof, we need to clear out any previous data, so we don't misinterpret it as a pointer to an existing submessage.
# JSON Decoder
- Allowed well-known types at the top level to have their special processing.
- Fixed a bug that could occur when parsing nested empty lists/objects, eg `[[]]`.
- Made the "ignore unknown" option also be permissive about unknown enumerators by setting them to 0.
# JSON Encoder
- Allowed well-known types at the top level to have their special processing.
- Removed all spaces after `:` and `,` characters, to match the old encoder and pass goldenfile tests.
# Message / Reflection
- Changed `upb_msg_hasoneof()` -> `upb_msg_whichoneof()`. The new function returns the `upb_fielddef*` of whichever oneof is set.
- Implemented `upb_msg_clearfield()` and added/implemented `upb_msg_clear()`.
- Added `upb_msg_discardunknown()`. Part of me thinks this should go in a util library instead of core reflection since it is a recursive algorithm.
# Compiler
- Always emit descriptors as an array instead of as a string, to avoid exceeding maximum string lengths. If this becomes a speed issue later we can go back to two separate paths.
* Added support for proto3 optional to defs.
* Added proto3 optional support. Untested at the moment.
* Changes to support proto3 optional.
* Fixed real oneof count for messages with no fields.
* Fixed compile error and test.
* Added comment about why I'm commenting out the assert.
* WIP.
* Passes most tests.
* A few fixes.
* A few optimizations.
* Some more optimiation.
* Update Protobuf to v3.11.4 and Abseil to LTS 2020-02-25
* Use longjmp instead of explicit error checks at every level.
* Used macros for better documentation of ops.
* Fixed bug with map parsing. All tests are passing except a few conformance tests.
* Fixed remaining bugs, all conformance tests pass.
Also ported all of upb to a single UPB_PTR_AT() macro instead of
having multiple .c files define their own.
* Formatted with clang-format.
* Fixes to compile on Linux.
* A few more compile fixes.
* Script to benchmark changes.
* Fixed parenthesis bug in op calculation.
* Updated generated descriptor files.
* WIP.
* Removed trailing enum to fix the Linux build.
* Respect packed=false to fix conformance failures in new protobuf version.
* Small simplification.
* Fixes to decoder.
* Removed stray comment.
Co-authored-by: Yannic Bonenberger <contact@yannic-bonenberger.com>
New code is smaller (in both source size and compiled size) and faster.
# Speed
The decoder speeds up on all machines I tested, though the amount of speedup varies. I was only able to test Intel CPUs.
### Linux Desktop
```
CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
OS: Linux
name old time/op new time/op delta
CreateArena 4.72ns ± 0% 4.93ns ± 0% +4.47% (p=0.000 n=11+11)
ParseDescriptor 12.4µs ± 1% 9.1µs ± 1% -26.65% (p=0.000 n=11+11)
```
### Mac Laptop
```
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
OS: macOS
name old time/op new time/op delta
CreateArena 5.33ns ± 3% 5.58ns ± 2% +4.69% (p=0.000 n=12+12)
ParseDescriptor 15.0µs ± 2% 11.9µs ± 2% -20.20% (p=0.000 n=12+12)
```
### Linux Workstation
```
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Linux
name old time/op new time/op delta
CreateArena 5.29ns ± 0% 5.52ns ± 0% +4.37% (p=0.000 n=10+12)
ParseDescriptor 18.6µs ± 0% 16.4µs ± 0% -11.54% (p=0.000 n=12+12)
```
# Size
A few source files grow marginally because of some arena functionality moved inline. But `upb/decode.c` shrinks by 30% on Linux:
```
VM SIZE
--------------
+2.1% +283 upb/json_decode.c
+24% +205 upb/msg.c
+8.4% +115 upb/upb.c
+0.9% +28 upb/reflection.c
[ = ] 0 upb/def.c
[ = ] 0 upb/encode.c
[ = ] 0 upb/json_encode.c
[ = ] 0 upb/table.c
-30.3% -1.51Ki upb/decode.c
-0.7% -738 TOTAL
```