This is a sync of our internal development of JSON parsing and
serialization. It implements native understanding of MapEntry
submessages, so that map fields with (key, value) pairs are serialized
as JSON maps (objects) natively rather than as arrays of objects with
'key' and 'value' fields. The parser also now understands how to emit
handler calls corresponding to MapEntry objects when processing a map
field.
This sync also picks up a bugfix in `table.c` that handles an
allocation-failure case.
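
To illustrate the difference between the two JSON shapes for a map field, here is a minimal sketch in Python. It is not upb code: the field name `counts` and both helper functions are hypothetical and only show what the output looks like before and after this change.

```python
import json

def entries_as_array(entries):
    """Old shape: every map entry becomes an object with 'key' and 'value'."""
    return [{"key": k, "value": v} for k, v in entries]

def entries_as_object(entries):
    """Native shape: the whole map field becomes one JSON object."""
    # JSON object keys are always strings, so non-string keys are stringified.
    return {str(k): v for k, v in entries}

# Hypothetical map field 'counts' holding two (key, value) pairs.
entries = [("a", 1), ("b", 2)]

array_json = json.dumps({"counts": entries_as_array(entries)})
object_json = json.dumps({"counts": entries_as_object(entries)})

print(array_json)
print(object_json)
```

The second form is what a JSON consumer would normally expect for a map, and it is the form the printer now emits.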
This change adds support for a OneofDef (upb_oneofdef), which represents
a 'oneof' as introduced by Protocol Buffers. This is semantically a
union type that contains fields and in turn may be added to a
MessageDef. This change does not alter parsing or the handler
abstraction in any way, because a oneof has impact only at a higher
semantic level (i.e., any sort of storage of the fields in a message
object), which is user-specific with respect to upb.
system. The Ruby module build now uses an amalgamated distribution of
upb, and successfully builds a Ruby gem called 'google-protobuf' with
module 'google/protobuf'.
There are a number of tweaks to get this to work:
- The #include dependency graph wasn't quite complete, and I had to add
a few #includes to get the amalgamation tool to work.
- I had to change a number of symbol names to avoid conflicts between
'static' definitions in different .c files. This could be avoided if
the tool were smart enough to rename static symbols to have unique
prefixes instead, but (i) this requires semantic understanding of C,
and (ii) the macro-defined static functions (e.g., handlers for
primitive types in several places) would probably trip this up.
Verified that the resulting upb.h/upb.c compiles and doesn't have any
unresolved references.
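
As a rough illustration of the static-symbol problem, the sketch below scans C sources for 'static' function names that are defined in more than one file, which is what collides once the files are concatenated. It is a regex heuristic written for this note, not part of the actual amalgamation tool, and (as mentioned above) it has no semantic understanding of C, so macro-defined static functions are missed. The file names and contents are made up.

```python
import re
from collections import defaultdict

# Heuristic: a line starting with 'static', some type tokens, then name + '('.
STATIC_FUNC = re.compile(r"^static\s+(?:[\w\*]+\s+)*?\**(\w+)\s*\(", re.MULTILINE)

def find_static_conflicts(sources):
    """Map each static function name to the files defining it, if more than one.

    `sources` is a dict of {filename: file contents}.
    """
    seen = defaultdict(set)
    for filename, text in sources.items():
        for match in STATIC_FUNC.finditer(text):
            seen[match.group(1)].add(filename)
    return {name: sorted(files) for name, files in seen.items() if len(files) > 1}

# Hypothetical inputs: both files define a file-local 'cmp' helper.
sources = {
    "a.c": "static int cmp(const void *a, const void *b) { return 0; }\n"
           "static void *grow(void *p) { return p; }\n",
    "b.c": "static int cmp(const void *x, const void *y) { return 1; }\n"
           "static int table[4];\n",
}

conflicts = find_static_conflicts(sources)
print(conflicts)
```

A report like this only says where renaming is needed; the renaming itself was done by hand.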
This adds a Ruby extension in ruby/ that is based on the 'upb' library
(now included as a submodule), and adds support for Ruby code generation
to the protoc compiler.
- Added a JSON test that round-trips (parses then re-serializes) several
test messages, ensuring that the re-serialized form matches the
original exactly.
- Added support for printing and parsing symbolic enum names (rather than
integer values) in JSON.
- Updated JSON printer to properly handle string fields that come in
multiple pieces. ('bytes' fields still do not support this, and this
work is more challenging because it requires making the base64 encoder
resumable. Base64 encoding is not separable at an input-byte
granularity, unlike string escaping.)
- Fixed a < vs. <= bug in UTF-8 encoding generation (oops).
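
To show why base64 needs a resumable encoder when data arrives in pieces, here is a minimal sketch in Python. It is not the upb implementation; the class `ResumableBase64Encoder` is hypothetical. The idea is that base64 maps 3 input bytes to 4 output characters, so any bytes that do not fill a 3-byte group must be carried over to the next piece rather than encoded (and padded) immediately.

```python
import base64

class ResumableBase64Encoder:
    """Encodes input delivered in arbitrary pieces by carrying leftover bytes."""

    def __init__(self):
        self._pending = b""  # 0 to 2 bytes that do not yet form a 3-byte group

    def update(self, chunk):
        data = self._pending + chunk
        usable = len(data) - (len(data) % 3)  # largest prefix made of whole groups
        self._pending = data[usable:]
        return base64.b64encode(data[:usable]).decode("ascii")

    def finish(self):
        # Only the final group may be padded with '='.
        tail, self._pending = self._pending, b""
        return base64.b64encode(tail).decode("ascii")

payload = b"hello, upb!"
pieces = [payload[:4], payload[4:5], payload[5:]]

encoder = ResumableBase64Encoder()
streamed = "".join(encoder.update(p) for p in pieces) + encoder.finish()

# Encoding each piece independently inserts padding mid-stream and is wrong.
naive = "".join(base64.b64encode(p).decode("ascii") for p in pieces)

one_shot = base64.b64encode(payload).decode("ascii")
print(streamed == one_shot, naive == one_shot)
```

String escaping has no such carry-over state, which is why string fields were the easier case.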
Notable changes:
- We now only build things by default that require
no dependencies. So you can build upb even if you
don't have Lua or Google protobuf installed.
- Checked in a pre-built version of the JIT, so you
don't need Lua installed at build time to run DynASM.
It will still notice if you change the .dasc file and
attempt to re-run DynASM in that case.
- The build system now builds all modules of upb into
separate libraries, reflecting the modularity that
is already inherent in upb's design. This should
make it easier to trim the fat.
- removed the GDB JIT interface. I wasn't using it
much; using a .so is easier and more robust.
- rewritten decoder; interpreted decoder is bytecode-based,
JIT decoder no longer falls back to the interpreter.
- C++ improvements: C++11-compatible iterators, upb::reffed_ptr
for RAII refcounting, better upcast/downcast support.
- removed the gross upb_value abstraction from public upb.h.
- Better error reporting for upb::Def setters.
- error reporting for upb::Handlers setters.
- made the start/endmsg handlers a little less special-cased.
Major changes:
- Got rid of all bytestream interfaces in favor of
using regular handlers.
- new Pipeline object represents a upb pipeline, does
bump allocation internally to manage memory.
- proto2 support can now handle extensions.
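
For readers unfamiliar with bump allocation, the sketch below shows the general technique in Python. It is not the Pipeline API: the class `BumpArena` and its methods are hypothetical. Allocation just advances an offset into one buffer, and everything is released at once by resetting that offset.

```python
class BumpArena:
    """Hands out slices of one buffer by bumping an offset; no per-object free."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self.used = 0  # offset of the first free byte

    def alloc(self, size, align=8):
        # Round the current offset up to the requested power-of-two alignment.
        start = (self.used + align - 1) & ~(align - 1)
        if start + size > len(self._buf):
            raise MemoryError("arena exhausted")
        self.used = start + size
        return memoryview(self._buf)[start:start + size]

    def reset(self):
        # Releases every allocation at once; old views must not be used after.
        self.used = 0

arena = BumpArena(64)
first = arena.alloc(5)
first[:] = b"hello"
second = arena.alloc(3)
print(arena.used)
arena.reset()
print(arena.used)
```

The appeal for a pipeline is that all objects share one lifetime, so teardown is a single operation.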
Many things have changed and been simplified.
The memory-management story for upb_def and upb_handlers
is much more robust; upb_def and upb_handlers should be
fairly stable interfaces now. There is still much work
to do for the runtime component (upb_sink).
Many improvements, too many to mention. One significant
perf regression warrants investigation:
omitfp.parsetoproto2_googlemessage1.upb_jit: 343 -> 252 (-26.53)
plain.parsetoproto2_googlemessage1.upb_jit: 334 -> 251 (-24.85)
A 25% regression for this benchmark is bad, but since I don't think
there's any fundamental design issue that caused it, I'm going to
go ahead with the commit anyway. Can investigate and fix later.
Other benchmarks were neutral or showed slight improvement.
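
The figures in parentheses above are consistent with a plain relative change in percent. A small check in Python, assuming that is how the benchmark tool computes them (the helper `percent_change` is mine, not part of the benchmark harness):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

omitfp = round(percent_change(343, 252), 2)
plain = round(percent_change(334, 251), 2)
print(omitfp, plain)
```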