Changes the data layout of tables slightly so that string
keys are prefixed with their size, rather than the size
being inline in the table itself.
This has a few benefits:
1. inttables shrink a bit, because there is no longer a wasted
and unused size field sitting in them.
2. This avoids the need to have a union in the table. This is
important for an impending C89 port of upb, since C89 has
literally no way of statically initializing a non-first union
member.
- rewritten decoder; interpreted decoder is bytecode-based,
JIT decoder no longer falls back to the interpreter.
- C++ improvements: C++11-compatible iterators, upb::reffed_ptr
for RAII refcounting, better upcast/downcast support.
- removed the gross upb_value abstraction from public upb.h.
Major changes:
- Got rid of all bytestream interfaces in favor of
using regular handlers.
- new Pipeline object represents a upb pipeline, does
bump allocation internally to manage memory.
- proto2 support now can handle extensions.
Many things have changed and been simplified.
The memory-management story for upb_def and upb_handlers
is much more robust; upb_def and upb_handlers should be
fairly stable interfaces now. There is still much work
to do for the runtime component (upb_sink).
Many improvements, too many to mention. One significant
perf regression warrants investigation:
omitfp.parsetoproto2_googlemessage1.upb_jit: 343 -> 252 (-26.53)
plain.parsetoproto2_googlemessage1.upb_jit: 334 -> 251 (-24.85)
25% regression for this benchmark is bad, but since I don't think
there's any fundamental design issue that caused it I'm going to
go ahead with the commit anyway. Can investigate and fix later.
Other benchmarks were neutral or showed slight improvement.
Includes are now via upb/foo.h.
Files specific to the protobuf format are
now in upb/pb (the core library is concerned
with message definitions, handlers, and
byte streams, but knows nothing about any
particular serializationf format).
upb_inttable() now supports a "compact" operation that will
decide on an array size and put all entries with small enough
keys into the array part for faster lookup.
Also exposed the upb_itof_ent structure and put a few useful
values there, so they are one fewer pointer chase away.
The cost is that a upb_msg will now always have an overhead
of 2*sizeof(void*). This is comparable to proto2 overhead.
The benefit is that upb_msg is now self-describing, and
read-only algorithms can now operate on a upb_msg regardless
of the memory-management scheme.
Also, upb_array and upb_string now know inherently if they
own their associated memory, and upb_array has a generic
pointer for memory management purposes like upb_msg does.