* Put oneofs in the same table as fields.
Oneofs and fields are not allowed to have names that conflict,
so we might as well put them all in the same table. This also
allows an efficient operation that looks for both fields and
oneofs in a single lookup.
Added support for OneofDef to Lua to allow testing of this.
* Addressed PR comments.
* Split upb::Arena/upb::Allocator from upb::Environment.
This will allow arenas and allocators to be used
independently of environments, which will be important
for an upcoming change (a message representation).
Overall this design feels cleaner that the previous
Environment/SeededAllocator design.
As part of this change, moved all allocations in upb
to use a global allocator instead of hard-coding
malloc/free. This will allow injecting OOM faults
for more robust testing.
One place that doesn't use the global allocator is
the tracked ref code. Instead of its previous approach
of CHECK_OOM() after every malloc() or table insert, it
simply uses an allocator that does this automatically.
I moved Allocator/Arena/Environment into upb.h.
This seems principled since these are the only types
in upb whose size is directly exposed to users, since
they form the basis of memory allocation strategy.
* Cleaned up some header includes and fixed more malloc -> upb_gmalloc().
* Changes from PR review.
* Don't use UINTPTR_MAX or UINT64_MAX.
* Punt on adding line/file for now.
* We actually can't store (uint64_t)-1, update comment and test.
A large part of this change contains surface-level
porting, like moving variable declarations to the
top of the block.
However there are a few more substantial things too:
- moved internal-only struct definitions to a separate
file (structdefs.int.h), for greater encapsulation
and ABI compatibility.
- removed the UPB_UPCAST macro, since it requires access
to the internal-only struct definitions. Replaced uses
with calls to inline, type-safe casting functions.
- removed the UPB_DEFINE_CLASS/UPB_DEFINE_STRUCT macros.
Class and struct definitions are now more explicit -- you
get to see the actual class/struct keywords in the source.
The casting convenience functions have been moved into
UPB_DECLARE_DERIVED_TYPE() and UPB_DECLARE_DERIVED_TYPE2().
- the new way that we duplicate base methods in derived types
is also more convenient and requires less duplication.
It is also less greppable, but hopefully that is not
too big a problem.
Compiler flags (-std=c89 -pedantic) should help to rigorously
enforce that the code is free of C99-isms.
A few functions are not available in C89 (strtoll). There
are temporary, hacky solutions in place.
Changes the data layout of tables slightly so that string
keys are prefixed with their size, rather than the size
being inline in the table itself.
This has a few benefits:
1. inttables shrink a bit, because there is no longer a wasted
and unused size field sitting in them.
2. This avoids the need to have a union in the table. This is
important for an impending C89 port of upb, since C89 has
literally no way of statically initializing a non-first union
member.
This change adds support for a OneofDef (upb_oneofdef), which represents
a 'oneof' as introduced by Protocol Buffers. This is semantically a
union type that contains fields and in turn may be added to a
MessageDef. This change does not alter parsing or the handler
abstraction in any way, because a oneof has impact only at a higher
semantic level (i.e., any sort of storage of the fields in a message
object), which is user-specific with respect to upb.
There are a number of tweaks to get this to work:
- The #include dependence graph wasn't quite complete, and I had to add
a few #includes to get the tool to work.
- I had to change a number of symbol names to avoid conflicts between
'static' definitions in different .c files. This could be avoided if
the tool were smart enough to rename static symbols to have unique
prefixes instead, but (i) this requires semantic understanding of C,
and (ii) the macro-defined static functions (e.g., handlers for
primitive types in several places) would probably trip this up.
Verified that the resulting upb.h/upb.c compiles and doesn't have any
unresolved references.
- rewritten decoder; interpreted decoder is bytecode-based,
JIT decoder no longer falls back to the interpreter.
- C++ improvements: C++11-compatible iterators, upb::reffed_ptr
for RAII refcounting, better upcast/downcast support.
- removed the gross upb_value abstraction from public upb.h.
Major changes:
- Got rid of all bytestream interfaces in favor of
using regular handlers.
- new Pipeline object represents a upb pipeline, does
bump allocation internally to manage memory.
- proto2 support now can handle extensions.
Many things have changed and been simplified.
The memory-management story for upb_def and upb_handlers
is much more robust; upb_def and upb_handlers should be
fairly stable interfaces now. There is still much work
to do for the runtime component (upb_sink).
Many improvements, too many to mention. One significant
perf regression warrants investigation:
omitfp.parsetoproto2_googlemessage1.upb_jit: 343 -> 252 (-26.53)
plain.parsetoproto2_googlemessage1.upb_jit: 334 -> 251 (-24.85)
25% regression for this benchmark is bad, but since I don't think
there's any fundamental design issue that caused it I'm going to
go ahead with the commit anyway. Can investigate and fix later.
Other benchmarks were neutral or showed slight improvement.
Added a upb_byteregion that tracks a region of
the input buffer; decoders use this instead of
using a upb_bytesrc directly. upb_byteregion
is also used as the way of passing a string to
a upb_handlers callback. This symmetry makes
decoders compose better; if you want to take
a parsed string and decode it as something else,
you can take the string directly from the callback
and feed it as input to another parser.
A commented-out version of a pinning interface
is present; I decline to actually implement it
(and accept its extra complexity) until/unless
it is clear that it is actually a win. But it
is included as a proof-of-concept, to show that
it fits well with the existing interface.
Includes are now via upb/foo.h.
Files specific to the protobuf format are
now in upb/pb (the core library is concerned
with message definitions, handlers, and
byte streams, but knows nothing about any
particular serializationf format).