Also includes an implementation of the conformance tests
to display what the API usage will be like.
There is still a lot to do, and things that are broken (oneofs,
repeated fields, etc), but it's a good start.
It is entirely optional: MessageDef/EnumDef can still exist
on their own. But this can represent a def's file when it is
desirable to do so (eg. for code generators).
This approach will require that we change the way we handle
extensions. But I think it will be a good change overall.
Specifically, we previously handled extensions by duplicating
the extended message and then adding the extension as a regular
field to the duplicated message. This required also duplicating
any messages that could reach the extended message.
In the new world we will need a way of declaring and looking up
extensions separately from the message being extended.
This change also involves some notable changes to the generated
code:
- files are now called foo.upbdefs.h instead of foo.upb.h.
This reflects the fact that we might possibly generate several
different output files for a .proto file, and this one is just
for defs.
- we no longer generate selectors in the .h file.
- the upbdefs.c no longer vends a SymbolTable. Now it vends the
individual messages (and possibly a FileDef later). I think this
will compose better once we can generate files where one
generated files imports another.
We also make the descriptor reader vend a list of FileDefs now.
This is the best conceptual match for parsing a FileDescriptorSet.
A large part of this change contains surface-level
porting, like moving variable declarations to the
top of the block.
However there are a few more substantial things too:
- moved internal-only struct definitions to a separate
file (structdefs.int.h), for greater encapsulation
and ABI compatibility.
- removed the UPB_UPCAST macro, since it requires access
to the internal-only struct definitions. Replaced uses
with calls to inline, type-safe casting functions.
- removed the UPB_DEFINE_CLASS/UPB_DEFINE_STRUCT macros.
Class and struct definitions are now more explicit -- you
get to see the actual class/struct keywords in the source.
The casting convenience functions have been moved into
UPB_DECLARE_DERIVED_TYPE() and UPB_DECLARE_DERIVED_TYPE2().
- the new way that we duplicate base methods in derived types
is also more convenient and requires less duplication.
It is also less greppable, but hopefully that is not
too big a problem.
Compiler flags (-std=c89 -pedantic) should help to rigorously
enforce that the code is free of C99-isms.
A few functions are not available in C89 (strtoll). There
are temporary, hacky solutions in place.
Changes the data layout of tables slightly so that string
keys are prefixed with their size, rather than the size
being inline in the table itself.
This has a few benefits:
1. inttables shrink a bit, because there is no longer a wasted
and unused size field sitting in them.
2. This avoids the need to have a union in the table. This is
important for an impending C89 port of upb, since C89 has
literally no way of statically initializing a non-first union
member.
- rewritten decoder; interpreted decoder is bytecode-based,
JIT decoder no longer falls back to the interpreter.
- C++ improvements: C++11-compatible iterators, upb::reffed_ptr
for RAII refcounting, better upcast/downcast support.
- removed the gross upb_value abstraction from public upb.h.
- Better error reporting for upb::Def setters.
- error reporting for upb::Handlers setters.
- made the start/endmsg handlers a little less special-cased.
Major changes:
- Got rid of all bytestream interfaces in favor of
using regular handlers.
- new Pipeline object represents a upb pipeline, does
bump allocation internally to manage memory.
- proto2 support now can handle extensions.
Many things have changed and been simplified.
The memory-management story for upb_def and upb_handlers
is much more robust; upb_def and upb_handlers should be
fairly stable interfaces now. There is still much work
to do for the runtime component (upb_sink).