It is entirely optional: MessageDef/EnumDef can still exist
on their own. But this can represent a def's file when it is
desirable to do so (eg. for code generators).
This approach will require that we change the way we handle
extensions. But I think it will be a good change overall.
Specifically, we previously handled extensions by duplicating
the extended message and then adding the extension as a regular
field to the duplicated message. This required also duplicating
any messages that could reach the extended message.
In the new world we will need a way of declaring and looking up
extensions separately from the message being extended.
This change also involves some notable changes to the generated
code:
- files are now called foo.upbdefs.h instead of foo.upb.h.
This reflects the fact that we might possibly generate several
different output files for a .proto file, and this one is just
for defs.
- we no longer generate selectors in the .h file.
- the upbdefs.c no longer vends a SymbolTable. Now it vends the
individual messages (and possibly a FileDef later). I think this
will compose better once we can generate files where one
generated files imports another.
We also make the descriptor reader vend a list of FileDefs now.
This is the best conceptual match for parsing a FileDescriptorSet.
It is clear now that oneofs are different form other
defs in one important way: they are not top-level
constructs that can stand on their own in a SymbolTable,
for example. If you are iterating over a list of Defs
in a SymbolTable, there is no chance you will run into
a oneof. To reflect this reality, OneofDef no longer
derives from Def, and the UPB_DEF_ONEOF is no longer
an enum value that needs to be handled.
https://google.github.io/styleguide/cppguide.html#Constructor_Initializer_Lists
```
// If the signature and initializer list are not all on one line,
// you must wrap before the colon and indent 4 spaces:
MyClass::MyClass(int var)
: some_var_(var), some_other_var_(var + 1) {
DoSomething();
}
```
and
```
// As with any other code block, the close curly can be on the same
// line as the open curly, if it fits.
MyClass::MyClass(int var)
: some_var_(var) {}
```
$ make Q= googlepb
g++ -O3 -std=c++98 -pedantic -Wno-long-long -Wall -Wextra -Wpointer-arith -Wno-unused-private-field -I. -DNDEBUG -c -o obj/upb/bindings/googlepb/bridge.o upb/bindings/googlepb/bridge.cc
In file included from ./upb/handlers.h:22,
from ./upb/bindings/googlepb/bridge.h:42,
from upb/bindings/googlepb/bridge.cc:8:
./upb/def.h: In constructor ‘upb::Pointer<upb::Def>::Pointer(upb::Def*)’:
./upb/def.h:39: error: class ‘upb::Pointer<upb::Def>’ does not have any field named ‘PointerBase’
./upb/def.h:39: error: no matching function for call to ‘upb::PointerBase<upb::Def, upb::RefCounted>::PointerBase()’
./upb/upb.h:246: note: candidates are: upb::PointerBase<T, Base>::PointerBase(T*) [with T = upb::Def, Base = upb::RefCounted]
./upb/upb.h:244: note: upb::PointerBase<upb::Def, upb::RefCounted>::PointerBase(const upb::PointerBase<upb::Def, upb::RefCounted>&)
...
the generated code looks like:
template <> class Pointer<upb::Def> : public PointerBase<upb::Def, upb::RefCounted> {
public: explicit Pointer(upb::Def* ptr) : PointerBase(ptr) {}
};
..
which falls into https://gcc.gnu.org/bugzilla/show_bug.cgi?id=189
(http://stackoverflow.com/questions/8887864/template-base-constructor-call-in-member-initialization-list-error)
changing the generated code to:
template <> class Pointer<upb::Def> : public PointerBase<upb::Def, upb::RefCounted> {
public: explicit Pointer(upb::Def* ptr) : PointerBase<upb::Def, upb::RefCounted>(ptr) {}
};
makes it compile at least on 4.4.7 that we are testing with:
$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
Prior to this change, if an error was returned, it would be
guaranteed to always return a short byte count. Now the two
concepts are a bit more orthogonal. There are cases where
the entire input is consumed even though an error was
encountered.
Prior to this change:
parse(buf, len) -> len + N
...would indicate that the next N bytes of the input are not
needed, *and* would advance the decoding position by this
much.
After this change:
parse(buf, len) -> len + N
parse(NULL, N) -> N
...can be used to achieve the same thing. But skipping the
N bytes is not explicitly performed by the user. A user that
doesn't want/need to skip can just say:
parsed = parse(buf, len);
if (parsed < len) {
// Handle suspend, advance stream by "parsed".
} else {
// Stream was advanced by "len" (even if parsed > len).
}
Updated unit tests to test this new behavior, and refactored
test utility code a bit to support it.
This was previously broken -- it would try to set
the status object on the parser, but the pointer
was never initialized. Also it didn't report
errors properly to the environment object.
A large part of this change contains surface-level
porting, like moving variable declarations to the
top of the block.
However there are a few more substantial things too:
- moved internal-only struct definitions to a separate
file (structdefs.int.h), for greater encapsulation
and ABI compatibility.
- removed the UPB_UPCAST macro, since it requires access
to the internal-only struct definitions. Replaced uses
with calls to inline, type-safe casting functions.
- removed the UPB_DEFINE_CLASS/UPB_DEFINE_STRUCT macros.
Class and struct definitions are now more explicit -- you
get to see the actual class/struct keywords in the source.
The casting convenience functions have been moved into
UPB_DECLARE_DERIVED_TYPE() and UPB_DECLARE_DERIVED_TYPE2().
- the new way that we duplicate base methods in derived types
is also more convenient and requires less duplication.
It is also less greppable, but hopefully that is not
too big a problem.
Compiler flags (-std=c89 -pedantic) should help to rigorously
enforce that the code is free of C99-isms.
A few functions are not available in C89 (strtoll). There
are temporary, hacky solutions in place.
Changes the data layout of tables slightly so that string
keys are prefixed with their size, rather than the size
being inline in the table itself.
This has a few benefits:
1. inttables shrink a bit, because there is no longer a wasted
and unused size field sitting in them.
2. This avoids the need to have a union in the table. This is
important for an impending C89 port of upb, since C89 has
literally no way of statically initializing a non-first union
member.
Also added a separate ndebug build for testing that
-DNDEBUG builds still work.
Also disabled reference debugging by default, since it
requires either a global lock or -DUPB_THREAD_UNSAFE.
This change has several parts:
1. Resurrected tools/upbc. The code was all there but the build was
broken for open-source. Now you can type "make tools/upbc" and
it will build all necessary Lua modules and create a robust shell
script for running upbc.
2. Changed Lua module loading to no longer rely on OS-level .so
dependencies. The net effect of this is that you now only need
to set LUA_PATH and LUA_CPATH; setting LD_LIBRARY_PATH or rpaths
is no longer required. Downside: this drops compatibility with
Lua 5.1, since it depends on a feature that only exists in Lua >=5.2
(and LuaJIT).
3. Since upbc works again, I fixed the re-generation of the descriptor
files (descriptor.upb.h, descriptor.upb.c). "make genfiles" will
re-generate these as well as the JIT code generator.
4. Added a Travis test target that ensures that the checked-in generated
files are not out of date. I would do this for the Ragel generated
file also, but we can't count on all versions of Ragel to necessarily
generate identical output.
5. Changed Makefile to no longer automatically run Ragel to regenerate
the JSON parser. This is unfortuante, because it's convenient when
you're developing the JSON parser. However, "git clone" sometimes
skews the timestamps a little bit so that "make" thinks it needs to
regenerate these files for a fresh "git clone." This would normally
be harmless, but if the user doesn't have Ragel installed, it makes
the build fail completely. So now you have to explicitly regenerate
the Ragel output. If you want to you can uncomment the auto-generation
during development.