1. For decoding, an unknownfields will be lazily created on message,
which contains bytes of unknown fields.
2. For encoding, if the unknownfields is present on message, all bytes
contained in it will be serialized.
* Split upb::Arena/upb::Allocator from upb::Environment.
This will allow arenas and allocators to be used
independently of environments, which will be important
for an upcoming change (a message representation).
Overall this design feels cleaner that the previous
Environment/SeededAllocator design.
As part of this change, moved all allocations in upb
to use a global allocator instead of hard-coding
malloc/free. This will allow injecting OOM faults
for more robust testing.
One place that doesn't use the global allocator is
the tracked ref code. Instead of its previous approach
of CHECK_OOM() after every malloc() or table insert, it
simply uses an allocator that does this automatically.
I moved Allocator/Arena/Environment into upb.h.
This seems principled since these are the only types
in upb whose size is directly exposed to users, since
they form the basis of memory allocation strategy.
* Cleaned up some header includes and fixed more malloc -> upb_gmalloc().
* Changes from PR review.
* Don't use UINTPTR_MAX or UINT64_MAX.
* Punt on adding line/file for now.
* We actually can't store (uint64_t)-1, update comment and test.
It is entirely optional: MessageDef/EnumDef can still exist
on their own. But this can represent a def's file when it is
desirable to do so (eg. for code generators).
This approach will require that we change the way we handle
extensions. But I think it will be a good change overall.
Specifically, we previously handled extensions by duplicating
the extended message and then adding the extension as a regular
field to the duplicated message. This required also duplicating
any messages that could reach the extended message.
In the new world we will need a way of declaring and looking up
extensions separately from the message being extended.
This change also involves some notable changes to the generated
code:
- files are now called foo.upbdefs.h instead of foo.upb.h.
This reflects the fact that we might possibly generate several
different output files for a .proto file, and this one is just
for defs.
- we no longer generate selectors in the .h file.
- the upbdefs.c no longer vends a SymbolTable. Now it vends the
individual messages (and possibly a FileDef later). I think this
will compose better once we can generate files where one
generated files imports another.
We also make the descriptor reader vend a list of FileDefs now.
This is the best conceptual match for parsing a FileDescriptorSet.
Prior to this change, if an error was returned, it would be
guaranteed to always return a short byte count. Now the two
concepts are a bit more orthogonal. There are cases where
the entire input is consumed even though an error was
encountered.
Prior to this change:
parse(buf, len) -> len + N
...would indicate that the next N bytes of the input are not
needed, *and* would advance the decoding position by this
much.
After this change:
parse(buf, len) -> len + N
parse(NULL, N) -> N
...can be used to achieve the same thing. But skipping the
N bytes is not explicitly performed by the user. A user that
doesn't want/need to skip can just say:
parsed = parse(buf, len);
if (parsed < len) {
// Handle suspend, advance stream by "parsed".
} else {
// Stream was advanced by "len" (even if parsed > len).
}
Updated unit tests to test this new behavior, and refactored
test utility code a bit to support it.