Unfortunately this degrades hash table lookup performance by
about 8%, which affects the streaming benchmark for googlemessage1
by about 5%. We could get this back at the cost of some memory,
but it would be nice to avoid that.
Sources and sinks communicate by means of a
upb_handlers object, which encapsulates a set of
handler callbacks and will possibly offer richer
semantics in the future like giving specific
fields different callbacks.
The upb_handlers protocol supports delegation, so
sets of handlers can be written in reusable ways.
For example, if a set of handlers is written to
handle a specific .proto type, those handlers can
be used whether that type is at the top level or
whether it is a sub-message of a higher-level type.
Delegation allows the streaming protocol to
properly compose.
The "field" entry was only being used to determine
whether we were inside a group, but the "end_offset"
member contains enough information to tell us that.
This should make it both easier to use and easier to
optimize, in exchange for a small amount of generality.
In practice, any remotely normal case is still very
natural.
The cost is that a upb_msg will now always have an overhead
of 2*sizeof(void*). This is comparable to proto2 overhead.
The benefit is that upb_msg is now self-describing, and
read-only algorithms can now operate on a upb_msg regardless
of the memory-management scheme.
Also, upb_array and upb_string now know inherently if they
own their associated memory, and upb_array has a generic
pointer for memory management purposes like upb_msg does.