Many things have changed and been simplified.
The memory-management story for upb_def and upb_handlers
is much more robust; upb_def and upb_handlers should be
fairly stable interfaces now. There is still much work
to do for the runtime component (upb_sink).
Added a upb_byteregion that tracks a region of
the input buffer; decoders use this instead of
using a upb_bytesrc directly. upb_byteregion
is also used as the way of passing a string to
a upb_handlers callback. This symmetry makes
decoders compose better; if you want to take
a parsed string and decode it as something else,
you can take the string directly from the callback
and feed it as input to another parser.
A commented-out version of a pinning interface
is present; I decline to actually implement it
(and accept its extra complexity) until/unless
it is clear that it is actually a win. But it
is included as a proof-of-concept, to show that
it fits well with the existing interface.
Includes are now via upb/foo.h.
Files specific to the protobuf format are
now in upb/pb (the core library is concerned
with message definitions, handlers, and
byte streams, but knows nothing about any
particular serializationf format).
It can successfully parse SpeedMessage1.
Preliminary results: 750MB/s on Core2 2.4GHz.
This number is 2.5x proto2.
This isn't apples-to-apples, because
proto2 is parsing to a struct and we are
just doing stream parsing, but for apps
that are currently using proto2, this is the
improvement they would see if they could
move to stream-based processing.
Unfortunately perf-regression-test.py is
broken, and I'm not 100% sure why. It would
be nice to fix it first (to ensure that
there are no performance regressions for
the table-based decoder) but I'm really
impatient to get the JIT checked in.
This doesn't reflect any material change in
how I will be working on upb, and I have no
problem making this change. It's still open
source under the BSD license, and I'll still
be working on it well beyond the hours that
constitute a normal job.
This should make it both easier to use and easier to
optimize, in exchange for a small amount of generality.
In practice, any remotely normal case is still very
natural.
Callers must always over-allocate their buffer by at least
ten bytes. Since we will never read *more* than ten bytes,
there is no need to do bounds checking inside the parsing
code.