This leads to a major (20-40%) improvement in the parsetoproto2
benchmark with small messages. We now are faster than proto2 in all
apples-to-apples comparisons, at least given the (admittedly
limited) set of benchmarks in this source tree.
Includes are now via upb/foo.h.
Files specific to the protobuf format are
now in upb/pb (the core library is concerned
with message definitions, handlers, and
byte streams, but knows nothing about any
particular serializationf format).
I'm realizing that basically all upb objects
will need to be refcounted to be sharable
across languages, but *not* messages which
are on their way out so we can get out of
the business of data representations.
Things which must be refcounted:
- encoders, decoders
- handlers objects
- defs
Startseq/endseq handlers are called at the beginning
and end of a sequence of repeated values. Protobuf
does not really have direct support for this (repeated
primitive fields do not delimit "begin" and "end" of
the sequence) but we can infer them from the bytestream.
The benefit of supporting them explicitly is that they
get their own stack frame and closure, so we can avoid
having to find the array's address over and over and
deciding if we need to initialize it.
This will also pave the way for better support of JSON,
which does have explicit "startseq/endseq" markers: [].
It can successfully parse SpeedMessage1.
Preliminary results: 750MB/s on Core2 2.4GHz.
This number is 2.5x proto2.
This isn't apples-to-apples, because
proto2 is parsing to a struct and we are
just doing stream parsing, but for apps
that are currently using proto2, this is the
improvement they would see if they could
move to stream-based processing.
Unfortunately perf-regression-test.py is
broken, and I'm not 100% sure why. It would
be nice to fix it first (to ensure that
there are no performance regressions for
the table-based decoder) but I'm really
impatient to get the JIT checked in.
Simplified some of the semantics around
the decoder's data structures, in anticipation
of sharing them between the regular C decoder
and a JIT-ted decoder.
This doesn't reflect any material change in
how I will be working on upb, and I have no
problem making this change. It's still open
source under the BSD license, and I'll still
be working on it well beyond the hours that
constitute a normal job.
This is a significant change to the upb_stream
protocol, and should hopefully be the last
significant change.
All callbacks are now registered ahead-of-time
instead of having delegated callbacks registered
at runtime, which makes it much easier to
aggressively optimize ahead-of-time (like with a
JIT).
Other impacts of this change:
- You no longer need to have loaded descriptor.proto
as a upb_def to load other descriptors! This means
the special-case code we used for bootstrapping is
no longer necessary, and we no longer need to link
the descriptor for descriptor.proto into upb.
- A client can now register any upb_value as what
will be delivered to their value callback, not
just a upb_fielddef*. This should allow for other
clients to get more bang out of the streaming
decoder.
This change unfortunately causes a bit of a performance
regression -- I think largely due to highly
suboptimal code that GCC generates when structs
are returned by value. See:
http://blog.reverberate.org/2011/03/19/when-a-compilers-slow-code-actually-bites-you/
On the other hand, once we have a JIT this should
no longer matter.
Performance numbers:
plain.parsestream_googlemessage1.upb_table: 374 -> 396 (5.88)
plain.parsestream_googlemessage2.upb_table: 616 -> 449 (-27.11)
plain.parsetostruct_googlemessage1.upb_table_byref: 268 -> 269 (0.37)
plain.parsetostruct_googlemessage1.upb_table_byval: 215 -> 204 (-5.12)
plain.parsetostruct_googlemessage2.upb_table_byref: 307 -> 281 (-8.47)
plain.parsetostruct_googlemessage2.upb_table_byval: 297 -> 272 (-8.42)
omitfp.parsestream_googlemessage1.upb_table: 423 -> 410 (-3.07)
omitfp.parsestream_googlemessage2.upb_table: 679 -> 483 (-28.87)
omitfp.parsetostruct_googlemessage1.upb_table_byref: 287 -> 282 (-1.74)
omitfp.parsetostruct_googlemessage1.upb_table_byval: 226 -> 219 (-3.10)
omitfp.parsetostruct_googlemessage2.upb_table_byref: 315 -> 298 (-5.40)
omitfp.parsetostruct_googlemessage2.upb_table_byval: 297 -> 287 (-3.37)
* UPB_STOP -> UPB_BREAK, better represents breaking
out of a parsing loop.
* UPB_STATUS_OK -> UPB_OK, for all status codes, more
concise at no readability cost (perhaps an improvement).
1. the start and end callbacks can now return
a upb_flow_t and set a status message.
2. clarified some semantics around passing an
error status back from the callbacks.
Sources and sinks communicate by means of a
upb_handlers object, which encapsulates a set of
handler callbacks and will possibly offer richer
semantics in the future like giving specific
fields different callbacks.
The upb_handlers protocol supports delegation, so
sets of handlers can be written in reusable ways.
For example, if a set of handlers is written to
handle a specific .proto type, those handlers can
be used whether that type is at the top level or
whether it is a sub-message of a higher-level type.
Delegation allows the streaming protocol to
properly compose.
Unfortunately my previous detailed commit message was
lost somehow by git or vi. Will have to explain in more
detail at a later date the rationale for this change.
The build will be broken until I port the old decoder
to this new interface.