A couple weeks ago we moved upb into the protobuf Git repo, and this change
continues the merger of the two repos by making them into a single Bazel repo.
This was mostly a matter of deleting upb's WORKSPACE file and fixing up a bunch
of references to reflect the new structure.
Most of the changes are pretty mechanical, but one thing that needed more
invasive changes was the Python script for generating CMakeLists.txt,
make_cmakelists.py. The WORKSPACE file it relied on no longer exists with this
change, so I updated it to hardcode the information it needed from that file.
PiperOrigin-RevId: 564810016
This is the second attempt to fix our Git history. This should allow
"git blame" to work correctly in the upb/ directory even though our
automation unexpectedly blew away that directory.
upb_msg was trying to be general enough that it could either live in
an arena or be allocated with malloc()/free(). This was too much
complexity for too little benefit. We should commit to just saying
that upb_msg is arena-only.
I also ripped out the code to glue upb_msg to the existing
handlers-based encoder/decoder. upb_msg has its own, small, simple
encoder/decoder. I'm trying to whittle down upb_msg to a small
and simple core.
I updated the Lua extension for these changes. Lua needs some more
work to properly create arenas per message. For now I just created
a single global arena.
This involves:
- remove upb_msglayout -> upb_msgfactory dependency.
- remove upb_msglayout -> upb_msgdef dependency (in progress).
- make upb_msglayout use a representation that can be
statically initialized by generated code.
The goal here is that upb_msglayout becomes a kind of "descriptor
lite": it contains enough data to parser and serialize protobufs
and manipulate a upb_msg in memory, while being far smaller and
simpler than a full descriptor. It also does not include field
names, which can be a benefit for applications that do not want
to leak field names.
Generated code can then create a upb_msglayout, and do most things
without ever needing to construct full descriptors/defs if they
don't want to.
* Put oneofs in the same table as fields.
Oneofs and fields are not allowed to have names that conflict,
so we might as well put them all in the same table. This also
allows an efficient operation that looks for both fields and
oneofs in a single lookup.
Added support for OneofDef to Lua to allow testing of this.
* Addressed PR comments.
A large part of this change contains surface-level
porting, like moving variable declarations to the
top of the block.
However there are a few more substantial things too:
- moved internal-only struct definitions to a separate
file (structdefs.int.h), for greater encapsulation
and ABI compatibility.
- removed the UPB_UPCAST macro, since it requires access
to the internal-only struct definitions. Replaced uses
with calls to inline, type-safe casting functions.
- removed the UPB_DEFINE_CLASS/UPB_DEFINE_STRUCT macros.
Class and struct definitions are now more explicit -- you
get to see the actual class/struct keywords in the source.
The casting convenience functions have been moved into
UPB_DECLARE_DERIVED_TYPE() and UPB_DECLARE_DERIVED_TYPE2().
- the new way that we duplicate base methods in derived types
is also more convenient and requires less duplication.
It is also less greppable, but hopefully that is not
too big a problem.
Compiler flags (-std=c89 -pedantic) should help to rigorously
enforce that the code is free of C99-isms.
A few functions are not available in C89 (strtoll). There
are temporary, hacky solutions in place.
Many things have changed and been simplified.
The memory-management story for upb_def and upb_handlers
is much more robust; upb_def and upb_handlers should be
fairly stable interfaces now. There is still much work
to do for the runtime component (upb_sink).