This change moves almost everything in the `upb/` directory up one level, so
that for example `upb/upb/generated_code_support.h` becomes just
`upb/generated_code_support.h`. The only exceptions I made to this were that I
left `upb/cmake` and `upb/BUILD` where they are, mostly because that avoids
conflict with other files and the current locations seem reasonable for now.
The `python/` directory is a little bit of a challenge because we had to merge
the existing directory there with `upb/python/`. I made `upb/python/BUILD` into
the BUILD file for the merged directory, and it effectively loads the contents
of the other BUILD file via `python/build_targets.bzl`, but I plan to clean
this up soon.
PiperOrigin-RevId: 568651768
A couple weeks ago we moved upb into the protobuf Git repo, and this change
continues the merger of the two repos by making them into a single Bazel repo.
This was mostly a matter of deleting upb's WORKSPACE file and fixing up a bunch
of references to reflect the new structure.
Most of the changes are pretty mechanical, but one thing that needed more
invasive changes was the Python script for generating CMakeLists.txt,
make_cmakelists.py. The WORKSPACE file it relied on no longer exists with this
change, so I updated it to hardcode the information it needed from that file.
PiperOrigin-RevId: 564810016
This is the second attempt to fix our Git history. This should allow
"git blame" to work correctly in the upb/ directory even though our
automation unexpectedly blew away that directory.
This matches an API already present in proto2
(const DescriptorPool* FileDescriptor::pool()).
However there is a slightly subtle implication here.
In proto2, the relationship between Descriptor and
MessageFactory is 1:many. You can create as many
DynamicMessageFactory instances as you want, and
each one will have its own independent DynamicMessage
prototype and computed layout for the same underlying
Descriptor. In practice the layouts will all be the same,
but one thing that could be distinct is that each can
have its own extension pool, which is a DescriptorPool
that will be searched for extensions when parsing.
In contrast, upb does not have a separate "message
factory" abstraction. That means that each upb_msgdef
has a single distinct layout, in other words a 1:1
correspondence between descriptor and layout. This means
that there is no way to create multiple message types
for the same descriptor that have distinct extension
pools. If you want a different set of extensions, you
must create a separate upb_symtab with a distinct set
of descriptors.
This change further entrenches that upb_filedef:upb_symtab
is a 1:1 relationship. A single upb_filedef cannot be a
member of multiple symbol tables. In practice this was
already true (there is no way to add a single filedef to
multiple symbol tables) but this change codifies this
1:1 relationship.
Also includes an implementation of the conformance tests
to display what the API usage will be like.
There is still a lot to do, and things that are broken (oneofs,
repeated fields, etc), but it's a good start.
* Put oneofs in the same table as fields.
Oneofs and fields are not allowed to have names that conflict,
so we might as well put them all in the same table. This also
allows an efficient operation that looks for both fields and
oneofs in a single lookup.
Added support for OneofDef to Lua to allow testing of this.
* Addressed PR comments.
* Changed schema for JSON test to be defined in a .proto file.
Before we had lots of code to build these schemas manually,
but this was verbose and made it difficult to add to the
schema easily. Now we can just write a .proto file and
adding fields is easy.
To avoid making the tests depend on upbc (and thus Lua)
we check in the generated schema.
* Made protobuf-compiler a dependency of "make genfiles."
* For genfiles download recent protoc that can handle proto3.
* Only use new protoc for genfiles.
It is entirely optional: MessageDef/EnumDef can still exist
on their own. But this can represent a def's file when it is
desirable to do so (eg. for code generators).
This approach will require that we change the way we handle
extensions. But I think it will be a good change overall.
Specifically, we previously handled extensions by duplicating
the extended message and then adding the extension as a regular
field to the duplicated message. This required also duplicating
any messages that could reach the extended message.
In the new world we will need a way of declaring and looking up
extensions separately from the message being extended.
This change also involves some notable changes to the generated
code:
- files are now called foo.upbdefs.h instead of foo.upb.h.
This reflects the fact that we might possibly generate several
different output files for a .proto file, and this one is just
for defs.
- we no longer generate selectors in the .h file.
- the upbdefs.c no longer vends a SymbolTable. Now it vends the
individual messages (and possibly a FileDef later). I think this
will compose better once we can generate files where one
generated files imports another.
We also make the descriptor reader vend a list of FileDefs now.
This is the best conceptual match for parsing a FileDescriptorSet.
A large part of this change contains surface-level
porting, like moving variable declarations to the
top of the block.
However there are a few more substantial things too:
- moved internal-only struct definitions to a separate
file (structdefs.int.h), for greater encapsulation
and ABI compatibility.
- removed the UPB_UPCAST macro, since it requires access
to the internal-only struct definitions. Replaced uses
with calls to inline, type-safe casting functions.
- removed the UPB_DEFINE_CLASS/UPB_DEFINE_STRUCT macros.
Class and struct definitions are now more explicit -- you
get to see the actual class/struct keywords in the source.
The casting convenience functions have been moved into
UPB_DECLARE_DERIVED_TYPE() and UPB_DECLARE_DERIVED_TYPE2().
- the new way that we duplicate base methods in derived types
is also more convenient and requires less duplication.
It is also less greppable, but hopefully that is not
too big a problem.
Compiler flags (-std=c89 -pedantic) should help to rigorously
enforce that the code is free of C99-isms.
A few functions are not available in C89 (strtoll). There
are temporary, hacky solutions in place.