Adds test coverage for invalid empty strings (e.g. ""), non-numeric strings (e.g. "abc"), and partially-numeric strings (e.g. "12abc"), as well as valid exponential numeric strings ("1e5)
We will target enforcing non-conformant cases that should have failed to parse but didn't in upb in v30.x (our ~annual breaking release in some languages). Conformance failures to accept input we previously failed on can be landed at any time.
PiperOrigin-RevId: 694269337
This didn't require any change to the algorithm, except to mark one extra (immutable) member as atomic. It did require changing the tests though.
PiperOrigin-RevId: 690751567
This ports upb's WORKSPACE scraping logic to protobuf, and allows us to dynamically fetch our dependencies at the exact same pinned version as in Bazel via protobuf_FETCH_DEPENDENCIES=ON. This is mostly for development purposes, and is preferable to git submodules. In a later cl we will flip the default behavior to "package"
#test-continuous
PiperOrigin-RevId: 686265348
In C23, `false` is now of type `bool`, not `int`. This turned a couple places where `false` was returned instead of `NULL` into errors.
The workaround in upb for Windows's broken NAN macro causes an error in Clang under C23, `cannot compile this static initializer yet`, I believe because `0.0 / 0.0` is not valid in constant-evaluation. This seems like it's probably a legitimate result of C23 standardizing constexpr, although it _could_ be a clang bug. In any case, refine the workaround a bit, to avoid this problem.
I've also reverted the kUpb_FltInfinity/kUpb_Infinity back to their former definitions, as INFINITY wasn't broken by the windows header, only NAN.
PiperOrigin-RevId: 683152452
This PR does the following:
* Special-case descriptor.proto to allow for codegen despite being proto2
* Fix a pre-existing bug that gets exercised now during descriptor builds
* Expand test coverage of conformance tests, and fix pre-existing issues
* Properly hook up gencode to staleness infrastructure for automated regen
Closes#18610
COPYBARA_INTEGRATE_REVIEW=https://github.com/protocolbuffers/protobuf/pull/18610 from protocolbuffers:php-regen 773a1bf01a
PiperOrigin-RevId: 682477438
Empty strings are invalid numeric values. upb must fail to convert JSON objects that contain empty string values for numbers, but it currently does not.
PiperOrigin-RevId: 681623866
Introduce a upb_MessageValue_Zero() function to use for the cases we do want a zero'd union (typically a zero MessageValue union is not needed)
PiperOrigin-RevId: 672592274
The goal of the `names.h` convention is to have a single canonical place where a code generator can define the set of symbols it exports to other code generators, and a canonical place where the name mangling logic is implemented.
Each upb code generator now has its own `names.h` file defining the symbols that it owns & exports:
* `third_party/upb/upb_generator/c/names.h` (for `foo.upb.h` files)
* `third_party/upb/upb_generator/minitable/names.h` (for `foo.upb_minitable.h` files)
* `third_party/upb/upb_generator/reflection/names.h` (for `foo.upbdefs.h` files)
This is a significant improvement over the previous situation where the name mangling functions were co-mingled in `common.h`/`mangle.h`, or sprinkled throughout the generators, with no clear structure for which code generator owns which symbols.
With this structure in place, the visibility lists for the various `names.h` files provide a clear dependency graph for how different generators depend on each other. In general, we want to keep dependencies on the "C" code generator to a minimum, since it is the largest and most complicated of upb's generated APIs, and is also the most prone to symbol name clashes.
Note that upb's `names.h` headers are somewhat unusual, in that we do not want them to depend on C++'s reflection or upb's reflection. Most `names.h` headers in protobuf would use types like `proto2::Descriptor`, but we don't want upb to depend on C++ reflection, especially during its bootstrapping process. We also don't want to force users to build upb defs just to use these name mangling functions. So we use only plain string types like `absl::string_view` and `std::string`.
PiperOrigin-RevId: 672397247
This also increases compliance by adding `default_applicable_licenses` to several `BUILD` files that previously did not have it.
PiperOrigin-RevId: 670784686
This CL is mostly a no-op, except that now google3-only code is actually stripped from OSS, instead of being preserved in `# begin:google_only` blocks.
This follows the conventions of the greater Copybara ecosystem.
PiperOrigin-RevId: 669513564
This simplifies upb by removing differences between google3 and OSS.
This also points upb at the protobuf license, instead of keeping a separate copy around for upb.
PiperOrigin-RevId: 669447145
This just behaves the same as the pre-existing upb_Message_SetMessage(), but with the intendended naming style (upb_Message_SetMessage should accept an arena and support extensions).
PiperOrigin-RevId: 667998428
Putting it into BUILD files unintentionally forces it on all our downstream users. Instead, we just want to enable this during testing and let them choose for themselves in their builds.
Note, that this expands the scope of -Werror to our entire repo for CI, so a bunch of fixes and opt-outs had to be applied to get this change passing.
Closed#14714
PiperOrigin-RevId: 666903224
Fixes the warning:
```
warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
extern const upb_MiniTable* google__protobuf__OneofDescriptorProto_msg_init();
^
void
```
PiperOrigin-RevId: 664971019
Our bootstrapping setup compiles multiple versions of the generated code for `descriptor.proto` and `plugin.proto`, one for each stage of the bootstrap. For source files (`.c`), we can always select the correct version of the file in the BUILD rules, but for header files we need to make sure the correct stage's file is always selected via `#include`.
Previously we used `cc_library(includes=[])` to make it appear as though our bootstrapped headers had the same names as the "real" headers. This allowed a lot of the code to be agnostic to whether a bootstrap header was being used, which simplified things because we did not have to change the code performing the `#include`.
Unfortunately, due to build system limitations, this sometimes led to the incorrect header getting included. This should not have been possible, because we had a clean BUILD graph that should have removed all ambiguity about which header should be available. But in non-sandboxed builds, the compiler was able to find headers that were not actually in `deps=[]`, and worse it preferred those headers over the headers that actually were in `deps=[]`. This led to unintended results and errors about layering check violations.
This CL fixes the problem by removing all use of `includes=[]`. We now spell a full pathname to all bootstrap headers, so this class of errors is no longer possible. Unfortunately this adds some complexity, as we have to hard-code these full paths in several places.
A nice improvement in this CL is that `bootstrap_upb_proto_library()` can now only be used for bootstrapping; it only exposes the `descriptor_bootstrap.h` / `plugin_bootstrap.h` files. Anyone wanting to use the normal `net/proto2/proto/descriptor.upb.h` file should depend on `//net/proto2/proto:descriptor_upb_c_proto` target instead.
PiperOrigin-RevId: 664953196
We were failing to propagate the DefPool's platform to the MiniDescriptor builder. This caused upb's code generators to incorrectly generate a field rep of `kUpb_FieldRep_8Byte` for pointer-typed extension fields instead of the 32-bit clean output:
```
UPB_SIZE(kUpb_FieldRep_4Byte, kUpb_FieldRep_8Byte)
```
PiperOrigin-RevId: 653263168
General test for it is done in Rust, and then extensions are tested in UPB as they're not currently supported in Rust-upb.
PiperOrigin-RevId: 651113583
This should significantly reduce the size of large arenas. Previously, a large arena would nearly double in size if the most recent block filled up. This could end up wasting large amounts of memory. After this CL, we will waste at most the max block size, which defaults to 32k.
This more or less matches the behavior of the C++ arena.
PiperOrigin-RevId: 647802280