The arena and the heap representation are both aligned to 8 bytes so we can use the lowest 3 bits of the heap pointer to encode whether the RepeatedField is in SOO mode or not, and if it is in SOO mode, to record the size. Therefore, we can have SOO sizes from 0 to 3 and one bit for whether we're in SOO mode. Note that we also tried using 3 bits for SOO size with a sentinel value for not-SOO-mode, but that was slower.
PiperOrigin-RevId: 659578775
Adds new tests in `//compatibility` for detecting undesired breaking changes in the schema of the well-known types and `descriptor.proto` files, using the Buf breaking change detector, via [`rules_buf`](https://github.com/bufbuild/rules_buf/).
In order to keep things light-touch as far as maintenance goes, I have chosen to keep the integration as small and simple as possible. Some notes:
- Breaking change behavior can be granularly controlled via [`buf.yaml`](https://buf.build/docs/configuration/v1/buf-yaml#breaking).
- Bazel sandboxes us away from meaningful VCS information, so in order to pick a target to check for breaking changes against, a new variable is added to `protobuf_versions.bzl` that needs to be updated with changes in the release version.
- Breaking change detection is performed on a file-level, not a package-level, so migrating types across WKT files would constitute a breaking change. If this is not desired the behavior can be made to work on a package-level, though we need to do some more work as `buf_breaking_test` currently only accepts a single file descriptor set target for the `against` attribute.
Closes#17513
COPYBARA_INTEGRATE_REVIEW=https://github.com/protocolbuffers/protobuf/pull/17513 from jchadwick-buf:buf-breaking cd46bb2c1e
PiperOrigin-RevId: 658624708
Store them as Object[], because you can't have a Entry[], because Entry[] has a generic 'this' over <K, V>, you get error "generic array creation" if you try to make "new Entry[]".
And if you try to cast Object[] to Entry[], you get ClassCastException. So I've just made it an object array that we cast when reading out of.
This should save memory and improve performance, because we have fewer pointers to chase.
Also fixed some warnings about unnecessary unchecked suppressions, and type-name should end with T.
PiperOrigin-RevId: 658585983
By "string APIs", I mean the following APIs:
- mutable_foo
- set_allocated_foo
- release_foo
See also: https://protobuf.dev/reference/cpp/cpp-generated/#implicit-string
Add the following test cases (for singular implicit-presence fields):
- A single call to mutable_foo() should not result in additional serialization
on the wire.
- Assigning an empty string to mutable_foo() should not result in additional
serialization on the wire.
- Calling set_allocated_foo() with an empty string should not result in
additional serialization on the wire.
- If a field is nonempty, release_foo() effectively clears the field (while
returning a pointer to the original data).
Adding coverage for these behaviours can increase confidence when we introduce
internal hasbits to help with presence tracking for implicit presence fields.
mutable_foo will in general set the hasbit, so the generated code will need to
check that field is nonempty before serializing.
PiperOrigin-RevId: 658562530
Before all invocations would try to create a library called protobuf_codegen_upb_gen_code. Now it should properly use the name of the crate currently being built.
PiperOrigin-RevId: 658414835
This should save a little memory. We've observed hundreds of such empty arrays
in some applications.
This affects non-empty messages with none of:
- repeated fields
- map fields
- fields to check if they're initialized
PiperOrigin-RevId: 658137927
This allows us to test our maven setup without relying on static `pom.xml` files and clears the way to remove them in a subsequent CL.
PiperOrigin-RevId: 658057911
This gist of the new test case coverage can be summarized as:
- The wire does not distinguish between explicit and implicit fields.
- For implicit presence fields, zeros on the wire can overwrite nonzero values
(i.e. they are treated as a 'clear' operation).
It's TBD whether we want to accept this behaviour going forward.
Right now we are leaning towards keeping this behaviour, because:
- If we receive zeros on the wire for implicit-presence fields, the protobuf
wire format is "wrong" anyway. Well-behaved code should never generate zeros
on the wire for implicit presence fields or serialize the same field multiple
times.
- There might be some value to enforce that "anything on the wire is accepted".
This can make handling of wire format simpler.
There are some drawbacks with this approach:
- It might be somewhat surprising for users that zeros on the wire are always
read, even for implicit-presence fields.
- It might make the transition from implicit-presence to explicit-presence
harder (or more unsafe) if user wants to migrate.
- It leads to an inconsistency between what it means to "Merge".
- Merging from a constructed object, with implicit presence and with field
set to zero, will not overwrite.
- Merging from the wire, with implicit presence and with zero explicitly
present on the wire, WILL overwrite.
I still need to add conformance tests to ensure that this is a consistent
behavior across all languages, but for now let's at least add some coverage in
C++ to ensure that this is a tested behaviour.
PiperOrigin-RevId: 657724599
With the C++ kernel for Rust, we currently need to generate quite a few C++
thunks for operations on map fields. For each message we generate, we generate
these thunks for all possible map types that could have that message as a
value. These operations are for things such as insertion, removal, clearing,
iterating, etc.
The reason we do this is that templated types don't play well with FFI, so we
effectively need separate FFI endpoints for every possible combination of key
and value types used (or even potentially used) as a map field.
This CL fixes the problem by replacing the generated thunks with functions in
the runtime that can operate on `proto2::MessageLite*` without needing to care
about the specific message type.
The way it works is that we implement the operations using either
`UntypedMapBase` (the base class of all map types, which knows nothing about
the key and value types) or `KeyMapBase`, which knows the key type but not the
value type. I roughly followed the example of the table-driven parser, which
has a similar problem of needing to operate generically on maps without having
access to the concrete types.
I removed 54 thunks per message (that's 6 key types times 9 operations per
key), but had to add two new thunks per message:
- The `size_info` thunk looks up the `MapNodeSizeInfoT`, which is stored in a
small constant table. The important thing here is an offset indicating where
to look for the value in each map entry. This offset can be different for
every pair of key and value types, but we can safely assume that the result
does not depend on the signedness of the key. As a result we only need to
store four entries per message: one each for i32, i64, bool, and string.
- The `placement_new` thunk move-constructs a message in place. We need this
to be able to efficiently implement map insertion.
There are two big things that this CL does not address yet but which I plan to
follow up on:
- Enums still generate many map-related C++ thunks that could be replaced with
a common implementation. This should actually be much easier to handle than
messages, because every enum has the same representation as an i32.
- We still generate six `ProxiedInMapValue` implementations for every message,
but it should be possible to replace these with a blanket implementation that
works for all message types.
PiperOrigin-RevId: 657681421
The owned and mut interop traits have the corresponding to/from behaviors on cpp but are defined as empty on upb, while the view interop is implemented for both.
PiperOrigin-RevId: 657617187
The purpose of this trait is that it is declared as a supertrait of the traits that we don't want application code to implement (like "Proxied" and "MessageView"); application code can see those traits but not the supertrait, which enables them to use them but not implement them.
PiperOrigin-RevId: 657555025
I also added a blanket implementation of `IntoProxied<T> for T` so that we
don't have to duplicate this no-op implementation for all our types.
PiperOrigin-RevId: 656465755
Distinct from any clear_submsg(), this clears the message contents and doesn't affect presence on parent (and also allows for clearing owned messages which don't exist as a field to be cleared).
PiperOrigin-RevId: 656453234