Put the emission of braces and indentation into MayEmitIfNonDefaultCheck, this
allows for a slight beautification in the emitted code and reduces verbosity at
the call site.
Also this technically fixes the indent in some cases (specifically when
serializing some repeated fields) so that the opening braces are emitted "at
the right level".
(Previous implementations sometimes emit an extra space character.)
Compare:
{
// code ...
}
vs.
{
// code ...
}
PiperOrigin-RevId: 660236451
The arena and the heap representation are both aligned to 8 bytes so we can use the lowest 3 bits of the heap pointer to encode whether the RepeatedField is in SOO mode or not, and if it is in SOO mode, to record the size. Therefore, we can have SOO sizes from 0 to 3 and one bit for whether we're in SOO mode. Note that we also tried using 3 bits for SOO size with a sentinel value for not-SOO-mode, but that was slower.
PiperOrigin-RevId: 659578775
By "string APIs", I mean the following APIs:
- mutable_foo
- set_allocated_foo
- release_foo
See also: https://protobuf.dev/reference/cpp/cpp-generated/#implicit-string
Add the following test cases (for singular implicit-presence fields):
- A single call to mutable_foo() should not result in additional serialization
on the wire.
- Assigning an empty string to mutable_foo() should not result in additional
serialization on the wire.
- Calling set_allocated_foo() with an empty string should not result in
additional serialization on the wire.
- If a field is nonempty, release_foo() effectively clears the field (while
returning a pointer to the original data).
Adding coverage for these behaviours can increase confidence when we introduce
internal hasbits to help with presence tracking for implicit presence fields.
mutable_foo will in general set the hasbit, so the generated code will need to
check that field is nonempty before serializing.
PiperOrigin-RevId: 658562530
This gist of the new test case coverage can be summarized as:
- The wire does not distinguish between explicit and implicit fields.
- For implicit presence fields, zeros on the wire can overwrite nonzero values
(i.e. they are treated as a 'clear' operation).
It's TBD whether we want to accept this behaviour going forward.
Right now we are leaning towards keeping this behaviour, because:
- If we receive zeros on the wire for implicit-presence fields, the protobuf
wire format is "wrong" anyway. Well-behaved code should never generate zeros
on the wire for implicit presence fields or serialize the same field multiple
times.
- There might be some value to enforce that "anything on the wire is accepted".
This can make handling of wire format simpler.
There are some drawbacks with this approach:
- It might be somewhat surprising for users that zeros on the wire are always
read, even for implicit-presence fields.
- It might make the transition from implicit-presence to explicit-presence
harder (or more unsafe) if user wants to migrate.
- It leads to an inconsistency between what it means to "Merge".
- Merging from a constructed object, with implicit presence and with field
set to zero, will not overwrite.
- Merging from the wire, with implicit presence and with zero explicitly
present on the wire, WILL overwrite.
I still need to add conformance tests to ensure that this is a consistent
behavior across all languages, but for now let's at least add some coverage in
C++ to ensure that this is a tested behaviour.
PiperOrigin-RevId: 657724599
With the C++ kernel for Rust, we currently need to generate quite a few C++
thunks for operations on map fields. For each message we generate, we generate
these thunks for all possible map types that could have that message as a
value. These operations are for things such as insertion, removal, clearing,
iterating, etc.
The reason we do this is that templated types don't play well with FFI, so we
effectively need separate FFI endpoints for every possible combination of key
and value types used (or even potentially used) as a map field.
This CL fixes the problem by replacing the generated thunks with functions in
the runtime that can operate on `proto2::MessageLite*` without needing to care
about the specific message type.
The way it works is that we implement the operations using either
`UntypedMapBase` (the base class of all map types, which knows nothing about
the key and value types) or `KeyMapBase`, which knows the key type but not the
value type. I roughly followed the example of the table-driven parser, which
has a similar problem of needing to operate generically on maps without having
access to the concrete types.
I removed 54 thunks per message (that's 6 key types times 9 operations per
key), but had to add two new thunks per message:
- The `size_info` thunk looks up the `MapNodeSizeInfoT`, which is stored in a
small constant table. The important thing here is an offset indicating where
to look for the value in each map entry. This offset can be different for
every pair of key and value types, but we can safely assume that the result
does not depend on the signedness of the key. As a result we only need to
store four entries per message: one each for i32, i64, bool, and string.
- The `placement_new` thunk move-constructs a message in place. We need this
to be able to efficiently implement map insertion.
There are two big things that this CL does not address yet but which I plan to
follow up on:
- Enums still generate many map-related C++ thunks that could be replaced with
a common implementation. This should actually be much easier to handle than
messages, because every enum has the same representation as an i32.
- We still generate six `ProxiedInMapValue` implementations for every message,
but it should be possible to replace these with a blanket implementation that
works for all message types.
PiperOrigin-RevId: 657681421
The owned and mut interop traits have the corresponding to/from behaviors on cpp but are defined as empty on upb, while the view interop is implemented for both.
PiperOrigin-RevId: 657617187
The purpose of this trait is that it is declared as a supertrait of the traits that we don't want application code to implement (like "Proxied" and "MessageView"); application code can see those traits but not the supertrait, which enables them to use them but not implement them.
PiperOrigin-RevId: 657555025
I also added a blanket implementation of `IntoProxied<T> for T` so that we
don't have to duplicate this no-op implementation for all our types.
PiperOrigin-RevId: 656465755
Distinct from any clear_submsg(), this clears the message contents and doesn't affect presence on parent (and also allows for clearing owned messages which don't exist as a field to be cleared).
PiperOrigin-RevId: 656453234
- Add overloads that take `absl::Cord` and `std::string&&` as inputs, putting the burden in the implementation instead of users.
- Add overload that returns `absl::Span<char>` for callers that need higher performance requirements where we can avoid copies altogether.
- Hide the APIs that return `std::string*` when the breaking change is enabled (via `-D PROTOBUF_TEMPORARY_ENABLE_STRING_VIEW_RETURN_TYPE`).
PiperOrigin-RevId: 655600399
- We introduce two new view types ProtoStringCow and ProtoBytesCow.
- In UPB, for cord field accessors we always return a Cow::Borrowed.
- In C++, for coed field accessors we check if the underlying absl::Cord is flat (contigous) and if so return a Cow::Borrowed. If it's not flat we copy the data to a ProtoString and return a Cow::Owned.
- We expect the absl::Cord to be flat almost all the time. We have experimentally verified that for small strings (<4 KiB) and less than 6 appends the cord is in fact flat [1].
- This change lifts the requirement of all ViewProxy types to be Copy. Our Cow types cannot be Copy because the owned types aren't copy.
[1] https://source.corp.google.com/piper///depot/google3/experimental/users/buchgr/cords/cords.cc
PiperOrigin-RevId: 655485943
GenerateByteSize itself remains deeply nested, but by factoring out one part of
the loop, at least we make the part that generates UpdateByteSize a bit more
readable.
Making the callsite of MayEmitIfNonDefaultCheck less nested actually resulted
in slight readability improvements also in the generated code, namely of the
form:
@@ -10563,8 +10559,7 @@ PROTOBUF_NOINLINE void OneStringEdition:
{
// string data = 1;
- cached_has_bits =
- this_._impl_._has_bits_[0];
+ cached_has_bits = this_._impl_._has_bits_[0];
if (cached_has_bits & 0x00000001u) {
total_size += 1 + ::proto2::internal::WireFormatLite::StringSize(
this_._internal_data());
These readability improvements should be kept IMO -- they make the generated
protobuf C++ code slightly easier to read.
PiperOrigin-RevId: 655180880
ShouldEmitIfNonDefaultCheck is mostly called in MayEmitIfNonDefaultCheck,
except in one location in GenerateByteSize.
Making MayEmitIfNonDefaultCheck the only caller of ShouldEmitIfNonDefaultCheck
seems like a good application of DRY and reduces the possibility of unintended
divergence in the future. It also makes future changes to serialization logic
easier.
There should be no changes to the resulting generated code.
PiperOrigin-RevId: 654789393
This also changes MessageView and MessageMut to be subtraits of the ViewProxy and MutProxy (which also requires them to have lifetimes now, and a weird extra bounds on ViewProxy)
PiperOrigin-RevId: 654788207
- Now it is arena constructible. Allocates the array, strings, and inner sets in the arena.
- Now it is arena destructible. Will skip destructors when in an arena.
Also, optimize `default_instance()` now that we can provide a `constinit` version of it.
PiperOrigin-RevId: 654743861
Not always emitting the opening brace in this function call can give the caller
a bit more flexibility: for example, it allows for easier use of formatted
string substitutions when both the opening and the closing brace can be
accounted for by the same caller.
PiperOrigin-RevId: 654113509