With the C++ kernel for Rust, we currently need to generate quite a few C++
thunks for operations on map fields. For each message we generate, we generate
these thunks for all possible map types that could have that message as a
value. These operations are for things such as insertion, removal, clearing,
iterating, etc.
The reason we do this is that templated types don't play well with FFI, so we
effectively need separate FFI endpoints for every possible combination of key
and value types used (or even potentially used) as a map field.
This CL fixes the problem by replacing the generated thunks with functions in
the runtime that can operate on `proto2::MessageLite*` without needing to care
about the specific message type.
The way it works is that we implement the operations using either
`UntypedMapBase` (the base class of all map types, which knows nothing about
the key and value types) or `KeyMapBase`, which knows the key type but not the
value type. I roughly followed the example of the table-driven parser, which
has a similar problem of needing to operate generically on maps without having
access to the concrete types.
I removed 54 thunks per message (that's 6 key types times 9 operations per
key), but had to add two new thunks per message:
- The `size_info` thunk looks up the `MapNodeSizeInfoT`, which is stored in a
small constant table. The important thing here is an offset indicating where
to look for the value in each map entry. This offset can be different for
every pair of key and value types, but we can safely assume that the result
does not depend on the signedness of the key. As a result we only need to
store four entries per message: one each for i32, i64, bool, and string.
- The `placement_new` thunk move-constructs a message in place. We need this
to be able to efficiently implement map insertion.
There are two big things that this CL does not address yet but which I plan to
follow up on:
- Enums still generate many map-related C++ thunks that could be replaced with
a common implementation. This should actually be much easier to handle than
messages, because every enum has the same representation as an i32.
- We still generate six `ProxiedInMapValue` implementations for every message,
but it should be possible to replace these with a blanket implementation that
works for all message types.
PiperOrigin-RevId: 657681421
I also added a blanket implementation of `IntoProxied<T> for T` so that we
don't have to duplicate this no-op implementation for all our types.
PiperOrigin-RevId: 656465755
Distinct from any clear_submsg(), this clears the message contents and doesn't affect presence on parent (and also allows for clearing owned messages which don't exist as a field to be cleared).
PiperOrigin-RevId: 656453234
- We introduce two new view types ProtoStringCow and ProtoBytesCow.
- In UPB, for cord field accessors we always return a Cow::Borrowed.
- In C++, for coed field accessors we check if the underlying absl::Cord is flat (contigous) and if so return a Cow::Borrowed. If it's not flat we copy the data to a ProtoString and return a Cow::Owned.
- We expect the absl::Cord to be flat almost all the time. We have experimentally verified that for small strings (<4 KiB) and less than 6 appends the cord is in fact flat [1].
- This change lifts the requirement of all ViewProxy types to be Copy. Our Cow types cannot be Copy because the owned types aren't copy.
[1] https://source.corp.google.com/piper///depot/google3/experimental/users/buchgr/cords/cords.cc
PiperOrigin-RevId: 655485943
Rather than two traits (MutProxy subtrait of ViewProxy), instead have three traits (MutProxy subtrait of Proxy, and ViewProxy subtrait of Proxy).
This makes things more consistent, including that (MutProxied subtraits Proxied) is now more parallel to (MutProxy subtraits Proxy).
ViewProxy is largely a marker trait here but leaves a spot for methods that should be on ViewProxies but not MutProxies if those do show up later.
PiperOrigin-RevId: 653661953
* The public Repeated::{push, set} and Map::insert methods now accept any value that implements IntoProxied<T>, allowing us to move owned values instead of copying them.
* This change also updates the FFI layer for strings/bytes in the repeated and maps thunks to accept a std::string* that can be moved rather than a PtrAndLen type that needs to be copied.
* Tests are updated to no longer .as_view() when setting a message / string on a repeated / map field. The IntoProxied trait makes calling .as_view() obsolete.
PiperOrigin-RevId: 650580788
`Vec<u8>` is a more idiomatic Rust type to return for serialization.
For the C++ kernel, we are able to return this type with no extra copying. We
still use `SerializedData` type for FFI, but convert the result into a
`Vec<u8>` using a new `into_vec` method.
The upb kernel serializes onto an arena, so for upb we do need to copy the data
to get it into a `Vec<u8>`.
PiperOrigin-RevId: 644444571
New serialization test will verify the amount of bytes it takes to serialize a message with a field of type INT32 set to different values.
PiperOrigin-RevId: 641044189
This doesn't _actually_ make the C++ Kernel path ever fail yet, but now that the API is a Result<SerializedData, SerializeError> it can be fixed as an implementation detail.
PiperOrigin-RevId: 638329136
This is equivalent to C++'s `_IsValid` functions. Since proto3 enums are open,
`is_known` seems to be a better name than the misleading `is_valid`.
C++-specific Protocol Buffers documentation already uses "known fields" and
"unknown enum values" expressions.
PiperOrigin-RevId: 630344180
The intent is to avoid codegen issues for simple cases where a message of the shape:
message M {
int32 x = 3
string set_x = 8;
}
Which would otherwise break due to the first field having a setter whose name collides with the second field's getter.
By seeing that 'set_x' matches another field with a common accessor prefixed, it will generate all accessors for that field as though it was named `set_x_8`.
This does not avoid all possible collisions, but should mitigate the vast majority of situations of an accidental collision.
PiperOrigin-RevId: 627776907
The shared tests which access `protobuf_upb` or `protobuf_cpp`
have access to more items than the `protobuf` library itself.
This is because the former don't go through the same re-exporting based
on kernel.
I fix this by creating two test-only libraries that perform the same re-exporting
as the `protobuf` library, but with the kernel explicitly set, and changing the shared
tests to reference that instead of the inner runtime library.
This is needed to reliably test macros, where item paths are relative to the invocation,
not eagerly checked at the macro source.
PiperOrigin-RevId: 624328817
Rename .deserialize(&mut self) method to .clear_and_parse() (by marking the .deserialized deprecated pointing at the new name, will clean up usages separately)
END_PUBLIC
Per discussion in the team chat, parse/serialize is the most typical terminology for protobuf impls, we don't have much local reason to diverge.
I'm proposing giving the 'better' name to the named ctor since I think that is the one that we expect people to reach for by default; it is generally cleaner than "new then deserialize" pattern since after a parse failure there's not any message still hanging around with implementation-defined contents, along with some other smaller ergonomics benefits.
In C++ (when exceptions aren't enabled) all constructors must be infallible, so it can't have it. In Rust there's no language idiom reason why we shouldn't have an associated fn that returns Result<Msg, ParseErr>.
PiperOrigin-RevId: 618823998
The non-string field setters can bypass the vtables already, the string setters still go through the vtable path here because its more important to let them take an `impl SettableValue` to be able to set either a &str or a &ProtoStr already.
PiperOrigin-RevId: 609705233
The purpose is to avoid duplicating the mapping of different types that are only relevant to the serializer but not to the exposed api (e.g. FIXED32 vs INT32)
Treat type=GROUP as rust-type=MESSAGE here which is all that is needed for us to support groups in the rust codegen.
The RustFieldType is parallel to the preexisting FieldDescriptor::CppType which _almost_ does what we need, but it treats Bytes and Strings as the same cpptype which Rust codegen doesn't.
PiperOrigin-RevId: 609416940