This CL deletes the per-message C++ functions for operating on repeated fields
and replaces them with functions in the runtime that can work with arbitrary
messages.
Similar to what we did with maps, this required refactoring the code to make it
work with `RepeatedPtrFieldBase`, the untyped base class of
`RepeatedPtrField<T>`. I added a `RustRepeatedMessageHelper` class to allow us
access to the protected methods we need.
This should save a bit of linker input code size, but I think more importantly
we are going to need this eventually to enable tree shaking to work well.
PiperOrigin-RevId: 693394959
This CL migrates messages, enums, and primitive types all onto the same blanket
implementation of the `ProxiedInMapValue` trait. This gets us to the point
where messages and enums no longer need to generate any significant amount of
extra code just in case they might be used as a map value.
There are a few big pieces to this:
- I generalized the message-specific FFI endpoints in `rust/cpp_kernel/map.cc`
to be able to additionally handle enums and primitive types as values. This
mostly consisted of replacing `MessageLite*` parameters with a new `MapValue`
tagged union.
- On the Rust side, I added a new blanket implementation of
`ProxiedInMapValue` in rust/cpp.rs. It relies on its value type to implement
a new `CppMapTypeConversions` trait so that it can convert to and from the
`MapValue` tagged union used for FFI.
- In the Rust generated code, I deleted the generated `ProxiedInMapValue`
implementations for messages and enums and replaced them with
implementations of the `CppMapTypeConversions` trait.
PiperOrigin-RevId: 687355817
We generate these constants to enable map operations, but this is no longer
necessary now that we can get the relevant size and alignment information for
each message through its vtable.
PiperOrigin-RevId: 680712939
We have been relying on a per-message generated `placement_new` function for
implementing map insertion, but this CL simplifies things by removing that.
Instead, we do a reflective swap if possible, or else fall back on a copy.
This will probably make insertions a bit slower, but I think it may be worth it
because it should make it much simpler to have a blanket implementation for
ProxedInMapValue that works for all map types.
It looks like it should be possible to make this faster in the future by
implementing a bitwise move that will work for any message.
PiperOrigin-RevId: 676495920
`UntypedMapIterator::next_unchecked` currently expects to be passed a pointer
to a C++ function that it can use to dereference an iterator. However, this is
awkward because it's not natural for this C++ function to have the same
signature for every map type. Maps with a message as value need a
`MapNodeSizeInfo`, but other map types do not. We are working around this by
sometimes passing an ignored placeholder constant, but this is messy.
This CL replaces the `extern "C"` functions with closures. This way, we can
capture the `MapNodeSizeInfo` in the closure in cases where we need it, but
otherwise we no longer need to pass around placeholder values.
(Note: `MapNodeSizeInfo` is going away soon, but this is still relevant because
it will likely need to be replaced by a message default instance pointer.)
PiperOrigin-RevId: 676438640
This version works where both sides are the same message, messageview or messagemut type, but not any mix of them (e.g. cannot compare a message against its corresponding view).
PiperOrigin-RevId: 671091099
These types are effectively two ways to spell the same type, but MapView/MapMut is more terse, especially where the unnamed lifetime was used which can now be implied (`View<'_, Map<K, V>>` can just be `MapView<K,V>`)
PiperOrigin-RevId: 667636945
We were really inconsistent on where we put Private or not and this tries to make a sensible consistent state of:
- For types that are exposed to application code, any pub methods which are only pub so they can be used by gencode (which is mostly anything that has any internal/runtime type anywhere on the parameters or return type list), have those methods have both a `Private` arg and doc(hidden)
- For structs that are only inside __runtime / __internal to begin with, put doc(hidden) on the types, and don't put Private on any of their methods since callers can't reach those types regardless.
Note that for exposed functions which also _accept_ another internal/runtime type in a parameter, the additional `Private` arg is superfluous since application code shouldn't ever be able to reach one of those internal types to be able to pass one in, but this keeps the pattern of keeping Private on it in those cases as well (the `Private` would still be the only guard on methods which only _return_ an internal type).
PiperOrigin-RevId: 667547566
Protobuf enums in C++ always have `int` as their backing type, so for the
purpose of use as a map value, we can just treat enums as 32-bit ints. This
allows us to delete the enum-specific generated C++ thunks.
PiperOrigin-RevId: 666043376
This change adds delete, clear, serialize, parse, copy_from, and merge_from
operations to the runtime. Since these operations can all be implemented easily
on the `MessageLite` interface, we can use a common implementation in the
runtime instead of generating per-message thunks for all of these.
I suspect this will also make it possible to remove some of our generated trait
implementations and replace them with blanket implementations, but I will leave
that for a future change.
PiperOrigin-RevId: 665910927
With the C++ kernel for Rust, we currently need to generate quite a few C++
thunks for operations on map fields. For each message we generate, we generate
these thunks for all possible map types that could have that message as a
value. These operations are for things such as insertion, removal, clearing,
iterating, etc.
The reason we do this is that templated types don't play well with FFI, so we
effectively need separate FFI endpoints for every possible combination of key
and value types used (or even potentially used) as a map field.
This CL fixes the problem by replacing the generated thunks with functions in
the runtime that can operate on `proto2::MessageLite*` without needing to care
about the specific message type.
The way it works is that we implement the operations using either
`UntypedMapBase` (the base class of all map types, which knows nothing about
the key and value types) or `KeyMapBase`, which knows the key type but not the
value type. I roughly followed the example of the table-driven parser, which
has a similar problem of needing to operate generically on maps without having
access to the concrete types.
I removed 54 thunks per message (that's 6 key types times 9 operations per
key), but had to add two new thunks per message:
- The `size_info` thunk looks up the `MapNodeSizeInfoT`, which is stored in a
small constant table. The important thing here is an offset indicating where
to look for the value in each map entry. This offset can be different for
every pair of key and value types, but we can safely assume that the result
does not depend on the signedness of the key. As a result we only need to
store four entries per message: one each for i32, i64, bool, and string.
- The `placement_new` thunk move-constructs a message in place. We need this
to be able to efficiently implement map insertion.
There are two big things that this CL does not address yet but which I plan to
follow up on:
- Enums still generate many map-related C++ thunks that could be replaced with
a common implementation. This should actually be much easier to handle than
messages, because every enum has the same representation as an i32.
- We still generate six `ProxiedInMapValue` implementations for every message,
but it should be possible to replace these with a blanket implementation that
works for all message types.
PiperOrigin-RevId: 657681421
- We introduce two new view types ProtoStringCow and ProtoBytesCow.
- In UPB, for cord field accessors we always return a Cow::Borrowed.
- In C++, for coed field accessors we check if the underlying absl::Cord is flat (contigous) and if so return a Cow::Borrowed. If it's not flat we copy the data to a ProtoString and return a Cow::Owned.
- We expect the absl::Cord to be flat almost all the time. We have experimentally verified that for small strings (<4 KiB) and less than 6 appends the cord is in fact flat [1].
- This change lifts the requirement of all ViewProxy types to be Copy. Our Cow types cannot be Copy because the owned types aren't copy.
[1] https://source.corp.google.com/piper///depot/google3/experimental/users/buchgr/cords/cords.cc
PiperOrigin-RevId: 655485943
When you have a `T: SomeTrait + ?Sized` and `trait SomeTrait:Sized`, the ?Sized has no effect (its not able to "remove" the requirement).
PiperOrigin-RevId: 654836513
Calling into_proxied() already does a copy and before this change we were doing a second one.
I am not using set_allocated_<field(std::string* s) because the method is not generated when [features.(pb.cpp).string_type = VIEW] is specified.
PiperOrigin-RevId: 650612909
* The public Repeated::{push, set} and Map::insert methods now accept any value that implements IntoProxied<T>, allowing us to move owned values instead of copying them.
* This change also updates the FFI layer for strings/bytes in the repeated and maps thunks to accept a std::string* that can be moved rather than a PtrAndLen type that needs to be copied.
* Tests are updated to no longer .as_view() when setting a message / string on a repeated / map field. The IntoProxied trait makes calling .as_view() obsolete.
PiperOrigin-RevId: 650580788
This will mean that calling DebugString on a MessageLite* which is actually a full Message will get the debug info instead of the minimal output.
PiperOrigin-RevId: 649103508
Besides unnecessary inconsistency on our C symbols, double underscores anywhere in the name are reserved for stdlib use. In practice its unlikely these symbols would ever hit a collision problem (maybe the prior name 'utf8_debug_string' with no prefix as having some risk), but safer to just standardize on this and have no concerns going forward.
PiperOrigin-RevId: 648709299
`Vec<u8>` is a more idiomatic Rust type to return for serialization.
For the C++ kernel, we are able to return this type with no extra copying. We
still use `SerializedData` type for FFI, but convert the result into a
`Vec<u8>` using a new `into_vec` method.
The upb kernel serializes onto an arena, so for upb we do need to copy the data
to get it into a `Vec<u8>`.
PiperOrigin-RevId: 644444571
This change adds a cfg attribute 'cpp_lite' to the C++ kernel of Protobuf Rust. If C++ lite is selected on the command line, the cfg attribute 'cpp_lite' is set. The root cause of the test failure was that the Debug implementation for full C++ protos uses text proto which is not available in C++ lite. The fix uses the 'cpp_lite' cfg attribute to select a different Debug implementation that doesn't rely on text proto
PiperOrigin-RevId: 640552701
We modify set_<repeated_field> to accept the IntoProxied type as the value and move the value (avoid copying) whenever possible.
For UPB:
- We fuse the arena of Repeated<T> with the parent message arena.
- We use upb_Message_SetBaseField to set the upb_Array contained in the Repeated<T>.
For C++:
- We generate an additional setter thunk that moves the value.
- The move assignment operator of RepeatedField/RepeatedPtrField is specialized. In order to adhere to the layering check we need to add '#include' statements for all .proto imports to the generated thunks.pb.cc.
PiperOrigin-RevId: 631010333
The last callside that used PrimitiveMut was in our enums code. This change makes it so that enums nolonger implement MutProxied and thus no longer need the PrimitiveMut type.
PiperOrigin-RevId: 629017282
This change removes the only remaining instance of SettableValue in a generated accessor. The principled fix is to implement IntoProxied for ProtoStr/[u8], but this will have to wait until we have agreed on & implemented the 1.0 string types. So we'll use AsRef in the meantime in order to not break any user code while allowing us to make progress on IntoProxied and Proxied changes.
PiperOrigin-RevId: 627735404
The intent of this directory would be for a layer of Rust bindings that directly map to upb semantics; Rust Protobuf runtime would be layer on top of that Rust, instead of directly on the upb C api.
PiperOrigin-RevId: 624282429
This change implements a custom Debug for messages, views and muts in the C++ kernel. Debug defers to proto2::Utf8Format.
It implements this only for the C++ kernel. We will need to pull in additional dependencies beyond minitables to implement it for UPB as well. This will be done at a later point.
PiperOrigin-RevId: 613191236
It now uses the same prefix as other thunks needed for the proxied type,
so the RawMapThunk helper can be used for enums.
Calling it a "iter next" thunk is misleading.
It does not increment the iterator as "next" implies,
it only gets the current key/value the iterator points to.
PiperOrigin-RevId: 609527442
A public "raw" field in a safe wrapper is guaranteed unsound!
This makes repeated and map inner access consistent and
avoids exposing raw internals.
It also provides the accessors necessary for implementing map
access for external types.
PiperOrigin-RevId: 604405543
Part 3 of 4 (added a stage).
The getter and mut_getter bifurcate based on the kernel.
upb returns Option<RawMessage>, while cpp returns the RawMessage.
This means that we can't have a unified MessageVTable in vtable.rs, so we've split these out in {upb.rs and cpp.rs}.
$field$_entry is now prepped and populated for the $field$_mut swappage in the following CL.
PiperOrigin-RevId: 601230880