With the C++ kernel for Rust, we currently need to generate quite a few C++
thunks for operations on map fields. For each message we generate, we generate
these thunks for all possible map types that could have that message as a
value. These operations are for things such as insertion, removal, clearing,
iterating, etc.
The reason we do this is that templated types don't play well with FFI, so we
effectively need separate FFI endpoints for every possible combination of key
and value types used (or even potentially used) as a map field.
This CL fixes the problem by replacing the generated thunks with functions in
the runtime that can operate on `proto2::MessageLite*` without needing to care
about the specific message type.
The way it works is that we implement the operations using either
`UntypedMapBase` (the base class of all map types, which knows nothing about
the key and value types) or `KeyMapBase`, which knows the key type but not the
value type. I roughly followed the example of the table-driven parser, which
has a similar problem of needing to operate generically on maps without having
access to the concrete types.
I removed 54 thunks per message (that's 6 key types times 9 operations per
key), but had to add two new thunks per message:
- The `size_info` thunk looks up the `MapNodeSizeInfoT`, which is stored in a
small constant table. The important thing here is an offset indicating where
to look for the value in each map entry. This offset can be different for
every pair of key and value types, but we can safely assume that the result
does not depend on the signedness of the key. As a result we only need to
store four entries per message: one each for i32, i64, bool, and string.
- The `placement_new` thunk move-constructs a message in place. We need this
to be able to efficiently implement map insertion.
There are two big things that this CL does not address yet but which I plan to
follow up on:
- Enums still generate many map-related C++ thunks that could be replaced with
a common implementation. This should actually be much easier to handle than
messages, because every enum has the same representation as an i32.
- We still generate six `ProxiedInMapValue` implementations for every message,
but it should be possible to replace these with a blanket implementation that
works for all message types.
PiperOrigin-RevId: 657681421
By changing the return type to `auto`, we can handle `std::string` and other
types in a single definition without needing a separate overload.
PiperOrigin-RevId: 653272253
Our ASAN test runs have not had the heap checker enabled, so this has allowed a
few memory leaks to slip in. This CL fixes all of them so that we can turn on
the heap checker.
The first one takes place whenever we add an entry into a string-valued map
using the C++ kernel. The problem is that `InnerProtoString::into_raw()` gives
up ownership of the raw `std::string` pointer it holds, but then we never
delete that pointer. This CL fixes the problem by deleting the pointer in C++
right after we perform the map insertion. To simplify things, I created a
`MakeCleanup()` helper function that we always call in our map insertion
thunks, but it's a no-op in the cases where we don't need to free anything.
There were a couple similar memory leaks related to repeated field accessors in
the C++ kernel, and those were simple to fix just by adding the necessary
`delete` call.
Finally, there were two benign memory leaks in the upb kernel involving global
variables used for empty repeated fields and maps. It turned out that we did
not need to use `Box` at all here, so removing that simplified things and fixed
the leaks.
PiperOrigin-RevId: 652947042
* The public Repeated::{push, set} and Map::insert methods now accept any value that implements IntoProxied<T>, allowing us to move owned values instead of copying them.
* This change also updates the FFI layer for strings/bytes in the repeated and maps thunks to accept a std::string* that can be moved rather than a PtrAndLen type that needs to be copied.
* Tests are updated to no longer .as_view() when setting a message / string on a repeated / map field. The IntoProxied trait makes calling .as_view() obsolete.
PiperOrigin-RevId: 650580788
Besides unnecessary inconsistency on our C symbols, double underscores anywhere in the name are reserved for stdlib use. In practice its unlikely these symbols would ever hit a collision problem (maybe the prior name 'utf8_debug_string' with no prefix as having some risk), but safer to just standardize on this and have no concerns going forward.
PiperOrigin-RevId: 648709299
This change adds a cfg attribute 'cpp_lite' to the C++ kernel of Protobuf Rust. If C++ lite is selected on the command line, the cfg attribute 'cpp_lite' is set. The root cause of the test failure was that the Debug implementation for full C++ protos uses text proto which is not available in C++ lite. The fix uses the 'cpp_lite' cfg attribute to select a different Debug implementation that doesn't rely on text proto
PiperOrigin-RevId: 640552701
This change implements a custom Debug for messages, views and muts in the C++ kernel. Debug defers to proto2::Utf8Format.
It implements this only for the C++ kernel. We will need to pull in additional dependencies beyond minitables to implement it for UPB as well. This will be done at a later point.
PiperOrigin-RevId: 613191236
This adds `#![deny(unsafe_op_in_unsafe_fn)]` which removes the
implicit `unsafe` block that `unsafe fn` does.
It also adds many more `SAFETY` docs, corrects some incomplete
ones, and catches a null pointer returned by `upb_Arena_New`.
PiperOrigin-RevId: 549067106
In this CL I'd like to call existing C++ Protobuf API from the V0 Rust API. Since parts of the C++ API are defined inline and using (obviously) C++ name mangling, we need to create a "thunks.cc" file that:
1) Generates code for C++ API function we use from Rust
2) Exposes these functions without any name mangling (meaning using `extern "C"`)
In this CL we add Bazel logic to generate "thunks" file, compile it, and propagate its object to linking. We also add logic to protoc to generate this "thunks" file.
The protoc logic is rather rudimentary still. I hope to focus on protoc code quality in my followup work on V0 Rust API using C++ kernel.
PiperOrigin-RevId: 523479839
After this change, the Rust codegen can be enabled via the protoc flag '--rust_out'. Due to its experimental nature we require a magic value to be provided '--rust_out=experimental-codegen=enabled:<out>'.
Make the 'RustGenerator' create an empty *.pb.rs file.
PiperOrigin-RevId: 512644570
This code is experimental and should not be expected to emit working code, and callers are liable to break without warning.
It is being released now so that development can occur in the open, but users should not expect this to be supported any time soon.
PiperOrigin-RevId: 508095929
* WIP.
* WIP.
* Builds and runs. Tests need to be updated to test presence.
* Ruby: proto3 presence is passing all tests.
* Fixed a bug where empty messages has the wrong oneof count.
* Some fixes to make the code work in google3.
* Removed plugin.h.
* Some more fixes to be namespace-independent.
* More fixes for namespace independence.
* A few final fixes.
* Another fix (hide ToUpper from Copybara).
* Fix for charp_unittest.
This adds a Ruby extension in ruby/ that is based on the 'upb' library
(now included as a submodule), and adds support for Ruby code generation
to the protoc compiler.
General
* License changed from Apache 2.0 to New BSD.
* It is now possible to define custom "options", which are basically
annotations which may be placed on definitions in a .proto file.
For example, you might define a field option called "foo" like so:
import "google/protobuf/descriptor.proto"
extend google.protobuf.FieldOptions {
optional string foo = 12345;
}
Then you annotate a field using the "foo" option:
message MyMessage {
optional int32 some_field = 1 [(foo) = "bar"]
}
The value of this option is then visible via the message's
Descriptor:
const FieldDescriptor* field =
MyMessage::descriptor()->FindFieldByName("some_field");
assert(field->options().GetExtension(foo) == "bar");
This feature has been implemented and tested in C++ and Java.
Other languages may or may not need to do extra work to support
custom options, depending on how they construct descriptors.
C++
* Fixed some GCC warnings that only occur when using -pedantic.
* Improved static initialization code, making ordering more
predictable among other things.
* TextFormat will no longer accept messages which contain multiple
instances of a singular field. Previously, the latter instance
would overwrite the former.
* Now works on systems that don't have hash_map.
Python
* Strings now use the "unicode" type rather than the "str" type.
String fields may still be assigned ASCII "str" values; they will
automatically be converted.
* Adding a property to an object representing a repeated field now
raises an exception. For example:
# No longer works (and never should have).
message.some_repeated_field.foo = 1