As an optimization, string fields are initialized with a pointer to a global
immutable std::string instance and create a local std::string only when
"set". If such a field has a hasbit, the hasbit may be set while the string
field still points to the global empty string instance. This can happen, for
example, when the field is implicit-presence but a hasbit has been generated
for it.
Maintaining an invariant that the hasbit is set iff the string is nondefault
can simplify the implementation of destructors and message.Clear(). The code
would not need to branch further after scanning hasbits; instead, it can always
assume that a local std::string object exists as soon as it sees that the
hasbit is set.
However, this does require an else block in the merge implementation of
implicit-presence string fields. When hasbits are implemented for
implicit-presence string fields, merging from a non-present (i.e. empty) string
field requires a nondefault std::string instance to be created. On the other
hand, branches in Clear() can be eliminated. We think this is the right
tradeoff because:
1. The allocation of nondefault string instance can only happen when the source
proto has hasbit set but the field is empty. This is a relatively rare
scenario.
2. Clear() is called every time a protobuf object is "overwritten" via an
assignment operator or ParseFrom(). This probably happens more frequently than
scenario 1.
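The tradeoff above can be sketched with a toy model. This is a minimal
illustration, not the generated code: `ToyMessage`, `GlobalEmptyString`,
`MutableLocal` and the single-bit hasbit layout are all hypothetical names
chosen for the example. It shows Clear() relying on the invariant without a
second branch, and the merge path paying for it with an else block.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy model only: these names are hypothetical, not protobuf internals.
const std::string& GlobalEmptyString() {
  static const std::string kEmpty;  // shared, never mutated
  return kEmpty;
}

class ToyMessage {
 public:
  ToyMessage() = default;
  ToyMessage(const ToyMessage&) = delete;
  ToyMessage& operator=(const ToyMessage&) = delete;
  ~ToyMessage() { Clear(); }

  const std::string& name() const { return *name_; }
  bool has_bit() const { return (has_bits_ & kNameBit) != 0; }
  bool is_default() const { return name_ == &GlobalEmptyString(); }

  // Setting always materializes a local string, even when value is "".
  void set_name(const std::string& value) { *MutableLocal() = value; }

  void Clear() {
    if (has_bit()) {
      // Invariant: hasbit set implies a local string exists, so no extra
      // "is this the global instance?" branch is needed before deleting.
      assert(!is_default());
      delete name_;
      name_ = &GlobalEmptyString();
      has_bits_ = 0;
    }
  }

  void MergeFrom(const ToyMessage& from) {
    if (!from.has_bit()) return;
    if (!from.name().empty()) {
      set_name(from.name());
    } else {
      // The extra else block: the source hasbit is set but the value is
      // empty. The destination hasbit becomes set too, so a local string
      // must exist to keep the invariant; the current value is kept.
      MutableLocal();
    }
  }

 private:
  static constexpr std::uint32_t kNameBit = 1u;

  // Ensures a local string exists and sets the hasbit.
  std::string* MutableLocal() {
    if (is_default()) name_ = new std::string();
    has_bits_ |= kNameBit;
    // Safe cast: every non-global string was allocated as non-const.
    return const_cast<std::string*>(name_);
  }

  const std::string* name_ = &GlobalEmptyString();
  std::uint32_t has_bits_ = 0;
};
```

In this sketch the only allocation caused by the else block happens when the
source has its hasbit set but holds an empty string, which matches scenario 1.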
PiperOrigin-RevId: 691951661
N.B.:
- This change is not intended to affect any well-defined protobuf behaviour in
an observable way.
- The wire parsing codepath is not affected.
- This change only affects the C++ protobuf implementation (other languages are
not affected).
- sizeof proto3 message objects may increase in 32-bit increments to
accommodate hasbits.
- When profiled on some of Google's largest binaries, we have seen a code size
increase of ~0.1%, which we consider to be a reasonable increase.
The title uses several terms:
- **singular**: a field that is not repeated, not oneof, not extension, not lazy,
just a field with a simple primitive type (number or boolean), or
string/bytes.
- **proto3**: describes behaviour consistent with the "proto3" syntax.
This is equivalent to `edition = "2023"` with
`option features.field_presence = IMPLICIT;`.
- **implicit presence**: describes behaviour consistent with "non-optional"
fields in proto3. This is described in more detail in
https://protobuf.dev/programming-guides/field_presence/#presence-in-proto3-apis
This change enables C++ proto3 objects to generate hasbits for regular proto3
(i.e. non-`optional`) fields. This code change might make certain codepaths
negligibly more efficient, but large improvement or regression is unlikely. A
larger performance improvement is expected from generating hasbits for repeated
fields -- this change will pave the way for future work there.
Hasbits in C++ will have slightly different semantics for implicit presence
fields. In the past, all hasbits were true field presence indicators. If the
hasbit is set, the field is guaranteed to be present; if the hasbit is unset,
the field is guaranteed to be missing.
This change introduces a new hasbit mode that I will call "hint hasbits",
denoted by a newly-introduced enum, `internal::cpp::HasbitMode::kHintHasbit`.
For implicit presence fields, it may be possible to mutate the field and have
it end up with a zero (default) value, especially with `mutable_foo` APIs. To
handle those cases correctly, we unconditionally set the hasbit when
`mutable_foo` is called, and then do an additional check for field emptiness
before serializing the field onto the wire.
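A minimal sketch of the "hint" semantics, assuming a toy message with one
implicit-presence string field. `ToyHintMessage`, `hint_bit` and
`ShouldSerializeText` are hypothetical names for illustration and are not part
of the protobuf runtime. An unset bit means the field is definitely empty; a
set bit only means the value needs to be checked.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy model only: names here are hypothetical, not protobuf internals.
class ToyHintMessage {
 public:
  const std::string& text() const { return text_; }

  // The caller may or may not write a non-empty value through the returned
  // pointer, so the hasbit is set unconditionally.
  std::string* mutable_text() {
    has_bits_ |= kTextBit;
    return &text_;
  }

  bool hint_bit() const { return (has_bits_ & kTextBit) != 0; }

  // Decides whether the field would be written to the wire.
  bool ShouldSerializeText() const {
    if (!hint_bit()) {
      // In this toy model an unset bit guarantees an empty field.
      assert(text_.empty());
      return false;
    }
    // A set bit is only a hint: the value may still be empty.
    return !text_.empty();
  }

 private:
  static constexpr std::uint32_t kTextBit = 1u;
  std::string text_;
  std::uint32_t has_bits_ = 0;
};
```

Calling `mutable_text()` without writing anything leaves `hint_bit()` true
while `ShouldSerializeText()` stays false, which is the case the additional
emptiness check exists for.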
PiperOrigin-RevId: 691945237
Message fields can never have implicit presence, but we have logic in
ClearField that deallocates the message field and reassigns nullptr if the
field is a "proto3" field.
This snippet is a remnant of an old implementation of message field
reflection from when proto3 was first introduced (when the initial idea was to
use open structs for everything). During implementation, however, we ended up
preserving explicit presence behavior for message fields.
PiperOrigin-RevId: 691199008
Note:
This change primarily affects debug + ASAN builds using protobuf arenas.
If this change causes a crash in your debug build, it probably means that
there is a use-after-free bug in your program. This change has already been
implemented and battle-tested within Google for some time.
Oneof messages on the regular heap should not be affected because the memory
they hold is already deleted. Users will already see use-after-free errors
if they attempt to access heap-allocated oneof messages after calling Clear().
When a protobuf message is cleared, all raw pointers should be invalidated
because undefined things may happen to any of the fields pointed to by
mutable_foo() APIs. While destructors may not necessarily be invoked, Clear()
should be considered a pointer invalidation event.
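As an illustration of treating Clear() as a pointer invalidation event, here is
a small sketch using a toy parent/child pair. `ToyParent`, `ToyChild` and
`OverwriteChild` are hypothetical names; the toy frees its child on Clear(),
whereas a real arena-backed message may not run destructors at all. The usage
rule is the same either way: fetch a fresh pointer after Clear() instead of
reusing an old one.

```cpp
#include <cassert>
#include <memory>

// Toy model only: not the protobuf implementation.
struct ToyChild {
  int value = 0;
};

class ToyParent {
 public:
  bool has_child() const { return child_ != nullptr; }

  // Lazily creates the child, similar in spirit to a mutable_foo() accessor.
  ToyChild* mutable_child() {
    if (!child_) child_ = std::make_unique<ToyChild>();
    return child_.get();
  }

  // Every pointer previously returned by mutable_child() is invalid after
  // this call.
  void Clear() { child_.reset(); }

 private:
  std::unique_ptr<ToyChild> child_;
};

// Safe pattern: do not keep a ToyChild* across Clear(); re-fetch it.
void OverwriteChild(ToyParent& msg, int value) {
  msg.Clear();
  assert(!msg.has_child());
  ToyChild* fresh = msg.mutable_child();  // obtained after Clear()
  fresh->value = value;
}
```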
#test-continuous
PiperOrigin-RevId: 689569669
are enabled.
This makes the function a drop-in replacement for `dynamic_cast` when the user
is expecting exceptions to be thrown.
PiperOrigin-RevId: 689419852
By just checking for upper-cased characters, rather than digits and lower-cased characters, we have to perform fewer comparisons.
This should be safe because absl::AsciiStrToLower only operates on upper-case characters.
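A rough sketch of the idea, using only the standard library. `ToyAsciiToLower`
is a stand-in written for this example under the assumption stated above (only
'A' to 'Z' are changed); `AllLowerOrDigit`, `HasNoUpper` and `LowerIfNeeded`
are hypothetical names, and the real call site may look different.

```cpp
#include <cassert>
#include <string>

// Stand-in for absl::AsciiStrToLower; assumes only 'A'..'Z' are modified.
std::string ToyAsciiToLower(const std::string& s) {
  std::string out = s;
  for (char& c : out) {
    if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
  }
  return out;
}

// Older style of check: two range comparisons per character.
bool AllLowerOrDigit(const std::string& s) {
  for (char c : s) {
    const bool lower = c >= 'a' && c <= 'z';
    const bool digit = c >= '0' && c <= '9';
    if (!lower && !digit) return false;
  }
  return true;
}

// Cheaper check: one range comparison per character.
bool HasNoUpper(const std::string& s) {
  for (char c : s) {
    if (c >= 'A' && c <= 'Z') return false;
  }
  return true;
}

// Skips the lowercasing pass when it would be a no-op.
std::string LowerIfNeeded(const std::string& s) {
  if (HasNoUpper(s)) {
    // Debug-only check that skipping the conversion is sound.
    assert(ToyAsciiToLower(s) == s);
    return s;
  }
  return ToyAsciiToLower(s);
}
```

Note that the two checks are not equivalent for characters such as `_`: the
cheaper check accepts them, which is fine here because lowercasing leaves them
untouched.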
PiperOrigin-RevId: 688453016
Clarifies that reordering `enum` fields (even without changing their IDs) will change the order of the `value` indices. (This means that a seemingly "no-op" change to reorganize enums may affect code that (incorrectly) relied on the order of `value()`.)
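To make the effect concrete, here is a toy model of an enum as a list of
values in declaration order. `ToyEnumValue`, `ToyEnum`, `NameAtIndex` and
`FindByNumber` are hypothetical names for this sketch and are not the
descriptor API; the point is only that an index lookup depends on declaration
order while a lookup by number does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: one entry per enum value, kept in declaration order.
struct ToyEnumValue {
  std::string name;
  int number;
};
using ToyEnum = std::vector<ToyEnumValue>;

// Index-based lookup: the result changes if the declaration order changes.
const std::string& NameAtIndex(const ToyEnum& e, std::size_t index) {
  assert(index < e.size());
  return e[index].name;
}

// Number-based lookup: unaffected by reordering. Returns nullptr if absent.
const ToyEnumValue* FindByNumber(const ToyEnum& e, int number) {
  for (const ToyEnumValue& v : e) {
    if (v.number == number) return &v;
  }
  return nullptr;
}
```

With values declared as A = 0, B = 1, C = 2, index 1 names B. After moving C
above B without changing any numbers, index 1 names C, while a lookup by
number 1 still finds B.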
PiperOrigin-RevId: 688297400
based one.
This reduces binary size and runtime dispatch costs.
Also, since we are changing the declaration of the type trait, take the opportunity to remove the validator from the template parameters. It can be inferred directly from the type if we add traits for the enum.
PiperOrigin-RevId: 688151072
This CL migrates messages, enums, and primitive types all onto the same blanket
implementation of the `ProxiedInMapValue` trait. This gets us to the point
where messages and enums no longer need to generate any significant amount of
extra code just in case they might be used as a map value.
There are a few big pieces to this:
- I generalized the message-specific FFI endpoints in `rust/cpp_kernel/map.cc`
to be able to additionally handle enums and primitive types as values. This
mostly consisted of replacing `MessageLite*` parameters with a new `MapValue`
tagged union.
- On the Rust side, I added a new blanket implementation of
`ProxiedInMapValue` in rust/cpp.rs. It relies on its value type to implement
a new `CppMapTypeConversions` trait so that it can convert to and from the
`MapValue` tagged union used for FFI.
- In the Rust generated code, I deleted the generated `ProxiedInMapValue`
implementations for messages and enums and replaced them with
implementations of the `CppMapTypeConversions` trait.
PiperOrigin-RevId: 687355817
When you call a map field setter, we currently make an unnecessary extra copy,
so this CL fixes that problem.
I followed the example of how we already handle this for repeated field
setters. This required adding a new move setter thunk for map fields with the
C++ kernel. Originally I tried instead to add an FFI endpoint that could swap
two `RawMap` pointers, but it turned out to be difficult to implement this in a
way that worked correctly when the two maps are not on the same arena.
PiperOrigin-RevId: 687334655
There is really no reason for the Java compiler code to call into the internal
C++ implementation of HasHasbit. In the future, the two implementations may
evolve separately, and this decoupling makes that easier.
PiperOrigin-RevId: 686672397