The intent of this change is to take the best ideas from the C++ backend, such as having generator objects that can cache pre-computed state, while minimizing duplication.
Where possible, we take the approach of making the C++ and UPB kernel-specific code as similar as possible, since this reduces the number of templates we need to keep in sync.
PiperOrigin-RevId: 524903073
FieldOptions.ctype is public and CORD support is implemented for [ctype=CORD] on "bytes" fields. Include a public comment for CORD about use and supported types.
PiperOrigin-RevId: 524383969
This CL introduces two new files, names.h and context.h.
The former is intended to hold functions that generate the stringified names of things to splat into text templates. The latter holds per-invocation options, and a Context struct that makes it easy to thread extra information throughout the codegen backend.
PiperOrigin-RevId: 524366974
PEP634 introduces structural pattern matching. This works out of the box for most parts of protobuf messages, but fails for sequence matching (defined in https://peps.python.org/pep-0634/#sequence-patterns). This is caused by the underlying containers missing the newly introduced Py_TPFLAGS_SEQUENCE flag (see 069e81ab3d).
This simply adds the flag, making the following fall into the first case:
```
message = test_pb2.TestMessage(int_sequence=(1, 2, 3))
match message:
case test_pb2.TestMessage(int_sequence=(1, *rest)):
print(f"message.int_sequence is a seq starting with 1, ending in {rest}")
case _:
print(f"No case on 'int_sequence' matched! Value: {message.int_sequence}")
```
PiperOrigin-RevId: 524326722
Before this CL all messages were generated in the top-level crate module. With
this change we generate messages under the module specified by the package
declaration in the .proto file.
Dots are interpreted as submodule separator in consistency with how C++
namespaces are handled.
Note that name of the proto_library target still remains to be used as the crate name. This CL only adds crate submodules dependeing on the specified package.
PiperOrigin-RevId: 524235162
This PR removes the DSL from the code generator, in anticipation of splitting the DSL out into a separate package.
Given a .proto file like:
```proto
syntax = "proto3";
package pkg;
message TestMessage {
optional int32 i32 = 1;
optional TestMessage msg = 2;
}
```
Generated code before:
```ruby
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: protoc_explorer/main.proto
require 'google/protobuf'
Google::Protobuf::DescriptorPool.generated_pool.build do
add_file("test.proto", :syntax => :proto3) do
add_message "pkg.TestMessage" do
proto3_optional :i32, :int32, 1
proto3_optional :msg, :message, 2, "pkg.TestMessage"
end
end
end
module Pkg
TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```
Generated code after:
```ruby
# frozen_string_literal: true
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: test.proto
require 'google/protobuf'
descriptor_data = "\n\ntest.proto\x12\x03pkg\"S\n\x0bTestMessage\x12\x10\n\x03i32\x18\x01 \x01(\x05H\x00\x88\x01\x01\x12\"\n\x03msg\x18\x02 \x01(\x0b\x32\x10.pkg.TestMessageH\x01\x88\x01\x01\x42\x06\n\x04_i32B\x06\n\x04_msgb\x06proto3"
begin
Google::Protobuf::DescriptorPool.generated_pool.add_serialized_file(descriptor_data)
rescue TypeError => e
# <compatibility code, see below>
end
module Pkg
TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```
This change fixes nearly all remaining conformance problems that existed previously. This is a side effect of moving from the DSL (which is lossy) to a serialized descriptor (which preserves all information).
## Backward Compatibility
This change should be 100% compatible with Ruby Protobuf >= 3.18.0, released in Sept 2021. Additionally, it should be compatible with all existing users and deployments. However there is some special compatibility code I inserted to achieve this level of backward compatibility.
Without the compatibility code, there is an edge case that could break backward compatibility. The existing code is lax in a way that the new code would be more strict.
When we use a full serialized descriptor, it will contain a list of all `.proto` files imported by this file (whereas the DSL never added dependencies properly): dfb71558a2/src/google/protobuf/descriptor.proto (L65-L66)
`add_serialized_file` will verify that all dependencies listed in the descriptor were previously added with `add_serialized_file`. Generally that should be fine, because the generated code will contain Ruby `require` statements for all dependencies, and the descriptor will fail to load anyway if the types we depend on were not previously defined in the DescriptorPool.
But there is a potential for problems if there are ambiguities around file paths. For example, consider the following scenario:
```proto
// foo/bar.proto
syntax = "proto2";
message Bar {}
```
```proto
// foo/baz.proto
syntax = "proto2";
import "bar.proto";
message Baz {
optional Bar bar = 1;
}
```
If you invoke `protoc` like so, it will work correctly:
```
$ protoc --ruby_out=. -Ifoo foo/bar.proto foo/baz.proto
$ RUBYLIB=. ruby baz_pb.rb
```
However if you invoke `protoc` like so, and didn't have any compatibility code, it would fail to load:
```
$ protoc --ruby_out=. -I. -Ifoo foo/baz.proto
$ protoc --ruby_out=. -I. -Ifoo foo/bar.proto
$ RUBYLIB=foo ruby foo/baz_pb.rb
foo/baz_pb.rb:10:in `add_serialized_file': Unable to build file to DescriptorPool: Depends on file 'bar.proto', but it has not been loaded (Google::Protobuf::TypeError)
from foo/baz_pb.rb:10:in `<main>'
```
The problem is that `bar.proto` is being referred to by two different canonical names: `bar.proto` and `foo/bar.proto`. This is a user error: each import should always be referred to by a consistent full path. Hopefully user errors of this sort are rare, but it is hard to know without trying.
The code in this PR prints a warning using `warn` if we detect that this edge case has occurred. We will plan to remove this compatibility code in the next major version.
Closes#12319
COPYBARA_INTEGRATE_REVIEW=https://github.com/protocolbuffers/protobuf/pull/12319 from haberman:ruby-gencode-binary 5c0e8f20b1
PiperOrigin-RevId: 524129023
This is a copy of https://github.com/protocolbuffers/protobuf/pull/12431
but with the ifdef logic reversed. The original PR could not kick off our
CI tests because it wasn't branched directly from repo main.
Thanks to Tom Gillespie for the original PR.
PiperOrigin-RevId: 524041105
Clang's unsigned integer overflow sanitizer is flagging BucketNumber as having
an overflow. This is intentional behavior, so we should mark it as such.
Rather than disabling the sanitizer, this uses __builtin_mul_overflow to
indicate the overflow is intentional.
PiperOrigin-RevId: 524018233
next available enum number in addition to mentioning the option to allow
aliasing.
For enums that are large but are not sorted numerically it can be difficult to
find the correct enum number to use.
We use "next available" as the next number larger than the number that was
duplicatively used which is not assigned a value. For sparse enums, there may
be other enums past the suggested number.
PiperOrigin-RevId: 523487238
In this CL I'd like to call existing C++ Protobuf API from the V0 Rust API. Since parts of the C++ API are defined inline and using (obviously) C++ name mangling, we need to create a "thunks.cc" file that:
1) Generates code for C++ API function we use from Rust
2) Exposes these functions without any name mangling (meaning using `extern "C"`)
In this CL we add Bazel logic to generate "thunks" file, compile it, and propagate its object to linking. We also add logic to protoc to generate this "thunks" file.
The protoc logic is rather rudimentary still. I hope to focus on protoc code quality in my followup work on V0 Rust API using C++ kernel.
PiperOrigin-RevId: 523479839
unknown and the key has the high bit on.
It was being serialized as signed, which caused it to be sign extended. The
value is equivalent after parsing as 32-bit, but it is inconsistent with other forms of
serialization.
PiperOrigin-RevId: 523428645
This can be useful when an extension field/declaration is deleted in schema to avoid data corruption.
It also adds a check for duplicate numbers in the declarations.
PiperOrigin-RevId: 523269837
This change improves deserialization performance by in two ways:
1) VarintParseSlowArm32 and VarintParseSlowArm64 return a pair for the updated buffer pointer and parsing result instead of storing the result and returning just the pointer. This allows the parsed result to be returned to higher level code in x1 rather than via the stack.
2) Remove prohibition on inlining VarintParseSlowArm32 and VarintParseSlowArm64. This prohibition was originally included to avoid bloating non-TDP code. Now that TDP is enabled the impact on code size is small.
PiperOrigin-RevId: 523142512
This turns out to be quite of a yak shave to be able to perfectly test both kernels without having to pass extra Blaze flags.
PiperOrigin-RevId: 521850709