Users should use the add-on unknown_fields.py support.
Old usage example:
unknown_field_set = msg.UnknownFields()
New usage should be:
from google.protobuf import unknown_fields
unknown_field_set = unknown_fields.UnknownFieldSet(msg)
PiperOrigin-RevId: 589969095
This has been replaced by edition and feature getters. Any code depending on syntax will likely be broken by editions files, and should be migrated to the finer-grained feature helpers.
PiperOrigin-RevId: 589866745
As a prerequisite of enabling go/protobuf-weak-speed-messages downcasts using built-in cast operations need to be changed to cast functions recommended by protobuf team (go/protobuf-downcast-recommendation)
**More information**: go/protobuf-stripping-lsc
#codehealth
PiperOrigin-RevId: 582977861
This also fixes a few minor bugs in the editions implementation that were caught in python/conformance tests, and adds a new SetFeatureSetDefaults API to the def pool for consistency with C++ and other python implementations.
PiperOrigin-RevId: 581384108
This was already fully implemented in C++, but we need to expose features methods to the Python runtime to fully enable it. This also enables conformance and unit-testing for C++.
PiperOrigin-RevId: 581364355
This change only covers pure python, and follow-up changes will handle C++/upb variants and actually enable editions support. The C++ one works (as evident from the conformance tests), but needs some APIs added to allow for testing.
PiperOrigin-RevId: 580304039
Previously we were allocating memory on the message's arena every time we performed a `map[key]` or `map.get(key)` operation. This is unnecessary, as the key's data is only needed ephemerally, for the duration of the lookup, and we can therefore alias the Python object's string data instead of copying it.
This required fixing a bug in the convert.c operation. Previously in the `arena==NULL` case, if the user passes a bytes object instead of a unicode string, the code would return a pointer to a temporary Python object that had already been freed, leading to use-after-free. I fixed this by referencing the bytes object's data directly, and using utf8_range to verify the UTF-8.
Fixes: https://github.com/protocolbuffers/protobuf/issues/14571
PiperOrigin-RevId: 578563555
Previously we were allocating memory on the message's arena every time we performed a `map[key]` or `map.get(key)` operation. This is unnecessary, as the key's data is only needed ephemerally, for the duration of the lookup, and we can therefore alias the Python object's string data instead of copying it.
This required fixing a bug in the convert.c operation. Previously in the `arena==NULL` case, if the user passes a bytes object instead of a unicode string, the code would return a pointer to a temporary Python object that had already been freed, leading to use-after-free. I fixed this by referencing the bytes object's data directly, and using utf8_range to verify the UTF-8.
Fixes: https://github.com/protocolbuffers/protobuf/issues/14571
PiperOrigin-RevId: 578563555
Because pure python builds all descriptors at runtime via reflection, it's unable to parse options during the build of descriptor.proto (i.e. before we've built the options schemas). We always lazily parse these options to avoid this, but that still means options can't be *used* during this build. Since the current build process makes heavy use of features (which previously just relied on syntax), this poses a problem for editions.
To get around this, we just embed the resolved features directly into the gencode for this one file. This will allow us to skip feature resolution for these descriptors and still consider features in their build.
PiperOrigin-RevId: 577495949
The attempt at a more optimized approach doesn't round-trip all values of `datetime` on all platforms because `datetime.fromtimestamp(tzinfo)` is limited by the range of values accepted by `time.gmtime`, which can be substantially narrower than `datetime.min` to `datetime.max`. (The documentation notes that either `OverflowError` or `OSError` can be raised in that case, and that often this is limited to 1970 through 2038, versus 1 to 9999. See also https://github.com/python/cpython/issues/110042, the use of `gmtime` here seems unnecessary when the tzinfo supports the entire range.) So, supporting that whole range would require need fallback logic that uses this general approach anyways, which then requires a redundant set of tests for error behavior that amounts to a reimplementation of the whole function.
In addition, `datetime.fromtimestamp` doesn't support the full precision of `datetime` (https://github.com/python/cpython/issues/109849), which required adding additional code and an additional assumption (that neither tz offsets were sub-second nor tz changes mid-second).
Added test-cases for `datetime.min` in addition to the ones for `datetime.max`. Adjusted the examples and variable names slightly.
PiperOrigin-RevId: 569259168
The previous code did not work because timedelta arithmetic does not do timezone conversions. The result of adding a timedelta to a datetime has the same fixed UTC offset as the original datetime. This resulted in the correct timezone not being applied by `Timestamp.ToDatetime(tz)` whenever the UTC offset for the timezone at the represented moment was not the same as the UTC offset for that timezone at the epoch.
Instead, construct the datetime directly from the seconds part of the timestamp. It would be nice to include the nanoseconds as well (truncated to datetime's millisecond precision, but without unnecessary loss of precision). However, that doesn't work, since there isn't a way to construct a datetime from an integer-milliseconds timestamp, just float-seconds, which doesn't allow some datetime values to round-trip correctly. (This does assume that `tzinfo.utcoffset(dt).microseconds` is always zero, and that the value returned by `tzinfo.utcoffset(dt)` doesn't change mid-second. Neither of these is necessarily the case (see https://github.com/python/cpython/issues/49538), though I'd hope they hold in practice.)
This does take some care to still handle non-standard Timestamps where nanos is more than 1,000,000,000 (i.e. more than a second), since previously ToDatetime handled that, as do the other To* conversion methods.
The bug doesn't manifest for UTC (or any fixed-offset timezone), so it can be worked around in a way that will be correct before and after the fix by replacing `ts.ToDatetime(tz)` with `ts.ToDatetime(datetime.timezone.utc).astimezone(tz)`.
PiperOrigin-RevId: 569012931
This change moves almost everything in the `upb/` directory up one level, so
that for example `upb/upb/generated_code_support.h` becomes just
`upb/generated_code_support.h`. The only exceptions I made to this were that I
left `upb/cmake` and `upb/BUILD` where they are, mostly because that avoids
conflict with other files and the current locations seem reasonable for now.
The `python/` directory is a little bit of a challenge because we had to merge
the existing directory there with `upb/python/`. I made `upb/python/BUILD` into
the BUILD file for the merged directory, and it effectively loads the contents
of the other BUILD file via `python/build_targets.bzl`, but I plan to clean
this up soon.
PiperOrigin-RevId: 568651768
This restores the Python wheel CI runs from the old upb repo with only minor
changes. I had to update a path in one of the `py_wheel` rules and also make a
slight tweak to ensure that the `descriptor.upb_minitable.{h,c}` files make it
into the source wheels. The change in text_format_test.py is not strictly
necessary but is a small simplification I made while I was trying to debug an
issue with CRLF newlines.
I had to update test_util.py to use `importlib` to access the golden files from
the installed `protobuftests` package. I suspect the previous incarnation of
thse test runs was somehow reading the goldens from the repo checkout, but I
think the intention is to read them from `protobuftests` instead. This was a
bit tricky to get working because Python versions before 3.9 do not support
`importlib.resources.files()`. I set up the code to fall back on
`importlib.resources.open_binary()` in that case, but that function does not
support subdirectories, so this required putting an `__init__.py` file inside
the `testdata` directory to make sure it is treated as a Python package.
PiperOrigin-RevId: 567366695
A couple weeks ago we moved upb into the protobuf Git repo, and this change
continues the merger of the two repos by making them into a single Bazel repo.
This was mostly a matter of deleting upb's WORKSPACE file and fixing up a bunch
of references to reflect the new structure.
Most of the changes are pretty mechanical, but one thing that needed more
invasive changes was the Python script for generating CMakeLists.txt,
make_cmakelists.py. The WORKSPACE file it relied on no longer exists with this
change, so I updated it to hardcode the information it needed from that file.
PiperOrigin-RevId: 564810016
The Python comparison protocol requires that if an object doesn't know how to
compare itself to an object of a different type, it returns NotImplemented
rather than False. The interpreter will then try performing the comparison using
the other operand. This translates, for protos, to:
If a proto message doesn't know how to compare itself to an object of
non-message type, it returns NotImplemented. This way, the interpreter will then
try performing the comparison using the comparison methods of the other object,
which may know how to compare itself to a message. If not, then Python will
return the combined result (e.g., if both objects don't know how to perform
__eq__, then the equality operator `==` return false).
This change allows one to compare a proto with custom matchers such as mock.ANY
that the message doesn't know how to compare to, regardless of whether
mock.ANY is on the right-hand side or left-hand side of the equality (prior to
this change, it only worked with mock.ANY on the left-hand side).
Fixes https://github.com/protocolbuffers/protobuf/issues/9173
PiperOrigin-RevId: 561728156
GetOptions on fields (which parse the _serialized_options) will be called for the first time of parse or serialize instead of Build time.
Note: GetOptions on messages are still called in Build time because of message_set_wire_format. If message options are needed in descriptor.proto, a parse error will be raised in GetOptions(). We can check the file to not invoke GetOptions() for descriptor.proto as long as message_set_wire_format not needed in descriptor.proto.
Other options except message options do not invoke GetOptions() in Build time
PiperOrigin-RevId: 560741182