require all memory reads to be aligned. Specifically, it turns out that
sizeof(RepeatedField<bool>) is 20 on 64-bit sparc with GCC 3.4.6. This is
strange, since one of RepeatedField's members is a pointer, which I thought
meant that it had to be 64-bit aligned, and therefore that its size should be
a multiple of 8 bytes. But 20 is not a multiple of 8. I don't understand why
this is the case, but if this is possible, then DynamicMessage's strategy of
sorting fields in descending order by size and then tightly packing doesn't
work. To fix this, I got rid of the sort step and instead added code that
aligns each field's offset appropriately based on the field's size.
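As a rough illustration of the approach described above, the following sketch
assigns offsets in declaration order and rounds each one up to an alignment
derived from the field's size. The names (kSafeAlignment, AlignTo, Layout,
LayoutFields) and the choice of min(size, 8) as the alignment rule are
assumptions made for this example, not a copy of DynamicMessage's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for this sketch: 8 bytes is the strictest alignment needed.
const std::size_t kSafeAlignment = 8;

// Rounds offset up to the next multiple of alignment (alignment must be > 0).
inline std::size_t AlignTo(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical result type: one offset per field plus the padded total size.
struct Layout {
  std::vector<std::size_t> offsets;
  std::size_t total_size;
};

// Lays out fields in the order given. Each field size must be nonzero.
Layout LayoutFields(const std::vector<std::size_t>& field_sizes) {
  Layout layout;
  std::size_t size = 0;
  for (std::size_t field_size : field_sizes) {
    // A field never needs stricter alignment than kSafeAlignment, and a
    // small field only needs to be aligned to its own size.
    const std::size_t alignment =
        field_size < kSafeAlignment ? field_size : kSafeAlignment;
    size = AlignTo(size, alignment);
    layout.offsets.push_back(size);
    size += field_size;
  }
  // Pad the end so that an array of such objects stays aligned.
  layout.total_size = AlignTo(size, kSafeAlignment);
  return layout;
}
```

With this rule a 20-byte field such as the RepeatedField<bool> mentioned above
is placed on an 8-byte boundary, and whatever follows it is aligned again
instead of relying on the previous field's size being a multiple of 8.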
Also in this revision: Fix an error message that named a flag incorrectly.
Details:
For each message type, protoc generates an array of byte offsets of each of
the fields within the message class. These offsets are later used by the
reflection implementation. Prior to this revision, the offset arrays were
allocated as global variables. Since they were just arrays of ints, they
should have been initialized at compile time. Unfortunately, GCC 4.3.0
incorrectly decides that they cannot be initialized at compile time because
the values used to initialize the array have type ptrdiff_t, and GCC 4.3.0
does not recognize that it can convert ptrdiff_t to int at compile time. This
bug did not seem to exist in previous versions of GCC. Google's compiler
team has submitted a fix for this bug back to the GCC project, but we will
have to work around it anyway since Fedora 9 shipped with GCC 4.3.0.
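For readers unfamiliar with such offset arrays, here is a minimal sketch of
what one might look like. ExampleMessage and kExampleOffsets are hypothetical
names invented for this illustration; real generated code differs. The
explicit casts to int mark the ptrdiff_t/size_t-to-int conversion that the
text says GCC 4.3.0 failed to perform at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a generated message class.
struct ExampleMessage {
  int id;
  double score;
  bool active;
};

// One byte offset per field, in declaration order. Each initializer is a
// constant expression, so a conforming compiler can fill in the array at
// compile time without running any code at startup.
static const int kExampleOffsets[] = {
  static_cast<int>(offsetof(ExampleMessage, id)),
  static_cast<int>(offsetof(ExampleMessage, score)),
  static_cast<int>(offsetof(ExampleMessage, active)),
};
```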
General
* License changed from Apache 2.0 to New BSD.
* It is now possible to define custom "options", which are basically
annotations which may be placed on definitions in a .proto file.
For example, you might define a field option called "foo" like so:
import "google/protobuf/descriptor.proto";
extend google.protobuf.FieldOptions {
  optional string foo = 12345;
}
Then you annotate a field using the "foo" option:
message MyMessage {
  optional int32 some_field = 1 [(foo) = "bar"];
}
The value of this option is then visible via the message's
Descriptor:
const FieldDescriptor* field =
MyMessage::descriptor()->FindFieldByName("some_field");
assert(field->options().GetExtension(foo) == "bar");
This feature has been implemented and tested in C++ and Java.
Other languages may or may not need to do extra work to support
custom options, depending on how they construct descriptors.
C++
* Fixed some GCC warnings that only occur when using -pedantic.
* Improved static initialization code, making ordering more
predictable among other things.
* TextFormat will no longer accept messages which contain multiple
instances of a singular field. Previously, the latter instance
would overwrite the former.
* Now works on systems that don't have hash_map.
Python
* Strings now use the "unicode" type rather than the "str" type.
String fields may still be assigned ASCII "str" values; they will
automatically be converted.
* Adding a property to an object representing a repeated field now
raises an exception. For example:
# No longer works (and never should have).
message.some_repeated_field.foo = 1
Python tests now run correctly even when a previous version of the library is
already installed. I was unable to reproduce the reported problem on my
machine, but the fix seems harmless enough.
protoc
- New flags --encode and --decode can be used to convert between protobuf text
format and binary format from the command-line.
- New flag --descriptor_set_out can be used to write FileDescriptorProtos for
all parsed files directly into a single output file. This is particularly
useful if you wish to parse .proto files from programs written in languages
other than C++: just run protoc as a background process and have it output
a FileDescriptorSet, then parse that natively.
C++
- Reflection objects are now per-class rather than per-instance. To make this
possible, the Reflection interface had to be changed such that all methods
take the Message instance as a parameter. This change improves performance
significantly in memory-bandwidth-limited use cases, since it makes the
message objects smaller. Note that source-incompatible interface changes
like this will not be made again after the library leaves beta.
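The shape of this change can be sketched as follows. ExampleMessage,
ExampleReflection and GetExampleReflection are hypothetical names used only
for illustration; they are not the library's Reflection interface. The point
is that one shared, stateless reflection object serves every instance, so the
message itself carries no reflection pointer.

```cpp
#include <cassert>

// Hypothetical message type: holds only its data, no reflection state.
struct ExampleMessage {
  int id;
  ExampleMessage() : id(0) {}
};

// Hypothetical per-class reflection object. Every method takes the message
// instance as a parameter instead of storing a pointer to it.
class ExampleReflection {
 public:
  ExampleReflection() {}
  int GetId(const ExampleMessage& message) const { return message.id; }
  void SetId(ExampleMessage* message, int value) const { message->id = value; }
};

// Returns the single reflection object shared by all ExampleMessage instances.
const ExampleReflection& GetExampleReflection() {
  static const ExampleReflection instance;
  return instance;
}
```

A caller would write GetExampleReflection().SetId(&message, 7) rather than
asking each message for its own reflection object.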
Python
- MergeFrom(message) and CopyFrom(message) are now implemented.
- SerializeToString() raises an exception if the message is missing required
fields.
- Code organization improvements.
- Fixed doc comments for RpcController and RpcChannel, which had somehow been
swapped.
Protoc (parser)
- Improved error message when an enum value's name conflicts with another
symbol defined in the enum type's scope, e.g. if two enum types declared
in the same scope have values with the same name. This is disallowed for
compatibility with C++, but this wasn't clear from the error.
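The C++ rule behind this restriction can be seen in a small sketch. The names
below are made up for the example. Values of a plain C++ enum are declared in
the scope that encloses the enum, so two sibling enums cannot both define RED;
nesting one of them inside another type gives its values a separate scope,
much as nesting an enum inside a message does in a .proto file.

```cpp
#include <cassert>

namespace example {

enum Color { RED = 0, GREEN = 1 };

// A second enum in the same scope could not reuse the name:
//   enum Light { RED = 0 };  // error: RED is already declared in this scope

// Nesting the enum inside a struct gives its values their own scope.
struct Light {
  enum Type { RED = 0, AMBER = 1 };
};

}  // namespace example
```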
C++
- Restored the set_foo(const char*) accessor for "bytes" type because some
code inside Google depends on it. set_foo(const char*, int) is still
available, but its first parameter is now const void*.
- Fixed TokenizerTest when compiling with -DNDEBUG on Linux.
- Other irrelevant tweaks.
Java
- Fixed UnknownFieldSet's parsing of varints larger than 32 bits.
- Fixed TextFormat's parsing of "inf" and "nan".
- Fixed TextFormat's parsing of comments.
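To illustrate why varints wider than 32 bits need care, here is a minimal
C++ sketch of base-128 varint decoding that accumulates into a 64-bit value.
It is not the Java fix itself, and DecodeVarint64 is a hypothetical helper
written for this example. Accumulating into a 32-bit integer instead would
silently drop the high bits of values such as 2^32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decodes one varint from data[0..size). Each byte carries 7 payload bits,
// least significant group first; the high bit marks "more bytes follow".
// Returns the number of bytes consumed, or 0 if the input is truncated or
// longer than the 10 bytes a 64-bit value can need.
std::size_t DecodeVarint64(const std::uint8_t* data, std::size_t size,
                           std::uint64_t* value) {
  std::uint64_t result = 0;
  for (std::size_t i = 0; i < size && i < 10; ++i) {
    // Widen to 64 bits before shifting so groups above bit 31 are kept.
    result |= static_cast<std::uint64_t>(data[i] & 0x7F) << (7 * i);
    if ((data[i] & 0x80) == 0) {
      *value = result;
      return i + 1;
    }
  }
  return 0;
}
```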
Python
- Fixed text_format_test on Windows where floating-point exponents sometimes
contain extra zeros.
- Improved readmes.
- Fixed incorrect definition of kint32min.
- Fixed absolute output paths on Windows.
- Added info to Java POM that will be required when we upload the
package to a Maven repo.