Callers must always over-allocate their buffer by at least
ten bytes. Since we will never read *more* than ten bytes,
there is no need to do bounds checking inside the parsing
code.
All Languages
* Repeated fields of primitive types (types other than string, group, and
nested messages) may now use the option [packed = true] to get a more
efficient encoding. In the new encoding, the entire list is written
as a single byte blob using the "length-delimited" wire type. Within
this blob, the individual values are encoded the same way they would
be normally except without a tag before each value (thus, they are
tightly "packed").
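The difference between the two encodings can be sketched in a few lines of
Python. This is an illustration only, not the library's implementation: the
helper names (encode_varint, make_tag, encode_unpacked, encode_packed) are
hypothetical, and the sketch assumes a repeated field of non-negative
varint-encoded integers.

```python
WIRETYPE_VARINT = 0
WIRETYPE_LENGTH_DELIMITED = 2


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def make_tag(field_number, wire_type):
    """A tag is the field number and wire type combined into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_unpacked(field_number, values):
    """Normal encoding: every value is preceded by its own tag."""
    tag = make_tag(field_number, WIRETYPE_VARINT)
    return b"".join(tag + encode_varint(v) for v in values)


def encode_packed(field_number, values):
    """Packed encoding: one tag, one length, then the values with no tags."""
    payload = b"".join(encode_varint(v) for v in values)
    tag = make_tag(field_number, WIRETYPE_LENGTH_DELIMITED)
    return tag + encode_varint(len(payload)) + payload
```

For field number 4 holding the values 3, 270 and 86942, the unpacked form
repeats the tag three times, while the packed form writes one tag and one
length prefix, so the saving grows with the number of elements.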
C++
* UnknownFieldSet now supports STL-like iteration.
* Message interface has new method ParseFromBoundedZeroCopyStream() which parses
a limited number of bytes from an input stream rather than parsing until
EOF.
Java
* Fixed bug where Message.mergeFrom(Message) failed to merge extensions.
* Message interface has new method toBuilder() which is equivalent to
newBuilderForType().mergeFrom(this).
* All enums now implement the ProtocolMessageEnum interface.
* Setting a field to null now throws NullPointerException.
* Fixed tendency for TextFormat's parsing to overflow the stack when
parsing large string values. The underlying problem is with Java's
regex implementation (which unfortunately uses recursive backtracking
rather than building an NFA). Worked around by making use of possessive
quantifiers.
Python
* Updated RPC interfaces to allow for blocking operation. A client may
now pass None for a callback when making an RPC, in which case the
call will block until the response is received, and the response
object will be returned directly to the caller. This interface change
cannot be used in practice until RPC implementations are updated to
implement it.
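The calling convention described above might look roughly like the following
sketch. EchoChannel and call_method are hypothetical names invented for this
illustration; they are not the library's RPC interfaces, and the "RPC" here
simply upper-cases a string in-process.

```python
class EchoChannel:
    """Hypothetical in-process channel, used only for illustration."""

    def call_method(self, request, callback=None):
        # Stand-in for real RPC work: produce a response from the request.
        response = request.upper()
        if callback is None:
            # Blocking mode: return the response directly to the caller.
            return response
        # Callback mode: deliver the response through the callback instead.
        callback(response)
        return None
```

A caller that passes no callback gets the response as the return value; a
caller that passes one gets None back and receives the response through the
callback.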
bash-only features, and /bin/sh is not a symlink to bash on all systems.
* If an input file is a Windows absolute path (e.g. "C:\foo\bar.proto") and
the import path only contains "." (or contains "." but does not contain
the file), protoc incorrectly thought that the file was under ".", because
it thought that the path was relative (since it didn't start with a slash).
This has been fixed.
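The kind of check involved can be sketched as follows. This is not protoc's
actual code: is_absolute_path is a hypothetical helper, and the sketch assumes
that a drive letter followed by a colon and a slash or backslash is enough to
mark a Windows path as absolute.

```python
import re

# A drive letter, a colon, then either kind of slash, e.g. "C:\" or "c:/".
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:[\\/]")


def is_absolute_path(path):
    """Return True for Unix-style and Windows drive-letter absolute paths."""
    if path.startswith("/"):
        return True
    return _DRIVE_PREFIX.match(path) is not None
```

Testing only for a leading slash would classify "C:\foo\bar.proto" as
relative, which is the mistake described above.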
protoc
* Enum values may now have custom options, using syntax similar to field
options.
* Fixed bug where .proto files which use custom options but don't actually
define them (i.e. they import another .proto file defining the options)
had to explicitly import descriptor.proto.
* Adjacent string literals in .proto files will now be concatenated, like in
C.
C++
* Generated message classes now have a Swap() method which efficiently swaps
the contents of two objects.
* All message classes now have a SpaceUsed() method which returns an estimate
of the number of bytes of allocated memory currently owned by the object.
This is particularly useful when you are reusing a single message object
to improve performance but want to make sure it doesn't bloat up too large.
* New method Message::SerializeAsString() returns a string containing the
serialized data. May be more convenient than calling
SerializeToString(string*).
* In debug mode, log error messages when string-type fields are found to
contain bytes that are not valid UTF-8.
* Fixed bug where a message with multiple extension ranges couldn't parse
extensions.
* Fixed bug where MergeFrom(const Message&) didn't do anything if invoked on
a message that contained no fields (but possibly contained extensions).
* Fixed ShortDebugString() to no longer be O(n^2).
* Fixed crash in TextFormat parsing if the first token in the input caused a
tokenization error.
Java
* New overload of mergeFrom() which parses a slice of a byte array instead
of the whole thing.
* New method ByteString.asReadOnlyByteBuffer() returns a read-only ByteBuffer
with the same contents as the ByteString.
* Improved performance of isInitialized() when optimizing for code size.
Python
* Corrected ListFields() signature in Message base class to match what
subclasses actually implement.
* Some minor refactoring.
require all memory reads to be aligned. Specifically, it turns out that
sizeof(RepeatedField<bool>) is 20 on 64-bit sparc with GCC 3.4.6. This is
strange, since one of RepeatedField's members is a pointer, which I thought
meant that it had to be 64-bit aligned, which means its size should be a
multiple of 64 bits. But, 20 is not a multiple of 8. I don't understand why
this is the case, but if this is possible, then DynamicMessage's strategy of
sorting fields in descending order by size and then tightly packing doesn't
work. To fix this, I got rid of the sort step and instead added code that
aligns each field's offset appropriately based on the field's size.
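The per-field alignment approach can be sketched as below. This is a minimal
illustration, not DynamicMessage's code: layout_fields, align_up and the
8-byte MAX_ALIGNMENT cap are assumptions made for the example, and field sizes
are assumed to be either powers of two or larger than the cap.

```python
MAX_ALIGNMENT = 8  # assumed cap: no field needs more than 8-byte alignment


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout_fields(sizes):
    """Assign each field an aligned offset, in declaration order.

    Returns (offsets, total_size). Each field is aligned to its own size,
    capped at MAX_ALIGNMENT, instead of relying on a sort by size.
    """
    offsets = []
    offset = 0
    for size in sizes:
        alignment = min(size, MAX_ALIGNMENT)
        offset = align_up(offset, alignment)
        offsets.append(offset)
        offset += size
    return offsets, align_up(offset, MAX_ALIGNMENT)
```

With field sizes 1, 20, 4 and 8, this places the fields at offsets 0, 8, 28
and 32. Sorting the same fields by descending size and packing them tightly
would put the 8-byte field at offset 20, which is not a multiple of 8.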
Also in this revision: Fix an error message that named a flag incorrectly.
Details:
For each message type, protoc generates an array of byte offsets of each of
the fields within the message class. These offsets are later used by the
reflection implementation. Prior to this revision, the offset arrays were
allocated as global variables. Since they were just arrays of ints, they
should have been initialized at compile time. Unfortunately, GCC 4.3.0
incorrectly decides that they cannot be initialized at compile time because
the values used to initialize the array have type ptrdiff_t, and GCC 4.3.0
does not recognize that it can convert ptrdiff_t to int at compile time. This
bug did not seem to exist in previous versions of GCC. Google's compiler
team has submitted a fix for this bug back to the GCC project, but we will
have to work around it anyway since Fedora 9 shipped with GCC 4.3.0.