In consuming this useful string utility, it was discovered
that the interpretation of leading byte codes 0xf8-0xff
did not conform to either the RFC 3629 or the ISO/IEC 10646
definition of UTF-8.
The IETF RFC describes only 1-4 byte encodings (a limited
number of 4 byte encodings at that), and plainly states in
Section 1, Introduction:
o The octet values C0, C1, F5 to FF never appear.
Alternatively, the ISO definition "R.2 Specification of UTF-8"
presented in the original IETF RFC 2279 clearly defines the
meaning of leading byte values F5 through FD, and RFC 3629
Section 10 (Security Considerations), paragraph 3 calls out
this alternate reading (an alternative to "never appear").
F5-F7 begin an invalid (in the domain of Unicode code points)
4-byte UTF-8 sequence (similar to F0-F4), while F8-FB begin a
5-byte sequence and FC-FD begin a 6-byte sequence.
The current code is wrong in that it neither treats the codes
F8-FF as invalid 1-byte characters nor assigns the codes
F8-FD the correct number of bytes. No valid parser will land
these lead characters 4 bytes forward. Most will treat these
as the lead of a 5 or 6 byte sequence and may then treat the
resulting character as invalid, while some parsers may reject
all leading F5-FF characters as a single byte of erroneous
input, followed by each invalid continuation byte.
We propose the conventional reading of F8-FD as 5 and 6 byte
sequences as originally defined, while FE-FF must be read
as single-byte invalid code points.
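
The proposed reading can be sketched as a lead-byte to length
mapping. This is an illustration only, not the patched code: the
function name `utf8_lead_byte_length` is hypothetical, and stepping
over a stray continuation byte (80-BF) one byte at a time is an
assumption that the text above does not address.

```python
def utf8_lead_byte_length(lead: int) -> int:
    """Return how many bytes a parser should step over for this lead byte.

    Follows the RFC 2279 style reading proposed above. A result of 1 for
    a byte >= 0x80 means "invalid single byte", not a valid character.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("lead must be a single byte value")
    if lead < 0x80:
        return 1  # 00-7F: ASCII
    if lead < 0xC0:
        return 1  # 80-BF: stray continuation byte (assumption: skip one byte)
    if lead < 0xE0:
        return 2  # C0-DF: 2-byte form (C0 and C1 are overlong, still 2 bytes)
    if lead < 0xF0:
        return 3  # E0-EF: 3-byte form
    if lead < 0xF8:
        return 4  # F0-F7: 4-byte form (F5-F7 exceed U+10FFFF)
    if lead < 0xFC:
        return 5  # F8-FB: 5-byte form, outside the Unicode range
    if lead < 0xFE:
        return 6  # FC-FD: 6-byte form, outside the Unicode range
    return 1      # FE-FF: never valid, read as a single invalid byte
```

Validity of the decoded value (overlong forms, surrogates, values
above U+10FFFF) would still need a separate check; the sketch only
answers how far to advance.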
Signed-off-by: William A Rowe Jr <wrowe@pivotal.io>
Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>
Plugins (and some built-in generators) have a `--<lang>_opt` flag
that allows passing parameters one by one instead of passing them
as `--<lang>_out=<params>:<out_base>`. This PR changes protoc to
allow using `--<lang>_opt` for all built-in generators.
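
A minimal sketch of how the two forms relate, assuming parameters
from repeated `--<lang>_opt` flags are joined with commas. The helper
names and the parameter names `param_a` and `param_b` are
hypothetical and not part of protoc.

```python
def combined_form(lang, params, out_base):
    """Single flag: --<lang>_out=<params>:<out_base>."""
    if not params:
        return [f"--{lang}_out={out_base}"]
    return [f"--{lang}_out={','.join(params)}:{out_base}"]


def one_by_one_form(lang, params, out_base):
    """One --<lang>_opt flag per parameter, then a plain --<lang>_out."""
    return [f"--{lang}_opt={p}" for p in params] + [f"--{lang}_out={out_base}"]


# Hypothetical parameters, shown both ways for the cpp generator.
params = ["param_a=1", "param_b"]
print(combined_form("cpp", params, "gen"))
print(one_by_one_form("cpp", params, "gen"))
```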
* Update CHANGES.txt with 3.11.0-RC1 release notes (#6909)
* Revert "Make shared libraries be able to link to MSVC static runtime libraries, so that VC runtime is not required." (#6914)
* Marked update_compatibility_version.py as executable (#6916)
* Make reserve names map persistent
* Add DescriptorInternal to map
* Use get_msgdef_desc in encode_decode.c
* Add persistent map for ce=>def and enum=>def
* Replace get_ce_obj
* Remove get_proto_obj
* Remove obsolete fields from Descriptor and EnumDescriptor
* Add cache for descriptor php values
* Add cache for descriptors
* Fix bug
* Avoid adding a generated file again if it has already been added
* Fix the bug that upb depends on a null-terminated string for lookup
* Initialize generated pool impl
* Turn down old generated pool
* Add ini entry flag protobuf.keep_descriptor_pool_after_request
By default, it's off. Add protobuf.keep_descriptor_pool_after_request=1 to php.ini to enable it.
* Fix zts build
* Register additional handlers from wrappers
* Return zval instead of parse frame
* Use parse frame
* Update upb
* Lazily create wrapper messages
* Fix a segmentation fault
Need to check the type of the field before getting the submsg def
* Avoid expanding during serialization and direct access
* Fix a bug that getXXXUnwrapped returns null for string
* Implement writeWrapperUnwrapped
* Add more tests
* Fix oneof wrapper parsing
* Fix get oneof field
* Avoid expansion for oneof wrappers
* Fix bug
* Fix a bug in php7 where a variable is defined out of scope
* Fix broken tests
* Update upb to fix Timestamp conformance tests
* Fix segmentation fault for oneof wrapper fields
* Fix encoding/decoding top level wrapper values
* Add type checking for write wrapper value in php7
* Fix zts build
* Fix the bug that readWrapperValue uses parent message's layout to access wrapper value
* Fix wrapper in map