This corresponds to the newest reading of RFC 3629, and results
in the largest possible number of character entities by any
valid parser. This may result in a buffer which is oversized,
but never undersized.
This is after further discussion with acozzette in this PR;
https://github.com/protocolbuffers/protobuf/pull/6844
Signed-off-by: William A Rowe Jr wrowe@pivotal.io
Signed-off-by: Yechiel Kalmenson ykalmenson@pivotal.io
In consuming this useful string utility, it was discovered
that the interpretation of leading byte codes 0xf8-0xff
did not conform to either the RFC 3629 nor ISO/IEC 10646
definitions of utf-8.
The IETF RFC describes only 1-4 byte encodings (a limited
number of 4 byte encodings at that), and plainly states in
section 1. Introduction;
o The octet values C0, C1, F5 to FF never appear.
Alternately, the ISO definition "R.2 Specification of UTF-8"
preseented in the original IETF RFC 2279 clearly define the
meaning of leading byte values F5 through FD, and RFC 3629
Section 10. Security paragraph 3 calls out this alternate
reading (alterative to "never appears".) F5-F7 begin an
invalid (in the domain of unicode code points) 4-byte UTF-8
sequence (similar to F0-F4), while F8-FC begin a 5-byte
sequence, FC and FD begin a 6 byte sequence.
The curent code is wrong in that it doesn't treat the codes
F8-FF as invalid 1-byte characters, nor does it treat the
codes F8-FD as the correct number of bytes. No valid parser
will land these lead characters 4 bytes forward. Most will
treat these as the 5 or 6 byte utf-32 character and may then
treat the resulting character as invalid, while some parsers
may reject all leading F5-FF characters as a single byte of
erronious input, followed by each invalid continuation byte.
We propose the conventional reading of F8-FD as 5 and 6 byte
sequences as originally defined, while FE-FF must be read
as single byte invalid code points.
Signed-off-by: William A Rowe Jr <wrowe@pivotal.io>
Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>
Plugins (and some built-in generators) have `--<lang>_opt` flag that
allows passing parameters one-by-one instead of passing them as
`--<lang>_out=<params>:<out_base>`. This PR changes protoc to
allow using `--<lang>_opt` for all (built-in) generators.
* Make c extension portable for php 7.4
* Fix conformance tests
* Fix comments
* Fix 32-bit
* Update conformance failure list
* Fix compiler warnings
* Cleanup configure created by phpize
The file created in php 7.4 is not recognizable by previous versions
* Fix conformance tests for 64-bit php
* Fix conformance test
* Fix compile warning
* Fix compile warnings
* Update CHANGES.txt with 3.11.0-RC1 release notes (#6909)
* Revert "Make shared libraries be able to link to MSVC static runtime libraries, so that VC runtime is not required." (#6914)
* Marked update_compatibility_version.py as executable (#6916)
This reverts commit 129a7c875f. We are
seeing the following error when building Python release artifacts in Windows:
" error LNK2038: mismatch detected for 'RuntimeLibrary': value
'MD_DynamicRelease' doesn't match value 'MT_StaticRelease' in
descriptor.obj".