Clang currently displays an error if source files generated with protoc are compiled with -Wcomma. This change fixes this as suggested by the compiler itself.
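For illustration, a minimal sketch of the kind of change clang's note suggests (casting the left-hand operand of the comma operator to void). The names `counter`, `Bump`, and `NextAfterBump` are hypothetical and do not come from the generated code:

```cpp
// Hypothetical names throughout; this only illustrates the pattern, not the
// code that protoc actually emits.
int counter = 0;

int Bump() { return ++counter; }

int NextAfterBump() {
  // Written as `return Bump(), counter + 1;` this is the kind of expression
  // -Wcomma flags as a possible misuse of the comma operator. Casting the
  // discarded left-hand operand to void states the intent explicitly.
  return static_cast<void>(Bump()), counter + 1;
}
```

The cast does not change behavior; the left operand is still evaluated first and its result discarded.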
This corresponds to the newest reading of RFC 3629, and results
in the largest possible number of character entities by any
valid parser. This may result in a buffer which is oversized,
but never undersized.
This follows further discussion with acozzette in this PR:
https://github.com/protocolbuffers/protobuf/pull/6844
Signed-off-by: William A Rowe Jr <wrowe@pivotal.io>
Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>
In consuming this useful string utility, it was discovered
that the interpretation of leading byte codes 0xf8-0xff
conformed to neither the RFC 3629 nor the ISO/IEC 10646
definition of UTF-8.
The IETF RFC describes only 1-4 byte encodings (a limited
number of 4 byte encodings at that), and plainly states in
section 1. Introduction:
o The octet values C0, C1, F5 to FF never appear.
Alternatively, the ISO definition "R.2 Specification of UTF-8"
presented in the original IETF RFC 2279 clearly defines the
meaning of leading byte values F5 through FD, and RFC 3629
Section 10. Security paragraph 3 calls out this alternate
reading (alternative to "never appear"). F5-F7 begin an
invalid (in the domain of Unicode code points) 4-byte UTF-8
sequence (similar to F0-F4), while F8-FB begin a 5-byte
sequence, and FC and FD begin a 6-byte sequence.
The current code is wrong in that it neither treats the codes
F8-FF as invalid 1-byte characters nor assigns the codes
F8-FD the correct number of bytes. No valid parser
will advance 4 bytes past these lead bytes. Most will
treat these as the 5 or 6 byte utf-32 character and may then
treat the resulting character as invalid, while some parsers
may reject all leading F5-FF characters as a single byte of
erroneous input, followed by each invalid continuation byte.
We propose the conventional reading of F8-FD as 5- and 6-byte
sequences as originally defined, while FE-FF must be read
as single-byte invalid code points.
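The proposed reading can be sketched as a lead-byte length function. This is an illustration only, not the table used in strutil: the name `Utf8LeadByteLength` is hypothetical, and treating a stray continuation byte (80-BF) as a 1-byte unit is an assumption made here.

```cpp
// Hypothetical helper: number of bytes implied by a UTF-8 lead byte under the
// original RFC 2279 / ISO 10646 reading, with FE and FF as single invalid
// bytes.
int Utf8LeadByteLength(unsigned char lead) {
  if (lead < 0x80) return 1;  // 0xxxxxxx: ASCII
  if (lead < 0xC0) return 1;  // 10xxxxxx: stray continuation byte (assumed 1)
  if (lead < 0xE0) return 2;  // 110xxxxx (C0, C1 are overlong but 2 bytes)
  if (lead < 0xF0) return 3;  // 1110xxxx
  if (lead < 0xF8) return 4;  // 11110xxx (F5-F7 exceed the Unicode range)
  if (lead < 0xFC) return 5;  // 111110xx: F8-FB
  if (lead < 0xFE) return 6;  // 1111110x: FC-FD
  return 1;                   // FE, FF: never valid, consume one byte
}
```

A caller that only needs to skip characters can advance by the returned length; whether the decoded value is a valid code point is a separate check.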
Signed-off-by: William A Rowe Jr <wrowe@pivotal.io>
Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>
Plugins (and some built-in generators) have a `--<lang>_opt` flag that
allows passing parameters one by one instead of passing them as
`--<lang>_out=<params>:<out_base>`. This PR changes protoc to
allow using `--<lang>_opt` for all (built-in) generators.
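As a rough sketch of the idea, the two ways of passing parameters can be folded into one parameter string before it reaches a generator. The helper name `MergeGeneratorParameters` is hypothetical, and joining with commas is an assumption based on the comma-separated `<params>` form; this is not protoc's actual implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: combines the <params> part of
// --<lang>_out=<params>:<out_base> with the values of repeated --<lang>_opt
// flags into a single comma-separated parameter string (assumed format).
std::string MergeGeneratorParameters(
    const std::string& out_params,
    const std::vector<std::string>& opt_values) {
  std::string merged = out_params;
  for (const std::string& opt : opt_values) {
    if (opt.empty()) continue;           // Ignore empty flag values.
    if (!merged.empty()) merged += ',';  // Separate entries with commas.
    merged += opt;
  }
  return merged;
}
```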
* Register additional handlers from wrappers
* Return zval instead of parse frame
* Use parse frame
* Update upb
* Lazily create wrapper messages
* Fix a segmentation fault
Need to check the type of the field before getting the submsg def
* Avoid expanding during serialization and direct access
* Fix a bug where getXXXUnwrapped returns null for strings
* Implement writeWrapperUnwrapped
* Add more tests
* Fix oneof wrapper parsing
* Fix get oneof field
* Avoid expansion for oneof wrappers
* Fix bug
* Fix a bug in php7 where a variable is defined out of scope
* Fix broken tests
* Update upb to fix Timestamp conformance tests
* Fix segmentation fault for oneof wrapper fields
* Fix encoding/decoding top level wrapper values
* Add type checking for write wrapper value in php7
* Fix zts build
* Fix a bug where readWrapperValue uses the parent message's layout to access the wrapper value
* Fix wrapper in map
When browsing around the strutil files I found a function
that was never referenced inside the code base:
"void StripString(string* s, const char* remove,
char replacewith);"
The name was kind of misleading as well and it seems like
it's a carbon copy of
"void ReplaceCharacters(string* s, const char* remove,
char replacewith);"
(even the parameter names and the code are the same).
Is it intentional? Maybe for compatibility reasons? If so,
let's make it deprecated and use the ReplaceCharacters method inside,
or the other way around.
Also, I noticed there were no tests for "StripString" or "Replace".
I added some for both and am planning on maybe making it more C++ish (?)
in another commit.
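One way the suggested deprecation could look, as a sketch only: `StripString` becomes a thin wrapper around `ReplaceCharacters`. The bodies below are written for illustration with `std::string::find_first_of` and the standard `[[deprecated]]` attribute (C++14); they are assumptions, not the existing strutil code.

```cpp
#include <string>

// Replaces every character of *s that appears in `remove` with `replacewith`.
void ReplaceCharacters(std::string* s, const char* remove, char replacewith) {
  std::string::size_type pos = s->find_first_of(remove);
  while (pos != std::string::npos) {
    (*s)[pos] = replacewith;
    pos = s->find_first_of(remove, pos + 1);
  }
}

// Kept for compatibility only; forwards to ReplaceCharacters.
[[deprecated("Use ReplaceCharacters instead")]]
inline void StripString(std::string* s, const char* remove, char replacewith) {
  ReplaceCharacters(s, remove, replacewith);
}
```

With this shape, existing callers of `StripString` keep working but get a compiler warning pointing them at `ReplaceCharacters`.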