Treat F5-FF octets as single (invalid) characters

This corresponds to the current reading of RFC 3629, under which
octets F5-FF never begin a valid sequence, and yields the largest
character count that any valid parser could report. A buffer sized
from that count may be oversized, but never undersized.

This follows further discussion with acozzette in this PR:
https://github.com/protocolbuffers/protobuf/pull/6844

Signed-off-by: William A Rowe Jr <wrowe@pivotal.io>
Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>
pull/7011/head
William A Rowe Jr 5 years ago committed by Adam Cozzette
parent 961c0e6b86
commit 53a814a0ee
src/google/protobuf/stubs/strutil.cc (2 lines changed)

@@ -2292,7 +2292,7 @@ static const unsigned char kUTF8LenTbl[256] = {
 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,
 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,
-3,3,3,3,3,3,3,3, 3,3,3,3,3,3,3,3, 4,4,4,4,4,4,4,4, 5,5,5,5,6,6,1,1
+3,3,3,3,3,3,3,3, 3,3,3,3,3,3,3,3, 4,4,4,4,4,1,1,1, 1,1,1,1,1,1,1,1
 };
 // Return length of a single UTF-8 source character
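
The effect of the changed row can be illustrated with a small standalone sketch. This is not the code in strutil.cc: the names OldLeadLength, NewLeadLength and CountChars are hypothetical. The lead-byte lengths for C0-FF are taken from the two rows in the hunk above, and the sketch assumes that every byte below C0 maps to 1 (only part of that range is visible in the hunk).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the table before the change (removed row):
// F0-F7 -> 4, F8-FB -> 5, FC-FD -> 6, FE-FF -> 1.
inline int OldLeadLength(unsigned char c) {
  if (c < 0xC0) return 1;  // assumed: all bytes below C0 map to 1
  if (c < 0xE0) return 2;  // C0-DF
  if (c < 0xF0) return 3;  // E0-EF
  if (c < 0xF8) return 4;  // F0-F7
  if (c < 0xFC) return 5;  // F8-FB
  if (c < 0xFE) return 6;  // FC-FD
  return 1;                // FE-FF
}

// Hypothetical stand-in for the table after the change (added row):
// F0-F4 -> 4, F5-FF -> 1.
inline int NewLeadLength(unsigned char c) {
  if (c < 0xC0) return 1;  // assumed: all bytes below C0 map to 1
  if (c < 0xE0) return 2;  // C0-DF
  if (c < 0xF0) return 3;  // E0-EF
  if (c < 0xF5) return 4;  // F0-F4
  return 1;                // F5-FF: each octet counts as one character
}

// Count characters by stepping over the string one lead byte at a time.
// The loop only reads s[i] while i is in range, so a length that runs
// past the end of the string simply ends the walk.
inline std::size_t CountChars(const std::string& s,
                              int (*lead_length)(unsigned char)) {
  std::size_t count = 0;
  std::size_t i = 0;
  while (i < s.size()) {
    i += static_cast<std::size_t>(
        lead_length(static_cast<unsigned char>(s[i])));
    ++count;
  }
  return count;
}
```

For the six-byte input "\xFC" followed by "aaaaa", the old lengths treat FC as the start of a six-byte character and count one character, while the new lengths count six. A buffer sized per character from the new count therefore cannot come out smaller than one sized from the old count for this input.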
