Fixed bug when parsing an unknown value in a proto2 enum extension. #fuzzing

Proto2 enum parsing is the only case where we have to look at the wire value (not merely the tag) to decide whether the field is known or unknown.  If the value is unknown, we need to put the value in the Unknown Fields, but for an extension we no longer have easy access to the message, because for extensions we replace the `msg` pointer with a pointer to the extension.  The bug occurred when we were treating the fake `upb_Message*` (which was actually a pointer to an extension) as a real `upb_Message*` that can have unknown fields.

This CL fixes the problem by preserving the true message pointer in `d->unknown_msg` when we are parsing an extension.

This also required fixing a bug in MiniTable building when fasttables are enabled.  We need to set the table_mask to `-1` to disable fasttable parsing, not `0`.

For unknown reasons, this CL appears to speed up parsing somewhat significantly.  Ideally we should be tracking parsing performance better over time, as it is possible this is merely regaining performance that was lost at a different time:

```
benchy --reference=srcfs third_party/upb/benchmarks:benchmark
 10 / 10 [=================================================================================================================] 100.00% 2m32s
(Generated by http://go/benchy. Settings: --runs 5 --reference "srcfs")

name                                           old cpu/op  new cpu/op  delta
BM_ArenaOneAlloc                                 23.9ns ± 6%  23.7ns ± 4%    ~     (p=0.180 n=53+51)
BM_ArenaInitialBlockOneAlloc                     7.62ns ± 4%  7.70ns ± 5%  +0.99%  (p=0.024 n=59+60)
BM_LoadAdsDescriptor_Upb<NoLayout>               6.60ms ±10%  6.57ms ± 8%    ~     (p=0.607 n=47+54)
BM_LoadAdsDescriptor_Upb<WithLayout>             6.92ms ± 5%  6.88ms ± 8%    ~     (p=0.257 n=54+54)
BM_LoadAdsDescriptor_Proto2<NoLayout>            14.2ms ± 8%  14.0ms ± 7%  -1.38%  (p=0.025 n=58+59)
BM_LoadAdsDescriptor_Proto2<WithLayout>          14.3ms ± 8%  14.2ms ± 8%  -1.16%  (p=0.031 n=58+57)
BM_Parse_Upb_FileDesc<UseArena, Copy>            15.9µs ± 4%  14.6µs ± 4%  -7.85%  (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias>           14.5µs ± 4%  13.3µs ± 5%  -8.50%  (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy>           15.7µs ± 4%  14.4µs ± 5%  -7.99%  (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias>          14.2µs ± 5%  13.0µs ± 4%  -8.56%  (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy>         26.3µs ± 4%  26.2µs ± 4%    ~     (p=0.195 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy>        13.3µs ± 5%  13.2µs ± 4%    ~     (p=0.085 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>       12.9µs ± 4%  12.8µs ± 3%  -0.66%  (p=0.023 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>    10.9µs ± 6%  10.9µs ± 4%    ~     (p=0.063 n=59+58)
BM_SerializeDescriptor_Proto2                    7.57µs ± 6%  7.62µs ± 6%    ~     (p=0.147 n=57+58)
BM_SerializeDescriptor_Upb                       12.8µs ± 4%  12.8µs ± 4%    ~     (p=0.163 n=59+56)

name                                           old time/op             new time/op             delta
BM_ArenaOneAlloc                                 23.9ns ± 5%             23.7ns ± 4%    ~           (p=0.172 n=53+51)
BM_ArenaInitialBlockOneAlloc                     7.62ns ± 4%             7.70ns ± 5%  +1.02%        (p=0.017 n=59+60)
BM_LoadAdsDescriptor_Upb<NoLayout>               6.60ms ±10%             6.58ms ± 8%    ~           (p=0.727 n=47+55)
BM_LoadAdsDescriptor_Upb<WithLayout>             6.92ms ± 5%             6.88ms ± 8%    ~           (p=0.260 n=54+54)
BM_LoadAdsDescriptor_Proto2<NoLayout>            14.2ms ± 7%             14.0ms ± 7%  -1.40%        (p=0.019 n=58+59)
BM_LoadAdsDescriptor_Proto2<WithLayout>          14.3ms ± 8%             14.2ms ± 8%  -1.13%        (p=0.037 n=58+57)
BM_Parse_Upb_FileDesc<UseArena, Copy>            15.9µs ± 4%             14.6µs ± 3%  -7.88%        (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias>           14.5µs ± 4%             13.3µs ± 5%  -8.46%        (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy>           15.7µs ± 4%             14.4µs ± 5%  -7.99%        (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias>          14.2µs ± 5%             13.0µs ± 4%  -8.56%        (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy>         26.3µs ± 4%             26.2µs ± 4%    ~           (p=0.224 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy>        13.3µs ± 5%             13.2µs ± 4%    ~           (p=0.098 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>       12.9µs ± 4%             12.8µs ± 3%  -0.68%        (p=0.015 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>    10.9µs ± 6%             10.9µs ± 4%    ~           (p=0.052 n=59+58)
BM_SerializeDescriptor_Proto2                    7.56µs ± 6%             7.62µs ± 6%    ~           (p=0.111 n=58+58)
BM_SerializeDescriptor_Upb                       12.8µs ± 4%             12.8µs ± 4%    ~           (p=0.241 n=56+56)

name                                           old allocs/op           new allocs/op           delta
BM_ArenaOneAlloc                                   1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ArenaInitialBlockOneAlloc                       0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout>                5.98k ± 0%              5.98k ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Upb<WithLayout>              5.98k ± 0%              5.98k ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Proto2<NoLayout>             80.9k ± 0%              80.9k ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Proto2<WithLayout>           82.1k ± 0%              82.1k ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Copy>              7.00 ± 0%               7.00 ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Alias>             7.00 ± 0%               7.00 ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Copy>             0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Alias>            0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, NoArena, Copy>            765 ± 0%                765 ± 0%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, UseArena, Copy>          9.00 ± 0%               9.00 ± 0%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>         0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>      0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_SerializeDescriptor_Proto2                      0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_SerializeDescriptor_Upb                         0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)

name                                           old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_ArenaOneAlloc                                    344 ± 0%                344 ± 0%    ~     (all samples are equal)
BM_ArenaInitialBlockOneAlloc                       0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Upb<NoLayout>                9.60M ± 0%              9.60M ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Upb<WithLayout>              9.68M ± 0%              9.68M ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Proto2<NoLayout>             6.41M ± 0%              6.41M ± 0%    ~     (all samples are equal)
BM_LoadAdsDescriptor_Proto2<WithLayout>           6.44M ± 0%              6.44M ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Copy>             36.5k ± 0%              36.5k ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<UseArena, Alias>            36.5k ± 0%              36.5k ± 0%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Copy>             0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Upb_FileDesc<InitBlock, Alias>            0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, NoArena, Copy>          35.8k ± 0%              35.8k ± 0%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, UseArena, Copy>         40.7k ± 0%              40.7k ± 0%    ~     (all samples are equal)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>         0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>      0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_SerializeDescriptor_Proto2                      0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_SerializeDescriptor_Upb                         0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)

name                                           old speed               new speed               delta
BM_LoadAdsDescriptor_Upb<NoLayout>              113MB/s ± 9%            113MB/s ± 8%    ~           (p=0.712 n=47+55)
BM_LoadAdsDescriptor_Upb<WithLayout>            107MB/s ± 8%            108MB/s ± 8%    ~           (p=0.200 n=55+54)
BM_LoadAdsDescriptor_Proto2<NoLayout>          52.5MB/s ± 8%           53.3MB/s ± 7%  +1.51%        (p=0.018 n=59+59)
BM_LoadAdsDescriptor_Proto2<WithLayout>        51.9MB/s ± 7%           52.4MB/s ± 8%  +1.01%        (p=0.050 n=58+58)
BM_Parse_Upb_FileDesc<UseArena, Copy>           473MB/s ± 4%            514MB/s ± 4%  +8.52%        (p=0.000 n=57+59)
BM_Parse_Upb_FileDesc<UseArena, Alias>          518MB/s ± 4%            566MB/s ± 5%  +9.30%        (p=0.000 n=57+60)
BM_Parse_Upb_FileDesc<InitBlock, Copy>          480MB/s ± 4%            521MB/s ± 5%  +8.69%        (p=0.000 n=59+60)
BM_Parse_Upb_FileDesc<InitBlock, Alias>         528MB/s ± 4%            578MB/s ± 4%  +9.36%        (p=0.000 n=57+58)
BM_Parse_Proto2<FileDesc, NoArena, Copy>        286MB/s ± 4%            287MB/s ± 4%    ~           (p=0.195 n=55+53)
BM_Parse_Proto2<FileDesc, UseArena, Copy>       566MB/s ± 5%            570MB/s ± 4%    ~           (p=0.085 n=59+59)
BM_Parse_Proto2<FileDesc, InitBlock, Copy>      583MB/s ± 5%            587MB/s ± 3%  +0.64%        (p=0.023 n=60+58)
BM_Parse_Proto2<FileDescSV, InitBlock, Alias>   688MB/s ± 6%            693MB/s ± 4%    ~           (p=0.063 n=59+58)
BM_SerializeDescriptor_Proto2                   995MB/s ± 6%            988MB/s ± 5%    ~           (p=0.147 n=57+58)
BM_SerializeDescriptor_Upb                      586MB/s ± 4%            589MB/s ± 4%    ~           (p=0.163 n=59+56)
```

PiperOrigin-RevId: 462022073
pull/13171/head
Joshua Haberman 3 years ago committed by Copybara-Service
parent 48d6764490
commit ececc21624
  1. 8
      upb/decode.c
  2. 6
      upb/internal/decode.h
  3. 6
      upb/mini_table.c
  4. 10
      upb/msg_test.cc

@ -421,7 +421,9 @@ static bool decode_checkenum_slow(upb_Decoder* d, const char* ptr,
// For packed fields the tag could be arbitrarily far in the past, so we
// just re-encode the tag and value here.
uint32_t tag = ((uint32_t)field->number << 3) | kUpb_WireType_Varint;
upb_Decode_AddUnknownVarints(d, msg, tag, v);
upb_Message* unknown_msg =
field->mode & kUpb_LabelFlags_IsExtension ? d->unknown_msg : msg;
upb_Decode_AddUnknownVarints(d, unknown_msg, tag, v);
return false;
}
@ -1008,6 +1010,7 @@ static const char* decode_known(upb_Decoder* d, const char* ptr,
upb_Message_Extension* ext =
_upb_Message_GetOrCreateExtension(msg, ext_layout, &d->arena);
if (UPB_UNLIKELY(!ext)) return decode_err(d, kUpb_DecodeStatus_OutOfMemory);
d->unknown_msg = msg;
msg = &ext->data;
subs = &ext->ext->sub;
}
@ -1073,7 +1076,6 @@ static const char* decode_unknown(upb_Decoder* d, const char* ptr,
d->unknown_msg = msg;
ptr = decode_group(d, ptr, NULL, NULL, field_number);
start = d->unknown;
d->unknown_msg = NULL;
d->unknown = NULL;
}
if (!_upb_Message_AddUnknown(msg, start, ptr - start, &d->arena)) {
@ -1189,7 +1191,7 @@ upb_DecodeStatus upb_Decode(const char* buf, size_t size, void* msg,
state.extreg = extreg;
state.limit_ptr = state.end;
state.unknown_msg = NULL;
state.unknown = NULL;
state.depth = depth ? depth : 64;
state.end_group = DECODE_NOGROUP;
state.options = (uint16_t)options;

@ -46,8 +46,8 @@
typedef struct upb_Decoder {
const char* end; /* Can read up to 16 bytes slop beyond this. */
const char* limit_ptr; /* = end + UPB_MIN(limit, 0) */
upb_Message* unknown_msg; /* If non-NULL, add unknown data at buffer flip. */
const char* unknown; /* Start of unknown data. */
upb_Message* unknown_msg; /* Used for preserving unknown data. */
const char* unknown; /* Start of unknown data, preserve at buffer flip. */
const upb_ExtensionRegistry*
extreg; /* For looking up extensions during the parse. */
int limit; /* Submessage limit relative to end. */
@ -120,7 +120,7 @@ const char* decode_isdonefallback_inl(upb_Decoder* d, const char* ptr,
if (overrun < d->limit) {
/* Need to copy remaining data into patch buffer. */
UPB_ASSERT(overrun < 16);
if (d->unknown_msg) {
if (d->unknown) {
if (!_upb_Message_AddUnknown(d->unknown_msg, d->unknown, ptr - d->unknown,
&d->arena)) {
*status = kUpb_DecodeStatus_OutOfMemory;

@ -963,7 +963,7 @@ upb_MiniTable* upb_MiniTable_BuildWithBuf(const char* data, size_t len,
decoder.table->field_count = 0;
decoder.table->ext = kUpb_ExtMode_NonExtendable;
decoder.table->dense_below = 0;
decoder.table->table_mask = 0;
decoder.table->table_mask = -1;
decoder.table->required_count = 0;
upb_MtDecoder_ParseMessage(&decoder, data, len);
@ -986,7 +986,7 @@ upb_MiniTable* upb_MiniTable_BuildMessageSet(upb_MiniTablePlatform platform,
ret->field_count = 0;
ret->ext = kUpb_ExtMode_IsMessageSet;
ret->dense_below = 0;
ret->table_mask = 0;
ret->table_mask = -1;
ret->required_count = 0;
return ret;
}
@ -1027,7 +1027,7 @@ upb_MiniTable* upb_MiniTable_BuildMapEntry(upb_FieldType key_type,
ret->field_count = 2;
ret->ext = kUpb_ExtMode_NonExtendable | kUpb_ExtMode_IsMapEntry;
ret->dense_below = 2;
ret->table_mask = 0;
ret->table_mask = -1;
ret->required_count = 0;
ret->subs = subs;
ret->fields = fields;

@ -515,4 +515,14 @@ TEST(MessageTest, MapField) {
// }
// FUZZ_TEST(FuzzTest, DecodeEncodeArbitrarySchemaAndPayload);
//
// TEST(FuzzTest, DecodeEncodeArbitrarySchemaAndPayloadRegression) {
// DecodeEncodeArbitrarySchemaAndPayload(
// {{"\256\354Rt\216\3271\234", "\243\243\267\207\336gV\366w"},
// {"z"},
// "}\212\304d\371\363\341\2329\325B\264\377?\215\223\201\201\226y\201%"
// "\321\363\255;",
// {}},
// "\010", -724543908, -591643538);
// }
//
// end:google_only

Loading…
Cancel
Save