Optimized decoder and paved the way for parsing extensions.

The primary motivation for this change is to avoid referring to the
parent message's `upb_msglayout` object when fetching the
`upb_msglayout` object for a sub-message. This will help pave the way
for parsing extensions. We also implement several optimizations so
that we can make this change without regressing performance.

Normally we compute the layout for a sub-message field like so:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *layout,
    const upb_msglayout_field *field) {
  return layout->submsgs[field->submsg_index];
}
```

The reason for this indirection is to avoid storing a pointer directly
in `upb_msglayout_field`. Doing so would double its size (from 12 to 24
bytes on 64-bit architectures), which is wasteful because the pointer
is only needed for message-typed fields.
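
To make the size argument concrete, here is a minimal sketch. Both
structs are hypothetical stand-ins, not upb's real definitions: the
member widths in `compact_field` (and the `label` member) are
assumptions chosen to add up to the 12 bytes quoted above, and
`fat_field` shows what embedding a layout pointer would cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for upb_msglayout_field. The member widths are
 * assumptions, picked so that the struct occupies 12 bytes. */
typedef struct {
  uint32_t number;
  uint16_t offset;
  int16_t presence;
  uint16_t submsg_index; /* index into a separate submsgs array */
  uint8_t descriptortype;
  uint8_t label;
} compact_field;

/* The same members plus an embedded pointer, paid for by every field
 * even though only message-typed fields would use it. */
typedef struct {
  compact_field base;
  const void *submsg_layout;
} fat_field;

/* Fails to compile if the compact layout ever grows. */
static_assert(sizeof(compact_field) == 12, "compact field should be 12 bytes");

size_t compact_field_size(void) { return sizeof(compact_field); }
size_t fat_field_size(void) { return sizeof(fat_field); }
```

On a typical 64-bit ABI the pointer needs 8-byte alignment, so 4 bytes
of padding follow the 12-byte base and the larger struct comes out at
24 bytes.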

However, `get_submsg_layout` as written above does not work for
extensions: by their nature they have no entries in the message's
`layout->submsgs` array, and we want to avoid creating an entire fake
`upb_msglayout` for each such extension since that would also be
wasteful.

This change removes the dependency on `upb_msglayout` by passing down
the `submsgs` array instead:

```
const upb_msglayout *get_submsg_layout(
    const upb_msglayout *const *submsgs,
    const upb_msglayout_field *field) {
  return submsgs[field->submsg_index];
}
```

This will pave the way for parsing extensions, as we can more easily
create an alternative `submsgs` array for extension fields without
extra overhead or waste.
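
As an illustration only, the sketch below shows how an extension could
supply its own one-element `submsgs` array. The `layout` and
`layout_field` types are trimmed-down hypothetical stand-ins for
`upb_msglayout` and `upb_msglayout_field`, and `payload_layout`,
`ext_submsgs`, `ext_field`, and the field number 1000 are invented for
the example; this commit does not add extension parsing itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins with only the members this example needs. */
typedef struct layout layout;
typedef struct {
  uint32_t number;
  uint16_t submsg_index;
} layout_field;
struct layout {
  const layout *const *submsgs;
};

/* Same shape as the new get_submsg_layout: no parent layout required. */
const layout *get_submsg_layout(const layout *const *submsgs,
                                const layout_field *field) {
  return submsgs[field->submsg_index];
}

/* An extension whose payload is a message. A one-element submsgs array
 * stands in for the parent layout that the extension does not have. */
static const layout payload_layout = {NULL};
static const layout *const ext_submsgs[1] = {&payload_layout};
static const layout_field ext_field = {1000, 0}; /* made-up field number */

const layout *resolve_extension_payload(void) {
  const layout *subl = get_submsg_layout(ext_submsgs, &ext_field);
  assert(subl != NULL);
  return subl;
}
```

The only per-extension cost in this sketch is the small pointer array,
rather than a full fake message layout.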

Along the way several optimizations presented themselves that allow
a nice increase in performance:

1. Passing the parsed `wireval` by address instead of by value ended
   up avoiding an expensive and useless stack copy (this is on Clang,
   which was used for all measurements).
2. When field numbers are densely packed, we can find a field by number
   with a single indexed lookup instead of linear search. At codegen
   time we can compute the maximum field number that will allow such
   an indexed lookup.
3. For fields that do require linear search, we can start the linear
   search at the location where we found the previous field, taking
   advantage of the fact that field numbers are generally increasing.
4. When the hasbit index is less than 32 (the common case) we can use
   a less expensive code sequence to set it.
5. We check for the hasbit case before the oneof case, as optional
   fields are more common than oneof fields.
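
Optimizations 2 and 3 can be sketched in isolation as follows. This is
a simplified stand-alone version, not the code from `upb/decode.c`: the
`field` and `layout` types are hypothetical stand-ins, `scan` and
`compute_dense_below` are helper names invented here, and the lookup
returns `NULL` for unknown fields instead of a sentinel field. It
assumes `fields` is sorted by number and `*last_index` never exceeds
`field_count`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins with only the members this example needs. */
typedef struct {
  uint32_t number;
} field;
typedef struct {
  const field *fields; /* sorted by field number */
  size_t field_count;
  size_t dense_below;  /* fields[i].number == i + 1 for all i < dense_below */
} layout;

/* Codegen-time step: length of the prefix numbered 1, 2, 3, ... */
size_t compute_dense_below(const field *fields, size_t count) {
  size_t n = 0;
  while (n < count && fields[n].number == n + 1) n++;
  return n;
}

/* Linear search over [from, to); stores the index in *out on success. */
static int scan(const layout *l, uint32_t number, size_t from, size_t to,
                size_t *out) {
  for (size_t i = from; i < to; i++) {
    if (l->fields[i].number == number) {
      *out = i;
      return 1;
    }
  }
  return 0;
}

const field *find_field(const layout *l, uint32_t number, size_t *last_index) {
  size_t idx = (size_t)number - 1; /* number 0 wraps to SIZE_MAX */
  if (idx >= l->dense_below &&                            /* not directly indexable */
      !scan(l, number, *last_index, l->field_count, &idx) && /* resume from last hit */
      !scan(l, number, 0, *last_index, &idx)) {           /* wrap around to the start */
    return NULL; /* unknown field */
  }
  assert(l->fields[idx].number == number);
  *last_index = idx;
  return &l->fields[idx];
}
```

For field numbers 1, 2, 3, 7, 999, `compute_dense_below` yields 3:
numbers 1 to 3 are found by direct indexing, while 7 and 999 fall back
to the resumed linear search.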

Benchmark results indicate a 15-19% improvement in upb parse speed (the
`Parse_Upb_FileDesc` benchmarks) with a small code size increase:

```
name                                      old time/op  new time/op  delta
ArenaOneAlloc                             21.3ns ± 0%  21.5ns ± 0%   +0.96%  (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc                 6.32ns ± 0%  6.32ns ± 0%   +0.03%  (p=0.000 n=12+10)
LoadDescriptor_Upb                        53.5µs ± 1%  51.5µs ± 2%   -3.70%  (p=0.000 n=12+12)
LoadAdsDescriptor_Upb                     2.78ms ± 2%  2.68ms ± 0%   -3.57%  (p=0.000 n=12+12)
LoadDescriptor_Proto2                      240µs ± 0%   240µs ± 0%   +0.12%  (p=0.001 n=12+12)
LoadAdsDescriptor_Proto2                  12.8ms ± 0%  12.7ms ± 0%   -1.15%  (p=0.000 n=12+10)
Parse_Upb_FileDesc<UseArena,Copy>         13.2µs ± 2%  10.7µs ± 0%  -18.49%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<UseArena,Alias>        11.3µs ± 0%   9.6µs ± 0%  -15.11%  (p=0.000 n=12+11)
Parse_Upb_FileDesc<InitBlock,Copy>        12.7µs ± 0%  10.3µs ± 0%  -19.00%  (p=0.000 n=10+12)
Parse_Upb_FileDesc<InitBlock,Alias>       10.9µs ± 0%   9.2µs ± 0%  -15.82%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,NoArena,Copy>       29.4µs ± 0%  29.5µs ± 0%   +0.61%  (p=0.000 n=12+12)
Parse_Proto2<FileDesc,UseArena,Copy>      20.7µs ± 2%  20.6µs ± 2%     ~     (p=0.260 n=12+11)
Parse_Proto2<FileDesc,InitBlock,Copy>     16.7µs ± 1%  16.7µs ± 0%   -0.25%  (p=0.036 n=12+10)
Parse_Proto2<FileDescSV,InitBlock,Alias>  16.5µs ± 0%  16.5µs ± 0%   +0.20%  (p=0.016 n=12+11)
SerializeDescriptor_Proto2                5.30µs ± 1%  5.36µs ± 1%   +1.09%  (p=0.000 n=12+11)
SerializeDescriptor_Upb                   12.9µs ± 0%  13.0µs ± 0%   +0.90%  (p=0.000 n=12+11)

    FILE SIZE        VM SIZE
 --------------  --------------
  +1.5%    +176  +1.6%    +176    upb/decode.c
    +1.8%    +176  +1.9%    +176    decode_msg
  +0.4%     +64  +0.4%     +64    upb/def.c
    +1.4%     +64  +1.4%     +64    _upb_symtab_addfile
  +1.2%     +48  +1.4%     +48    upb/reflection.c
     +15%     +32   +18%     +32    upb_msg_set
    +2.9%     +16  +3.1%     +16    upb_msg_mutable
  -9.3%    -288  [ = ]       0    [Unmapped]
  [ = ]       0  +0.2%    +288    TOTAL
```
pull/13171/head
Joshua Haberman 4 years ago
parent 3e035cb553
commit 65d7b8ab0c
  cmake/google/protobuf/descriptor.upb.c | 54
  upb/decode.c                           | 115
  upb/def.c                              | 10
  upb/msg_internal.h                     | 7
  upbc/protoc-gen-upb.cc                 | 10

@ -23,7 +23,7 @@ static const upb_msglayout_field google_protobuf_FileDescriptorSet__fields[1] =
const upb_msglayout google_protobuf_FileDescriptorSet_msginit = { const upb_msglayout google_protobuf_FileDescriptorSet_msginit = {
&google_protobuf_FileDescriptorSet_submsgs[0], &google_protobuf_FileDescriptorSet_submsgs[0],
&google_protobuf_FileDescriptorSet__fields[0], &google_protobuf_FileDescriptorSet__fields[0],
UPB_SIZE(8, 8), 1, false, 255, UPB_SIZE(8, 8), 1, false, 1, 255,
}; };
static const upb_msglayout *const google_protobuf_FileDescriptorProto_submsgs[6] = { static const upb_msglayout *const google_protobuf_FileDescriptorProto_submsgs[6] = {
@ -53,7 +53,7 @@ static const upb_msglayout_field google_protobuf_FileDescriptorProto__fields[12]
const upb_msglayout google_protobuf_FileDescriptorProto_msginit = { const upb_msglayout google_protobuf_FileDescriptorProto_msginit = {
&google_protobuf_FileDescriptorProto_submsgs[0], &google_protobuf_FileDescriptorProto_submsgs[0],
&google_protobuf_FileDescriptorProto__fields[0], &google_protobuf_FileDescriptorProto__fields[0],
UPB_SIZE(64, 128), 12, false, 255, UPB_SIZE(64, 128), 12, false, 12, 255,
}; };
static const upb_msglayout *const google_protobuf_DescriptorProto_submsgs[7] = { static const upb_msglayout *const google_protobuf_DescriptorProto_submsgs[7] = {
@ -82,7 +82,7 @@ static const upb_msglayout_field google_protobuf_DescriptorProto__fields[10] = {
const upb_msglayout google_protobuf_DescriptorProto_msginit = { const upb_msglayout google_protobuf_DescriptorProto_msginit = {
&google_protobuf_DescriptorProto_submsgs[0], &google_protobuf_DescriptorProto_submsgs[0],
&google_protobuf_DescriptorProto__fields[0], &google_protobuf_DescriptorProto__fields[0],
UPB_SIZE(48, 96), 10, false, 255, UPB_SIZE(48, 96), 10, false, 10, 255,
}; };
static const upb_msglayout *const google_protobuf_DescriptorProto_ExtensionRange_submsgs[1] = { static const upb_msglayout *const google_protobuf_DescriptorProto_ExtensionRange_submsgs[1] = {
@ -98,7 +98,7 @@ static const upb_msglayout_field google_protobuf_DescriptorProto_ExtensionRange_
const upb_msglayout google_protobuf_DescriptorProto_ExtensionRange_msginit = { const upb_msglayout google_protobuf_DescriptorProto_ExtensionRange_msginit = {
&google_protobuf_DescriptorProto_ExtensionRange_submsgs[0], &google_protobuf_DescriptorProto_ExtensionRange_submsgs[0],
&google_protobuf_DescriptorProto_ExtensionRange__fields[0], &google_protobuf_DescriptorProto_ExtensionRange__fields[0],
UPB_SIZE(16, 24), 3, false, 255, UPB_SIZE(16, 24), 3, false, 3, 255,
}; };
static const upb_msglayout_field google_protobuf_DescriptorProto_ReservedRange__fields[2] = { static const upb_msglayout_field google_protobuf_DescriptorProto_ReservedRange__fields[2] = {
@ -109,7 +109,7 @@ static const upb_msglayout_field google_protobuf_DescriptorProto_ReservedRange__
const upb_msglayout google_protobuf_DescriptorProto_ReservedRange_msginit = { const upb_msglayout google_protobuf_DescriptorProto_ReservedRange_msginit = {
NULL, NULL,
&google_protobuf_DescriptorProto_ReservedRange__fields[0], &google_protobuf_DescriptorProto_ReservedRange__fields[0],
UPB_SIZE(16, 16), 2, false, 255, UPB_SIZE(16, 16), 2, false, 2, 255,
}; };
static const upb_msglayout *const google_protobuf_ExtensionRangeOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_ExtensionRangeOptions_submsgs[1] = {
@ -123,7 +123,7 @@ static const upb_msglayout_field google_protobuf_ExtensionRangeOptions__fields[1
const upb_msglayout google_protobuf_ExtensionRangeOptions_msginit = { const upb_msglayout google_protobuf_ExtensionRangeOptions_msginit = {
&google_protobuf_ExtensionRangeOptions_submsgs[0], &google_protobuf_ExtensionRangeOptions_submsgs[0],
&google_protobuf_ExtensionRangeOptions__fields[0], &google_protobuf_ExtensionRangeOptions__fields[0],
UPB_SIZE(8, 8), 1, false, 255, UPB_SIZE(8, 8), 1, false, 0, 255,
}; };
static const upb_msglayout *const google_protobuf_FieldDescriptorProto_submsgs[1] = { static const upb_msglayout *const google_protobuf_FieldDescriptorProto_submsgs[1] = {
@ -147,7 +147,7 @@ static const upb_msglayout_field google_protobuf_FieldDescriptorProto__fields[11
const upb_msglayout google_protobuf_FieldDescriptorProto_msginit = { const upb_msglayout google_protobuf_FieldDescriptorProto_msginit = {
&google_protobuf_FieldDescriptorProto_submsgs[0], &google_protobuf_FieldDescriptorProto_submsgs[0],
&google_protobuf_FieldDescriptorProto__fields[0], &google_protobuf_FieldDescriptorProto__fields[0],
UPB_SIZE(72, 112), 11, false, 255, UPB_SIZE(72, 112), 11, false, 10, 255,
}; };
static const upb_msglayout *const google_protobuf_OneofDescriptorProto_submsgs[1] = { static const upb_msglayout *const google_protobuf_OneofDescriptorProto_submsgs[1] = {
@ -162,7 +162,7 @@ static const upb_msglayout_field google_protobuf_OneofDescriptorProto__fields[2]
const upb_msglayout google_protobuf_OneofDescriptorProto_msginit = { const upb_msglayout google_protobuf_OneofDescriptorProto_msginit = {
&google_protobuf_OneofDescriptorProto_submsgs[0], &google_protobuf_OneofDescriptorProto_submsgs[0],
&google_protobuf_OneofDescriptorProto__fields[0], &google_protobuf_OneofDescriptorProto__fields[0],
UPB_SIZE(16, 32), 2, false, 255, UPB_SIZE(16, 32), 2, false, 2, 255,
}; };
static const upb_msglayout *const google_protobuf_EnumDescriptorProto_submsgs[3] = { static const upb_msglayout *const google_protobuf_EnumDescriptorProto_submsgs[3] = {
@ -182,7 +182,7 @@ static const upb_msglayout_field google_protobuf_EnumDescriptorProto__fields[5]
const upb_msglayout google_protobuf_EnumDescriptorProto_msginit = { const upb_msglayout google_protobuf_EnumDescriptorProto_msginit = {
&google_protobuf_EnumDescriptorProto_submsgs[0], &google_protobuf_EnumDescriptorProto_submsgs[0],
&google_protobuf_EnumDescriptorProto__fields[0], &google_protobuf_EnumDescriptorProto__fields[0],
UPB_SIZE(32, 64), 5, false, 255, UPB_SIZE(32, 64), 5, false, 5, 255,
}; };
static const upb_msglayout_field google_protobuf_EnumDescriptorProto_EnumReservedRange__fields[2] = { static const upb_msglayout_field google_protobuf_EnumDescriptorProto_EnumReservedRange__fields[2] = {
@ -193,7 +193,7 @@ static const upb_msglayout_field google_protobuf_EnumDescriptorProto_EnumReserve
const upb_msglayout google_protobuf_EnumDescriptorProto_EnumReservedRange_msginit = { const upb_msglayout google_protobuf_EnumDescriptorProto_EnumReservedRange_msginit = {
NULL, NULL,
&google_protobuf_EnumDescriptorProto_EnumReservedRange__fields[0], &google_protobuf_EnumDescriptorProto_EnumReservedRange__fields[0],
UPB_SIZE(16, 16), 2, false, 255, UPB_SIZE(16, 16), 2, false, 2, 255,
}; };
static const upb_msglayout *const google_protobuf_EnumValueDescriptorProto_submsgs[1] = { static const upb_msglayout *const google_protobuf_EnumValueDescriptorProto_submsgs[1] = {
@ -209,7 +209,7 @@ static const upb_msglayout_field google_protobuf_EnumValueDescriptorProto__field
const upb_msglayout google_protobuf_EnumValueDescriptorProto_msginit = { const upb_msglayout google_protobuf_EnumValueDescriptorProto_msginit = {
&google_protobuf_EnumValueDescriptorProto_submsgs[0], &google_protobuf_EnumValueDescriptorProto_submsgs[0],
&google_protobuf_EnumValueDescriptorProto__fields[0], &google_protobuf_EnumValueDescriptorProto__fields[0],
UPB_SIZE(24, 32), 3, false, 255, UPB_SIZE(24, 32), 3, false, 3, 255,
}; };
static const upb_msglayout *const google_protobuf_ServiceDescriptorProto_submsgs[2] = { static const upb_msglayout *const google_protobuf_ServiceDescriptorProto_submsgs[2] = {
@ -226,7 +226,7 @@ static const upb_msglayout_field google_protobuf_ServiceDescriptorProto__fields[
const upb_msglayout google_protobuf_ServiceDescriptorProto_msginit = { const upb_msglayout google_protobuf_ServiceDescriptorProto_msginit = {
&google_protobuf_ServiceDescriptorProto_submsgs[0], &google_protobuf_ServiceDescriptorProto_submsgs[0],
&google_protobuf_ServiceDescriptorProto__fields[0], &google_protobuf_ServiceDescriptorProto__fields[0],
UPB_SIZE(24, 48), 3, false, 255, UPB_SIZE(24, 48), 3, false, 3, 255,
}; };
static const upb_msglayout *const google_protobuf_MethodDescriptorProto_submsgs[1] = { static const upb_msglayout *const google_protobuf_MethodDescriptorProto_submsgs[1] = {
@ -245,7 +245,7 @@ static const upb_msglayout_field google_protobuf_MethodDescriptorProto__fields[6
const upb_msglayout google_protobuf_MethodDescriptorProto_msginit = { const upb_msglayout google_protobuf_MethodDescriptorProto_msginit = {
&google_protobuf_MethodDescriptorProto_submsgs[0], &google_protobuf_MethodDescriptorProto_submsgs[0],
&google_protobuf_MethodDescriptorProto__fields[0], &google_protobuf_MethodDescriptorProto__fields[0],
UPB_SIZE(32, 64), 6, false, 255, UPB_SIZE(32, 64), 6, false, 6, 255,
}; };
static const upb_msglayout *const google_protobuf_FileOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_FileOptions_submsgs[1] = {
@ -279,7 +279,7 @@ static const upb_msglayout_field google_protobuf_FileOptions__fields[21] = {
const upb_msglayout google_protobuf_FileOptions_msginit = { const upb_msglayout google_protobuf_FileOptions_msginit = {
&google_protobuf_FileOptions_submsgs[0], &google_protobuf_FileOptions_submsgs[0],
&google_protobuf_FileOptions__fields[0], &google_protobuf_FileOptions__fields[0],
UPB_SIZE(104, 192), 21, false, 255, UPB_SIZE(104, 192), 21, false, 1, 255,
}; };
static const upb_msglayout *const google_protobuf_MessageOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_MessageOptions_submsgs[1] = {
@ -297,7 +297,7 @@ static const upb_msglayout_field google_protobuf_MessageOptions__fields[5] = {
const upb_msglayout google_protobuf_MessageOptions_msginit = { const upb_msglayout google_protobuf_MessageOptions_msginit = {
&google_protobuf_MessageOptions_submsgs[0], &google_protobuf_MessageOptions_submsgs[0],
&google_protobuf_MessageOptions__fields[0], &google_protobuf_MessageOptions__fields[0],
UPB_SIZE(16, 16), 5, false, 255, UPB_SIZE(16, 16), 5, false, 3, 255,
}; };
static const upb_msglayout *const google_protobuf_FieldOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_FieldOptions_submsgs[1] = {
@ -317,7 +317,7 @@ static const upb_msglayout_field google_protobuf_FieldOptions__fields[7] = {
const upb_msglayout google_protobuf_FieldOptions_msginit = { const upb_msglayout google_protobuf_FieldOptions_msginit = {
&google_protobuf_FieldOptions_submsgs[0], &google_protobuf_FieldOptions_submsgs[0],
&google_protobuf_FieldOptions__fields[0], &google_protobuf_FieldOptions__fields[0],
UPB_SIZE(24, 24), 7, false, 255, UPB_SIZE(24, 24), 7, false, 3, 255,
}; };
static const upb_msglayout *const google_protobuf_OneofOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_OneofOptions_submsgs[1] = {
@ -331,7 +331,7 @@ static const upb_msglayout_field google_protobuf_OneofOptions__fields[1] = {
const upb_msglayout google_protobuf_OneofOptions_msginit = { const upb_msglayout google_protobuf_OneofOptions_msginit = {
&google_protobuf_OneofOptions_submsgs[0], &google_protobuf_OneofOptions_submsgs[0],
&google_protobuf_OneofOptions__fields[0], &google_protobuf_OneofOptions__fields[0],
UPB_SIZE(8, 8), 1, false, 255, UPB_SIZE(8, 8), 1, false, 0, 255,
}; };
static const upb_msglayout *const google_protobuf_EnumOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_EnumOptions_submsgs[1] = {
@ -347,7 +347,7 @@ static const upb_msglayout_field google_protobuf_EnumOptions__fields[3] = {
const upb_msglayout google_protobuf_EnumOptions_msginit = { const upb_msglayout google_protobuf_EnumOptions_msginit = {
&google_protobuf_EnumOptions_submsgs[0], &google_protobuf_EnumOptions_submsgs[0],
&google_protobuf_EnumOptions__fields[0], &google_protobuf_EnumOptions__fields[0],
UPB_SIZE(8, 16), 3, false, 255, UPB_SIZE(8, 16), 3, false, 0, 255,
}; };
static const upb_msglayout *const google_protobuf_EnumValueOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_EnumValueOptions_submsgs[1] = {
@ -362,7 +362,7 @@ static const upb_msglayout_field google_protobuf_EnumValueOptions__fields[2] = {
const upb_msglayout google_protobuf_EnumValueOptions_msginit = { const upb_msglayout google_protobuf_EnumValueOptions_msginit = {
&google_protobuf_EnumValueOptions_submsgs[0], &google_protobuf_EnumValueOptions_submsgs[0],
&google_protobuf_EnumValueOptions__fields[0], &google_protobuf_EnumValueOptions__fields[0],
UPB_SIZE(8, 16), 2, false, 255, UPB_SIZE(8, 16), 2, false, 1, 255,
}; };
static const upb_msglayout *const google_protobuf_ServiceOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_ServiceOptions_submsgs[1] = {
@ -377,7 +377,7 @@ static const upb_msglayout_field google_protobuf_ServiceOptions__fields[2] = {
const upb_msglayout google_protobuf_ServiceOptions_msginit = { const upb_msglayout google_protobuf_ServiceOptions_msginit = {
&google_protobuf_ServiceOptions_submsgs[0], &google_protobuf_ServiceOptions_submsgs[0],
&google_protobuf_ServiceOptions__fields[0], &google_protobuf_ServiceOptions__fields[0],
UPB_SIZE(8, 16), 2, false, 255, UPB_SIZE(8, 16), 2, false, 0, 255,
}; };
static const upb_msglayout *const google_protobuf_MethodOptions_submsgs[1] = { static const upb_msglayout *const google_protobuf_MethodOptions_submsgs[1] = {
@ -393,7 +393,7 @@ static const upb_msglayout_field google_protobuf_MethodOptions__fields[3] = {
const upb_msglayout google_protobuf_MethodOptions_msginit = { const upb_msglayout google_protobuf_MethodOptions_msginit = {
&google_protobuf_MethodOptions_submsgs[0], &google_protobuf_MethodOptions_submsgs[0],
&google_protobuf_MethodOptions__fields[0], &google_protobuf_MethodOptions__fields[0],
UPB_SIZE(16, 24), 3, false, 255, UPB_SIZE(16, 24), 3, false, 0, 255,
}; };
static const upb_msglayout *const google_protobuf_UninterpretedOption_submsgs[1] = { static const upb_msglayout *const google_protobuf_UninterpretedOption_submsgs[1] = {
@ -413,7 +413,7 @@ static const upb_msglayout_field google_protobuf_UninterpretedOption__fields[7]
const upb_msglayout google_protobuf_UninterpretedOption_msginit = { const upb_msglayout google_protobuf_UninterpretedOption_msginit = {
&google_protobuf_UninterpretedOption_submsgs[0], &google_protobuf_UninterpretedOption_submsgs[0],
&google_protobuf_UninterpretedOption__fields[0], &google_protobuf_UninterpretedOption__fields[0],
UPB_SIZE(64, 96), 7, false, 255, UPB_SIZE(64, 96), 7, false, 0, 255,
}; };
static const upb_msglayout_field google_protobuf_UninterpretedOption_NamePart__fields[2] = { static const upb_msglayout_field google_protobuf_UninterpretedOption_NamePart__fields[2] = {
@ -424,7 +424,7 @@ static const upb_msglayout_field google_protobuf_UninterpretedOption_NamePart__f
const upb_msglayout google_protobuf_UninterpretedOption_NamePart_msginit = { const upb_msglayout google_protobuf_UninterpretedOption_NamePart_msginit = {
NULL, NULL,
&google_protobuf_UninterpretedOption_NamePart__fields[0], &google_protobuf_UninterpretedOption_NamePart__fields[0],
UPB_SIZE(16, 32), 2, false, 255, UPB_SIZE(16, 32), 2, false, 2, 255,
}; };
static const upb_msglayout *const google_protobuf_SourceCodeInfo_submsgs[1] = { static const upb_msglayout *const google_protobuf_SourceCodeInfo_submsgs[1] = {
@ -438,7 +438,7 @@ static const upb_msglayout_field google_protobuf_SourceCodeInfo__fields[1] = {
const upb_msglayout google_protobuf_SourceCodeInfo_msginit = { const upb_msglayout google_protobuf_SourceCodeInfo_msginit = {
&google_protobuf_SourceCodeInfo_submsgs[0], &google_protobuf_SourceCodeInfo_submsgs[0],
&google_protobuf_SourceCodeInfo__fields[0], &google_protobuf_SourceCodeInfo__fields[0],
UPB_SIZE(8, 8), 1, false, 255, UPB_SIZE(8, 8), 1, false, 1, 255,
}; };
static const upb_msglayout_field google_protobuf_SourceCodeInfo_Location__fields[5] = { static const upb_msglayout_field google_protobuf_SourceCodeInfo_Location__fields[5] = {
@ -452,7 +452,7 @@ static const upb_msglayout_field google_protobuf_SourceCodeInfo_Location__fields
const upb_msglayout google_protobuf_SourceCodeInfo_Location_msginit = { const upb_msglayout google_protobuf_SourceCodeInfo_Location_msginit = {
NULL, NULL,
&google_protobuf_SourceCodeInfo_Location__fields[0], &google_protobuf_SourceCodeInfo_Location__fields[0],
UPB_SIZE(32, 64), 5, false, 255, UPB_SIZE(32, 64), 5, false, 4, 255,
}; };
static const upb_msglayout *const google_protobuf_GeneratedCodeInfo_submsgs[1] = { static const upb_msglayout *const google_protobuf_GeneratedCodeInfo_submsgs[1] = {
@ -466,7 +466,7 @@ static const upb_msglayout_field google_protobuf_GeneratedCodeInfo__fields[1] =
const upb_msglayout google_protobuf_GeneratedCodeInfo_msginit = { const upb_msglayout google_protobuf_GeneratedCodeInfo_msginit = {
&google_protobuf_GeneratedCodeInfo_submsgs[0], &google_protobuf_GeneratedCodeInfo_submsgs[0],
&google_protobuf_GeneratedCodeInfo__fields[0], &google_protobuf_GeneratedCodeInfo__fields[0],
UPB_SIZE(8, 8), 1, false, 255, UPB_SIZE(8, 8), 1, false, 1, 255,
}; };
static const upb_msglayout_field google_protobuf_GeneratedCodeInfo_Annotation__fields[4] = { static const upb_msglayout_field google_protobuf_GeneratedCodeInfo_Annotation__fields[4] = {
@ -479,7 +479,7 @@ static const upb_msglayout_field google_protobuf_GeneratedCodeInfo_Annotation__f
const upb_msglayout google_protobuf_GeneratedCodeInfo_Annotation_msginit = { const upb_msglayout google_protobuf_GeneratedCodeInfo_Annotation_msginit = {
NULL, NULL,
&google_protobuf_GeneratedCodeInfo_Annotation__fields[0], &google_protobuf_GeneratedCodeInfo_Annotation__fields[0],
UPB_SIZE(24, 48), 4, false, 255, UPB_SIZE(24, 48), 4, false, 4, 255,
}; };
#include "upb/port_undef.inc" #include "upb/port_undef.inc"

@@ -299,24 +299,42 @@ static void decode_munge(int type, wireval *val) {
 }
 static const upb_msglayout_field *upb_find_field(const upb_msglayout *l,
-                                                 uint32_t field_number) {
+                                                 uint32_t field_number,
+                                                 int *last_field_index) {
   static upb_msglayout_field none = {0, 0, 0, 0, 0, 0};
-  /* Lots of optimization opportunities here. */
-  int i;
   if (l == NULL) return &none;
-  for (i = 0; i < l->field_count; i++) {
-    if (l->fields[i].number == field_number) {
-      return &l->fields[i];
+  size_t idx = (size_t)field_number - 1; // 0 wraps to SIZE_MAX
+  if (idx < l->dense_below) {
+    goto found;
+  }
+  int last = *last_field_index;
+  for (idx = last; idx < l->field_count; idx++) {
+    if (l->fields[idx].number == field_number) {
+      goto found;
+    }
+  }
+  for (idx = 0; idx < last; idx++) {
+    if (l->fields[idx].number == field_number) {
+      goto found;
     }
   }
   return &none; /* Unknown field. */
+found:
+  UPB_ASSERT(l->fields[idx].number == field_number);
+  *last_field_index = idx;
+  return &l->fields[idx];
 }
-static upb_msg *decode_newsubmsg(upb_decstate *d, const upb_msglayout *layout,
+static upb_msg *decode_newsubmsg(upb_decstate *d,
+                                 const upb_msglayout *const *submsgs,
                                  const upb_msglayout_field *field) {
-  const upb_msglayout *subl = layout->submsgs[field->submsg_index];
+  const upb_msglayout *subl = submsgs[field->submsg_index];
   return _upb_msg_new_inl(subl, &d->arena);
 }
@ -346,9 +364,10 @@ static const char *decode_readstr(upb_decstate *d, const char *ptr, int size,
UPB_FORCEINLINE UPB_FORCEINLINE
static const char *decode_tosubmsg(upb_decstate *d, const char *ptr, static const char *decode_tosubmsg(upb_decstate *d, const char *ptr,
upb_msg *submsg, const upb_msglayout *layout, upb_msg *submsg,
const upb_msglayout *const *submsgs,
const upb_msglayout_field *field, int size) { const upb_msglayout_field *field, int size) {
const upb_msglayout *subl = layout->submsgs[field->submsg_index]; const upb_msglayout *subl = submsgs[field->submsg_index];
int saved_delta = decode_pushlimit(d, ptr, size); int saved_delta = decode_pushlimit(d, ptr, size);
if (--d->depth < 0) decode_err(d); if (--d->depth < 0) decode_err(d);
if (!decode_isdone(d, &ptr)) { if (!decode_isdone(d, &ptr)) {
@ -377,15 +396,17 @@ static const char *decode_group(upb_decstate *d, const char *ptr,
UPB_FORCEINLINE UPB_FORCEINLINE
static const char *decode_togroup(upb_decstate *d, const char *ptr, static const char *decode_togroup(upb_decstate *d, const char *ptr,
upb_msg *submsg, const upb_msglayout *layout, upb_msg *submsg,
const upb_msglayout *const *submsgs,
const upb_msglayout_field *field) { const upb_msglayout_field *field) {
const upb_msglayout *subl = layout->submsgs[field->submsg_index]; const upb_msglayout *subl = submsgs[field->submsg_index];
return decode_group(d, ptr, submsg, subl, field->number); return decode_group(d, ptr, submsg, subl, field->number);
} }
static const char *decode_toarray(upb_decstate *d, const char *ptr, static const char *decode_toarray(upb_decstate *d, const char *ptr,
upb_msg *msg, const upb_msglayout *layout, upb_msg *msg,
const upb_msglayout_field *field, wireval val, const upb_msglayout *const *submsgs,
const upb_msglayout_field *field, wireval *val,
int op) { int op) {
upb_array **arrp = UPB_PTR_AT(msg, field->offset, void); upb_array **arrp = UPB_PTR_AT(msg, field->offset, void);
upb_array *arr = *arrp; upb_array *arr = *arrp;
@ -407,27 +428,27 @@ static const char *decode_toarray(upb_decstate *d, const char *ptr,
/* Append scalar value. */ /* Append scalar value. */
mem = UPB_PTR_AT(_upb_array_ptr(arr), arr->len << op, void); mem = UPB_PTR_AT(_upb_array_ptr(arr), arr->len << op, void);
arr->len++; arr->len++;
memcpy(mem, &val, 1 << op); memcpy(mem, val, 1 << op);
return ptr; return ptr;
case OP_STRING: case OP_STRING:
decode_verifyutf8(d, ptr, val.size); decode_verifyutf8(d, ptr, val->size);
/* Fallthrough. */ /* Fallthrough. */
case OP_BYTES: { case OP_BYTES: {
/* Append bytes. */ /* Append bytes. */
upb_strview *str = (upb_strview*)_upb_array_ptr(arr) + arr->len; upb_strview *str = (upb_strview*)_upb_array_ptr(arr) + arr->len;
arr->len++; arr->len++;
return decode_readstr(d, ptr, val.size, str); return decode_readstr(d, ptr, val->size, str);
} }
case OP_SUBMSG: { case OP_SUBMSG: {
/* Append submessage / group. */ /* Append submessage / group. */
upb_msg *submsg = decode_newsubmsg(d, layout, field); upb_msg *submsg = decode_newsubmsg(d, submsgs, field);
*UPB_PTR_AT(_upb_array_ptr(arr), arr->len * sizeof(void *), upb_msg *) = *UPB_PTR_AT(_upb_array_ptr(arr), arr->len * sizeof(void *), upb_msg *) =
submsg; submsg;
arr->len++; arr->len++;
if (UPB_UNLIKELY(field->descriptortype == UPB_DTYPE_GROUP)) { if (UPB_UNLIKELY(field->descriptortype == UPB_DTYPE_GROUP)) {
return decode_togroup(d, ptr, submsg, layout, field); return decode_togroup(d, ptr, submsg, submsgs, field);
} else { } else {
return decode_tosubmsg(d, ptr, submsg, layout, field, val.size); return decode_tosubmsg(d, ptr, submsg, submsgs, field, val->size);
} }
} }
case OP_FIXPCK_LG2(2): case OP_FIXPCK_LG2(2):
@ -435,15 +456,15 @@ static const char *decode_toarray(upb_decstate *d, const char *ptr,
/* Fixed packed. */ /* Fixed packed. */
int lg2 = op - OP_FIXPCK_LG2(0); int lg2 = op - OP_FIXPCK_LG2(0);
int mask = (1 << lg2) - 1; int mask = (1 << lg2) - 1;
size_t count = val.size >> lg2; size_t count = val->size >> lg2;
if ((val.size & mask) != 0) { if ((val->size & mask) != 0) {
decode_err(d); /* Length isn't a round multiple of elem size. */ decode_err(d); /* Length isn't a round multiple of elem size. */
} }
decode_reserve(d, arr, count); decode_reserve(d, arr, count);
mem = UPB_PTR_AT(_upb_array_ptr(arr), arr->len << lg2, void); mem = UPB_PTR_AT(_upb_array_ptr(arr), arr->len << lg2, void);
arr->len += count; arr->len += count;
memcpy(mem, ptr, val.size); /* XXX: ptr boundary. */ memcpy(mem, ptr, val->size); /* XXX: ptr boundary. */
return ptr + val.size; return ptr + val->size;
} }
case OP_VARPCK_LG2(0): case OP_VARPCK_LG2(0):
case OP_VARPCK_LG2(2): case OP_VARPCK_LG2(2):
@ -451,7 +472,7 @@ static const char *decode_toarray(upb_decstate *d, const char *ptr,
/* Varint packed. */ /* Varint packed. */
int lg2 = op - OP_VARPCK_LG2(0); int lg2 = op - OP_VARPCK_LG2(0);
int scale = 1 << lg2; int scale = 1 << lg2;
int saved_limit = decode_pushlimit(d, ptr, val.size); int saved_limit = decode_pushlimit(d, ptr, val->size);
char *out = UPB_PTR_AT(_upb_array_ptr(arr), arr->len << lg2, void); char *out = UPB_PTR_AT(_upb_array_ptr(arr), arr->len << lg2, void);
while (!decode_isdone(d, &ptr)) { while (!decode_isdone(d, &ptr)) {
wireval elem; wireval elem;
@ -473,16 +494,15 @@ static const char *decode_toarray(upb_decstate *d, const char *ptr,
} }
static const char *decode_tomap(upb_decstate *d, const char *ptr, upb_msg *msg, static const char *decode_tomap(upb_decstate *d, const char *ptr, upb_msg *msg,
const upb_msglayout *layout, const upb_msglayout *const *submsgs,
const upb_msglayout_field *field, wireval val) { const upb_msglayout_field *field, wireval *val) {
upb_map **map_p = UPB_PTR_AT(msg, field->offset, upb_map *); upb_map **map_p = UPB_PTR_AT(msg, field->offset, upb_map *);
upb_map *map = *map_p; upb_map *map = *map_p;
upb_map_entry ent; upb_map_entry ent;
const upb_msglayout *entry = layout->submsgs[field->submsg_index]; const upb_msglayout *entry = submsgs[field->submsg_index];
if (!map) { if (!map) {
/* Lazily create map. */ /* Lazily create map. */
const upb_msglayout *entry = layout->submsgs[field->submsg_index];
const upb_msglayout_field *key_field = &entry->fields[0]; const upb_msglayout_field *key_field = &entry->fields[0];
const upb_msglayout_field *val_field = &entry->fields[1]; const upb_msglayout_field *val_field = &entry->fields[1];
char key_size = desctype_to_mapsize[key_field->descriptortype]; char key_size = desctype_to_mapsize[key_field->descriptortype];
@@ -502,28 +522,28 @@ static const char *decode_tomap(upb_decstate *d, const char *ptr, upb_msg *msg,
     ent.v.val = upb_value_ptr(_upb_msg_new(entry->submsgs[0], &d->arena));
   }
-  ptr = decode_tosubmsg(d, ptr, &ent.k, layout, field, val.size);
+  ptr = decode_tosubmsg(d, ptr, &ent.k, submsgs, field, val->size);
   _upb_map_set(map, &ent.k, map->key_size, &ent.v, map->val_size, &d->arena);
   return ptr;
 }
 static const char *decode_tomsg(upb_decstate *d, const char *ptr, upb_msg *msg,
-                                const upb_msglayout *layout,
-                                const upb_msglayout_field *field, wireval val,
+                                const upb_msglayout *const *submsgs,
+                                const upb_msglayout_field *field, wireval *val,
                                 int op) {
   void *mem = UPB_PTR_AT(msg, field->offset, void);
   int type = field->descriptortype;
   /* Set presence if necessary. */
-  if (field->presence < 0) {
+  if (field->presence > 0) {
+    _upb_sethas_field(msg, field);
+  } else if (field->presence < 0) {
     /* Oneof case */
     uint32_t *oneof_case = _upb_oneofcase_field(msg, field);
     if (op == OP_SUBMSG && *oneof_case != field->number) {
       memset(mem, 0, sizeof(void*));
     }
     *oneof_case = field->number;
-  } else if (field->presence > 0) {
-    _upb_sethas_field(msg, field);
   }
   /* Store into message. */
@ -532,29 +552,29 @@ static const char *decode_tomsg(upb_decstate *d, const char *ptr, upb_msg *msg,
upb_msg **submsgp = mem; upb_msg **submsgp = mem;
upb_msg *submsg = *submsgp; upb_msg *submsg = *submsgp;
if (!submsg) { if (!submsg) {
submsg = decode_newsubmsg(d, layout, field); submsg = decode_newsubmsg(d, submsgs, field);
*submsgp = submsg; *submsgp = submsg;
} }
if (UPB_UNLIKELY(type == UPB_DTYPE_GROUP)) { if (UPB_UNLIKELY(type == UPB_DTYPE_GROUP)) {
ptr = decode_togroup(d, ptr, submsg, layout, field); ptr = decode_togroup(d, ptr, submsg, submsgs, field);
} else { } else {
ptr = decode_tosubmsg(d, ptr, submsg, layout, field, val.size); ptr = decode_tosubmsg(d, ptr, submsg, submsgs, field, val->size);
} }
break; break;
} }
case OP_STRING: case OP_STRING:
decode_verifyutf8(d, ptr, val.size); decode_verifyutf8(d, ptr, val->size);
/* Fallthrough. */ /* Fallthrough. */
case OP_BYTES: case OP_BYTES:
return decode_readstr(d, ptr, val.size, mem); return decode_readstr(d, ptr, val->size, mem);
case OP_SCALAR_LG2(3): case OP_SCALAR_LG2(3):
memcpy(mem, &val, 8); memcpy(mem, val, 8);
break; break;
case OP_SCALAR_LG2(2): case OP_SCALAR_LG2(2):
memcpy(mem, &val, 4); memcpy(mem, val, 4);
break; break;
case OP_SCALAR_LG2(0): case OP_SCALAR_LG2(0):
memcpy(mem, &val, 1); memcpy(mem, val, 1);
break; break;
default: default:
UPB_UNREACHABLE(); UPB_UNREACHABLE();
@ -580,6 +600,7 @@ static bool decode_tryfastdispatch(upb_decstate *d, const char **ptr,
UPB_NOINLINE UPB_NOINLINE
static const char *decode_msg(upb_decstate *d, const char *ptr, upb_msg *msg, static const char *decode_msg(upb_decstate *d, const char *ptr, upb_msg *msg,
const upb_msglayout *layout) { const upb_msglayout *layout) {
int last_field_index = 0;
while (true) { while (true) {
uint32_t tag; uint32_t tag;
const upb_msglayout_field *field; const upb_msglayout_field *field;
@ -594,7 +615,7 @@ static const char *decode_msg(upb_decstate *d, const char *ptr, upb_msg *msg,
field_number = tag >> 3; field_number = tag >> 3;
wire_type = tag & 7; wire_type = tag & 7;
field = upb_find_field(layout, field_number); field = upb_find_field(layout, field_number, &last_field_index);
switch (wire_type) { switch (wire_type) {
case UPB_WIRE_TYPE_VARINT: case UPB_WIRE_TYPE_VARINT:
@ -646,13 +667,13 @@ static const char *decode_msg(upb_decstate *d, const char *ptr, upb_msg *msg,
switch (field->label) { switch (field->label) {
case UPB_LABEL_REPEATED: case UPB_LABEL_REPEATED:
case _UPB_LABEL_PACKED: case _UPB_LABEL_PACKED:
ptr = decode_toarray(d, ptr, msg, layout, field, val, op); ptr = decode_toarray(d, ptr, msg, layout->submsgs, field, &val, op);
break; break;
case _UPB_LABEL_MAP: case _UPB_LABEL_MAP:
ptr = decode_tomap(d, ptr, msg, layout, field, val); ptr = decode_tomap(d, ptr, msg, layout->submsgs, field, &val);
break; break;
default: default:
ptr = decode_tomsg(d, ptr, msg, layout, field, val, op); ptr = decode_tomsg(d, ptr, msg, layout->submsgs, field, &val, op);
break; break;
} }
} else { } else {

@ -1018,14 +1018,20 @@ static int field_number_cmp(const void *p1, const void *p2) {
return f1->number - f2->number; return f1->number - f2->number;
} }
static void assign_layout_indices(const upb_msgdef *m, upb_msglayout_field *fields) { static void assign_layout_indices(const upb_msgdef *m, upb_msglayout *l,
upb_msglayout_field *fields) {
int i; int i;
int n = upb_msgdef_numfields(m); int n = upb_msgdef_numfields(m);
int dense_below = 0;
for (i = 0; i < n; i++) { for (i = 0; i < n; i++) {
upb_fielddef *f = (upb_fielddef*)upb_msgdef_itof(m, fields[i].number); upb_fielddef *f = (upb_fielddef*)upb_msgdef_itof(m, fields[i].number);
UPB_ASSERT(f); UPB_ASSERT(f);
f->layout_index = i; f->layout_index = i;
if (upb_fielddef_number(f) == i + 1) {
dense_below = upb_fielddef_number(f);
}
} }
l->dense_below = dense_below;
} }
/* This function is the dynamic equivalent of message_layout.{cc,h} in upbc. /* This function is the dynamic equivalent of message_layout.{cc,h} in upbc.
@ -1197,7 +1203,7 @@ static void make_layout(symtab_addctx *ctx, const upb_msgdef *m) {
/* Sort fields by number. */ /* Sort fields by number. */
qsort(fields, upb_msgdef_numfields(m), sizeof(*fields), field_number_cmp); qsort(fields, upb_msgdef_numfields(m), sizeof(*fields), field_number_cmp);
assign_layout_indices(m, fields); assign_layout_indices(m, l, fields);
} }
static char *strviewdup(symtab_addctx *ctx, upb_strview view) { static char *strviewdup(symtab_addctx *ctx, upb_strview view) {
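
The `dense_below` value computed above can be read as "fields numbered 1 through `dense_below` sit at index `number - 1` in the sorted field array". The sketch below shows how a lookup might use it. It is an illustration under stated assumptions, not the actual `upb_find_field`: the `sketch_*` types and functions are hypothetical stand-ins, the `last_field_index` hint from the diff is left out, and the clamp to `UINT8_MAX` is an addition of this sketch (the codegen path in upbc has a comparable `numeric_limits` check).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for upb_msglayout_field and
 * upb_msglayout. Only the members needed for the lookup are kept. */
typedef struct {
  uint32_t number;
} sketch_field;

typedef struct {
  const sketch_field *fields; /* Sorted by field number, no duplicates. */
  int field_count;
  uint8_t dense_below; /* Fields 1..dense_below live at index number - 1. */
} sketch_layout;

/* Computes the length of the dense prefix 1, 2, 3, ... of a sorted field
 * array. Because the numbers are sorted, unique and positive, the first
 * mismatch ends the prefix for good, so stopping early gives the same
 * result as scanning every field. The UINT8_MAX clamp is an assumption of
 * this sketch so the result always fits the uint8_t member. */
static uint8_t sketch_dense_below(const sketch_field *fields, int n) {
  uint8_t dense_below = 0;
  int i;
  assert(fields != NULL || n == 0);
  for (i = 0; i < n && i < UINT8_MAX; i++) {
    if (fields[i].number != (uint32_t)i + 1) break;
    dense_below = (uint8_t)(i + 1);
  }
  return dense_below;
}

/* Finds a field by number: one indexed load inside the dense prefix,
 * linear search over the remaining fields otherwise. */
static const sketch_field *sketch_find_field(const sketch_layout *l,
                                             uint32_t number) {
  int i;
  if (number >= 1 && number <= l->dense_below) {
    return &l->fields[number - 1];
  }
  for (i = l->dense_below; i < l->field_count; i++) {
    if (l->fields[i].number == number) return &l->fields[i];
  }
  return NULL; /* Unknown field. */
}
```

For a message with fields 1, 2, 3, 7 and 9, this gives `dense_below == 3`: fields 1 to 3 are found by index, while 7 and 9 fall back to the linear scan.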

@ -65,6 +65,7 @@ struct upb_msglayout {
uint16_t size; uint16_t size;
uint16_t field_count; uint16_t field_count;
bool extendable; bool extendable;
uint8_t dense_below;
uint8_t table_mask; uint8_t table_mask;
/* To constant-initialize the tables of variable length, we need a flexible /* To constant-initialize the tables of variable length, we need a flexible
* array member, and we need to compile in C99 mode. */ * array member, and we need to compile in C99 mode. */
@ -131,7 +132,11 @@ UPB_INLINE bool _upb_hasbit(const upb_msg *msg, size_t idx) {
} }
UPB_INLINE void _upb_sethas(const upb_msg *msg, size_t idx) { UPB_INLINE void _upb_sethas(const upb_msg *msg, size_t idx) {
(*UPB_PTR_AT(msg, idx / 8, char)) |= (char)(1 << (idx % 8)); if (UPB_LIKELY(idx < 32)) {
(*UPB_PTR_AT(msg, 0, uint32_t)) |= (1UL << idx);
} else {
(*UPB_PTR_AT(msg, idx / 8, char)) |= (char)(1 << (idx % 8));
}
} }
UPB_INLINE void _upb_clearhas(const upb_msg *msg, size_t idx) { UPB_INLINE void _upb_clearhas(const upb_msg *msg, size_t idx) {
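
The `_upb_sethas` change above adds a word-sized fast path for the first 32 hasbits. The sketch below compares the two ways of setting a bit. It is a simplified illustration, not the upb code: the `sketch_*` names are hypothetical, and `memcpy` replaces the direct `uint32_t` store so the sketch does not depend on alignment. Note that the word path and the byte path write the same bytes only on a little-endian host. Whether upb relies on that is not visible in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time path: bit idx lives in byte idx / 8, at position idx % 8. */
static void sketch_sethas_bytewise(void *msg, size_t idx) {
  unsigned char *bytes = (unsigned char *)msg;
  bytes[idx / 8] |= (unsigned char)(1u << (idx % 8));
}

/* Fast path for idx < 32: OR the bit into the first 32-bit word.
 * Larger indices fall back to the byte path. */
static void sketch_sethas_fast(void *msg, size_t idx) {
  if (idx < 32) {
    uint32_t word;
    memcpy(&word, msg, sizeof(word));
    word |= (uint32_t)1 << idx;
    memcpy(msg, &word, sizeof(word));
  } else {
    sketch_sethas_bytewise(msg, idx);
  }
}

/* Returns 1 when the host stores the least-significant byte first. */
static int sketch_is_little_endian(void) {
  uint32_t one = 1;
  unsigned char first;
  memcpy(&first, &one, 1);
  return first == 1;
}
```

On a little-endian machine both functions leave identical bytes in the message for every index. On a big-endian machine the fast path places bits 0 to 7 in the fourth byte instead of the first, so readers and writers of the first 32 hasbits would have to agree on one of the two schemes.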

@ -867,6 +867,7 @@ void WriteSource(const protobuf::FileDescriptor* file, Output& output,
std::string msgname = ToCIdent(message->full_name()); std::string msgname = ToCIdent(message->full_name());
std::string fields_array_ref = "NULL"; std::string fields_array_ref = "NULL";
std::string submsgs_array_ref = "NULL"; std::string submsgs_array_ref = "NULL";
uint8_t dense_below = 0;
MessageLayout layout(message); MessageLayout layout(message);
SubmsgArray submsg_array(message); SubmsgArray submsg_array(message);
@ -896,6 +897,12 @@ void WriteSource(const protobuf::FileDescriptor* file, Output& output,
int submsg_index = 0; int submsg_index = 0;
std::string presence = "0"; std::string presence = "0";
if (field->number() <=
std::numeric_limits<decltype(dense_below)>::max() &&
dense_below + 1 == field->number()) {
dense_below = field->number();
}
if (field->cpp_type() == protobuf::FieldDescriptor::CPPTYPE_MESSAGE) { if (field->cpp_type() == protobuf::FieldDescriptor::CPPTYPE_MESSAGE) {
submsg_index = submsg_array.GetIndex(field); submsg_index = submsg_array.GetIndex(field);
} }
@ -951,9 +958,10 @@ void WriteSource(const protobuf::FileDescriptor* file, Output& output,
output("const upb_msglayout $0 = {\n", MessageInit(message)); output("const upb_msglayout $0 = {\n", MessageInit(message));
output(" $0,\n", submsgs_array_ref); output(" $0,\n", submsgs_array_ref);
output(" $0,\n", fields_array_ref); output(" $0,\n", fields_array_ref);
output(" $0, $1, $2, $3,\n", GetSizeInit(layout.message_size()), output(" $0, $1, $2, $3, $4,\n", GetSizeInit(layout.message_size()),
field_number_order.size(), field_number_order.size(),
"false", // TODO: extendable "false", // TODO: extendable
dense_below,
table_mask table_mask
); );
if (!table.empty()) { if (!table.empty()) {
