Joshua Haberman
e70e488e09
MSVC apparently doesn't support the standard C/C++ defines.
...
Our requirement then is C99, C++11, or MSVC >= 2015.
4 years ago
Joshua Haberman
8d670d8aea
Renamed decode_varint32() to decode_tag().
4 years ago
Joshua Haberman
9abf8e043f
Clamp 32-bit varints to 5 bytes to fix a fuzz failure.
4 years ago
Joshua Haberman
358fa14d0e
Fixed headers and updated benchmark script.
4 years ago
Joshua Haberman
bc200451ce
Use a macro instead of an inline function for setjmp/longjmp.
4 years ago
Joshua Haberman
fbc0639b07
Use _setjmp on mac to avoid saving/restoring the signal mask.
4 years ago
Joshua Haberman
65d166a6ba
Added API for copy vs. alias and added benchmarks to test both.
...
Benchmark output:
$ bazel-bin/benchmarks/benchmark '--benchmark_filter=BM_Parse'
2020-11-11 15:39:04
Running bazel-bin/benchmarks/benchmark
Run on (72 X 3700 MHz CPU s)
CPU Caches:
L1 Data 32K (x36)
L1 Instruction 32K (x36)
L2 Unified 1024K (x36)
L3 Unified 25344K (x2)
-------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc<UseArena, Copy> 4134 ns 4134 ns 168714 1.69152GB/s
BM_Parse_Upb_FileDesc<UseArena, Alias> 3487 ns 3487 ns 199509 2.00526GB/s
BM_Parse_Upb_FileDesc<InitBlock, Copy> 3727 ns 3726 ns 187581 1.87643GB/s
BM_Parse_Upb_FileDesc<InitBlock, Alias> 3110 ns 3110 ns 224970 2.24866GB/s
BM_Parse_Proto2<FileDesc, NoArena, Copy> 31132 ns 31132 ns 22437 229.995MB/s
BM_Parse_Proto2<FileDesc, UseArena, Copy> 21011 ns 21009 ns 33922 340.812MB/s
BM_Parse_Proto2<FileDesc, InitBlock, Copy> 17976 ns 17975 ns 38808 398.337MB/s
BM_Parse_Proto2<FileDescSV, InitBlock, Alias> 17357 ns 17356 ns 40244 412.539MB/s
4 years ago
Joshua Haberman
9df96874e9
Start arena block doubling at initial block size.
...
If an initial block is provided, we should start our
block doubling at the size of the initial block, not 128.
This saves us from unnecessary overhead when we overflow
the initial block.
4 years ago
Joshua Haberman
e5bdfba92c
Removed accidentally-added .orig file.
4 years ago
Joshua Haberman
982b634bc5
Fixed a few minor bugs found by fuzzing.
4 years ago
Joshua Haberman
a01f3e23a4
Fixes for google3 build, and exclude even more tests from macOS to avoid timeout.
4 years ago
Joshua Haberman
1eb7bd39e7
Some formatting fixes.
4 years ago
Joshua Haberman
6c16cba83f
Removed obsolete port.c file.
4 years ago
Joshua Haberman
5b1f0d86a1
For Kokoro, only build/test -m32 on Linux.
...
Also fixed a bunch of bugs found by gcc's -fanalyzer.
4 years ago
Joshua Haberman
0497f8deed
Fixed a critical bug on 32-bit builds, and added much more Kokoro testing.
...
There was a bug in our arena code where we assumed that
sizeof(upb_array) would be a multiple of 8. On i386 it was
not, and this was causing memory corruption on 32-bit builds.
4 years ago
Joshua Haberman
64abb5eb11
Amalgamation no longer bundles wyhash, but #includes it.
...
Also fixed a few spelling mistakes.
4 years ago
Joshua Haberman
dd0994d377
Bugfix for JSON decoding: only check real oneofs for duplicates.
...
Also fixed upb_msg_whichoneof() to work properly for synthetic
fields, and to be simpler in general.
4 years ago
Joshua Haberman
c9f9668234
symtab: use longjmp() for errors and avoid intermediate table.
...
We used to use a separate "add table" during the upb_symtab_addfile()
operation to make it easier to back out the file if it contained
errors. But this created unnecessary work of re-adding the same symbols
to the main symtab once everything was validated.
Instead we directly add symbols to the main symbols table. If there is
an error in validation, we remove precisely the set of symbols that
were already added.
This also requires using a separate arena for each file. We can fuse
it with the symtab's main arena if the operation is successful.
LoadDescriptor_Upb 61.2µs ± 4% 53.5µs ± 1% -12.50% (p=0.000 n=12+12)
LoadAdsDescriptor_Upb 4.43ms ± 1% 3.06ms ± 0% -31.00% (p=0.000 n=12+12)
LoadDescriptor_Proto2 257µs ± 0% 259µs ± 0% +1.00% (p=0.000 n=12+12)
LoadAdsDescriptor_Proto2 13.9ms ± 1% 13.9ms ± 1% ~ (p=0.128 n=12+12)
4 years ago
Joshua Haberman
c3b5637646
Added benchmark for loading ads descriptor.
...
Generally this seems to track the speed of loading descriptor.proto.
----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------------------------------------
BM_LoadDescriptor_Upb 59091 ns 59086 ns 11747 121.182MB/s
BM_LoadAdsDescriptor_Upb 4218587 ns 4218582 ns 166 120.544MB/s
BM_LoadDescriptor_Proto2 241083 ns 241049 ns 2903 29.7043MB/s
BM_LoadAdsDescriptor_Proto2 13442631 ns 13442099 ns 52 34.8975MB/s
4 years ago
Joshua Haberman
acd72c6d3f
WIP.
4 years ago
Joshua Haberman
8113ebd6c7
Added explanatory comment about integer constants.
4 years ago
Joshua Haberman
f2d826b9f3
Got rid of floating-point division in table init.
...
Ideally we would get rid of all floating-point operations
in table.c, but that's a job for another day.
4 years ago
Joshua Haberman
154f2c25f4
Added UTF-8 validation for proto3 string fields.
4 years ago
Joshua Haberman
e8f9eac68c
Added #defines UPB_ENABLE_FASTTABLE and UPB_TRY_ENABLE_FASTTABLE.
...
These control whether fasttable decoding is on.
4 years ago
Joshua Haberman
e86541ac1d
Fixed the build after the merge.
4 years ago
Joshua Haberman
8f3ee80d46
Drop C89/C90 support and MSVC prior to Visual Studio 2015.
...
upb previously attempted to support C89 and pre-2015 versions
of Visual Studio. This was to support older compilers with
limited C99 support (particularly MSVC). But as of last August,
even gRPC has dropped support for MSVC prior to 2015
c87276d058
Therefore it seems safe for upb to no longer attempt C89 support
(we were already not truly C89 compliant, with our use of "bool").
We now explicitly require C99 or greater and MSVC 2015 or greater.
This cleaned up port_def.inc a fair bit. I took the chance to
also remove some obsolete macros.
4 years ago
Joshua Haberman
2c8bb6dd9d
Specify C99 explicitly until/unless we stop using bool.
4 years ago
Joshua Haberman
efd576b698
Added -std=gnu99 for fastdecode and ran Buildifier.
4 years ago
Joshua Haberman
b928696942
A few more fixes, and test fastdecode under Kokoro.
4 years ago
Joshua Haberman
55f3569cd2
A few minor fixes and more assertions.
4 years ago
Joshua Haberman
46eb82467a
Added comment to decode_fast.h.
4 years ago
Joshua Haberman
bd9f8f580d
Fixed a few bugs with the fast decoder.
...
1. For long tags we were putting table entries in the wrong slot.
2. For repeated strings, when the buffer flipped to no longer alias we
were failing to notice and kept aliasing anyway.
4 years ago
Joshua Haberman
021db6fcd5
Allow larger tags into the table if they are unique mod 31.
...
Also fixed a bug with fixed packed in decode_fast.c.
4 years ago
Joshua Haberman
86d9908c55
Fastdecode support for packed fields.
...
This is not very optimized yet. There is a lot of room to
optimize it further.
4 years ago
Joshua Haberman
e3e797b680
Added fasttable support for oneofs.
4 years ago
Joshua Haberman
7ffa9c181a
Fixed some small bugs and performance problems in string copying.
...
Before this CL, with alias=false:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 3715 ns 3715 ns 188916 1.88206GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,122.92 msec task-clock # 0.979 CPUs utilized
3 context-switches # 0.003 K/sec
0 cpu-migrations # 0.000 K/sec
196 page-faults # 0.175 K/sec
4,144,746,717 cycles # 3.691 GHz
15,351,966,804 instructions # 3.70 insn per cycle
2,590,281,905 branches # 2306.728 M/sec
2,996,157 branch-misses # 0.12% of all branches
1.146615328 seconds time elapsed
1.115578000 seconds user
0.008025000 seconds sys
After this CL:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 3554 ns 3554 ns 197527 1.9674GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,105.34 msec task-clock # 0.982 CPUs utilized
3 context-switches # 0.003 K/sec
0 cpu-migrations # 0.000 K/sec
197 page-faults # 0.178 K/sec
4,077,736,892 cycles # 3.689 GHz
15,442,709,352 instructions # 3.79 insn per cycle
2,435,131,301 branches # 2203.068 M/sec
2,643,775 branch-misses # 0.11% of all branches
1.125393845 seconds time elapsed
1.097770000 seconds user
0.008012000 seconds sys
4 years ago
Joshua Haberman
e2c709e047
Repeated string and primitive support.
...
Much of the code was adapted from Gerben's code in:
6333031195
4 years ago
Joshua Haberman
25db40bc30
Fixed upb::InlinedArena, which was compeltely broken.
4 years ago
Joshua Haberman
d81ba58215
Optimized short string copying.
...
This sped up the alias=false case:
Before:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 4562 ns 4562 ns 153251 1.53276GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,216.65 msec task-clock # 0.936 CPUs utilized
6 context-switches # 0.005 K/sec
0 cpu-migrations # 0.000 K/sec
200 page-faults # 0.164 K/sec
4,490,925,650 cycles # 3.691 GHz
16,516,403,731 instructions # 3.68 insn per cycle
2,828,536,650 branches # 2324.861 M/sec
5,425,830 branch-misses # 0.19% of all branches
1.300178903 seconds time elapsed
1.211475000 seconds user
0.072207000 seconds sys
After:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 3587 ns 3587 ns 195749 1.94935GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,109.69 msec task-clock # 0.930 CPUs utilized
5 context-switches # 0.005 K/sec
0 cpu-migrations # 0.000 K/sec
198 page-faults # 0.178 K/sec
4,094,010,257 cycles # 3.689 GHz
15,672,677,812 instructions # 3.83 insn per cycle
2,589,291,160 branches # 2333.346 M/sec
3,306,386 branch-misses # 0.13% of all branches
1.193221789 seconds time elapsed
1.102538000 seconds user
0.072166000 seconds sys
4 years ago
Joshua Haberman
f3a2a79349
More optimization, back up to 2.56GB/s.
4 years ago
Joshua Haberman
199c914295
Simplify push/pop when msg fits in the current buffer.
4 years ago
Joshua Haberman
d5f5db2729
Put string-copying field parser into a separate function.
...
This helps to regain a bit of lost perf. Now at 2.3GB/s.
4 years ago
Joshua Haberman
f4adbe0698
Optimized varint decoding from Gerben.
...
This speeds things up but costs some code size.
name old time/op new time/op delta
ArenaOneAlloc 21.1ns ± 0% 21.3ns ± 0% +1.33% (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc 6.02ns ± 0% 6.02ns ± 0% ~ (p=0.579 n=10+10)
LoadDescriptor_Upb 111µs ± 1% 110µs ± 1% -0.91% (p=0.003 n=11+12)
LoadDescriptor_Proto2 258µs ± 1% 258µs ± 1% ~ (p=0.674 n=10+12)
Parse_Upb_FileDesc_WithArena 11.2µs ± 0% 10.4µs ± 0% -6.67% (p=0.000 n=12+12)
Parse_Upb_FileDesc_WithInitialBlock 10.6µs ± 0% 10.1µs ± 0% -4.48% (p=0.000 n=12+11)
SerializeDescriptor_Proto2 5.36µs ± 5% 5.36µs ± 3% ~ (p=0.880 n=12+11)
SerializeDescriptor_Upb 11.9µs ± 0% 12.0µs ± 0% +0.81% (p=0.000 n=12+12)
FILE SIZE VM SIZE
-------------- --------------
+23% +1.11Ki +24% +1.06Ki upb/decode.c
+15% +560 +15% +560 decode_msg
+140% +240 +188% +240 decode_longvarint64
[NEW] +174 [NEW] +128 decode_isdonefallback
+56% +160 +65% +160 upb_decode
-49.7% -1.06Ki [ = ] 0 [Unmapped]
+0.0% +48 +0.9% +1.06Ki TOTAL
4 years ago
Joshua Haberman
48689df72e
Eliminated bounds checks inside parsing a field.
...
Each field parser gets 16 bytes of slop. This requires using a patch
buffer at end-of-buffer.
This addes 80% of what is needed to support a pull parser with a data
callback, since the main parser is now tolerant to buffer flips.
There is a ~4% performance regression and 12% code size regression in
upb/decode.c:
name old time/op new time/op delta
ArenaOneAlloc 21.0ns ± 0% 21.6ns ± 0% +2.87% (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc 6.02ns ± 0% 6.02ns ± 0% +0.09% (p=0.001 n=11+12)
LoadDescriptor_Upb 114µs ± 1% 115µs ± 1% +0.96% (p=0.000 n=11+12)
LoadDescriptor_Proto2 260µs ± 1% 261µs ± 1% +0.55% (p=0.033 n=12+12)
Parse_Upb_FileDesc_WithArena 10.8µs ± 0% 11.2µs ± 0% +3.43% (p=0.000 n=11+11)
Parse_Upb_FileDesc_WithInitialBlock 10.5µs ± 0% 10.9µs ± 0% +3.68% (p=0.000 n=12+12)
SerializeDescriptor_Proto2 5.25µs ± 3% 5.42µs ± 5% +3.39% (p=0.007 n=12+12)
SerializeDescriptor_Upb 12.0µs ± 0% 12.5µs ± 0% +4.14% (p=0.000 n=12+11)
FILE SIZE VM SIZE
-------------- --------------
+12% +606 +12% +560 upb/decode.c
+7.9% +288 +7.9% +288 decode_msg
[NEW] +174 [NEW] +128 decode_isdonefallback
+56% +160 +65% +160 upb_decode
-9.3% -16 -12.5% -16 decode_longvarint64
-25.5% -558 [ = ] 0 [Unmapped]
+0.0% +48 +0.4% +560 TOTAL
4 years ago
Joshua Haberman
4ea81ab107
Fixed pedantic warning.
4 years ago
Joshua Haberman
6399b31f4b
Removed ULL constants in json_decode.c.
4 years ago
Joshua Haberman
c8ae197e64
Removed "U" suffixes, they are not necessary.
4 years ago
Joshua Haberman
bc1e0b314f
Fixed some strict C89 errors.
4 years ago
Joshua Haberman
2c1664906a
Removed license comments and upb_amalgamation for google3.
4 years ago
Joshua Haberman
b7dc77415a
Added licenses() to all BUILD files.
4 years ago