Joshua Haberman
d81ba58215
Optimized short string copying.
...
This sped up the alias=false case:
Before:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 4562 ns 4562 ns 153251 1.53276GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,216.65 msec task-clock # 0.936 CPUs utilized
6 context-switches # 0.005 K/sec
0 cpu-migrations # 0.000 K/sec
200 page-faults # 0.164 K/sec
4,490,925,650 cycles # 3.691 GHz
16,516,403,731 instructions # 3.68 insn per cycle
2,828,536,650 branches # 2324.861 M/sec
5,425,830 branch-misses # 0.19% of all branches
1.300178903 seconds time elapsed
1.211475000 seconds user
0.072207000 seconds sys
After:
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
BM_Parse_Upb_FileDesc_WithInitialBlock 3587 ns 3587 ns 195749 1.94935GB/s
Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock':
1,109.69 msec task-clock # 0.930 CPUs utilized
5 context-switches # 0.005 K/sec
0 cpu-migrations # 0.000 K/sec
198 page-faults # 0.178 K/sec
4,094,010,257 cycles # 3.689 GHz
15,672,677,812 instructions # 3.83 insn per cycle
2,589,291,160 branches # 2333.346 M/sec
3,306,386 branch-misses # 0.13% of all branches
1.193221789 seconds time elapsed
1.102538000 seconds user
0.072166000 seconds sys
4 years ago
Joshua Haberman
f3a2a79349
More optimization, back up to 2.56GB/s.
4 years ago
Joshua Haberman
199c914295
Simplify push/pop when msg fits in the current buffer.
4 years ago
Joshua Haberman
d5f5db2729
Put string-copying field parser into a separate function.
...
This helps to regain a bit of lost perf. Now at 2.3GB/s.
4 years ago
Joshua Haberman
f4adbe0698
Optimized varint decoding from Gerben.
...
This speeds things up but costs some code size.
name old time/op new time/op delta
ArenaOneAlloc 21.1ns ± 0% 21.3ns ± 0% +1.33% (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc 6.02ns ± 0% 6.02ns ± 0% ~ (p=0.579 n=10+10)
LoadDescriptor_Upb 111µs ± 1% 110µs ± 1% -0.91% (p=0.003 n=11+12)
LoadDescriptor_Proto2 258µs ± 1% 258µs ± 1% ~ (p=0.674 n=10+12)
Parse_Upb_FileDesc_WithArena 11.2µs ± 0% 10.4µs ± 0% -6.67% (p=0.000 n=12+12)
Parse_Upb_FileDesc_WithInitialBlock 10.6µs ± 0% 10.1µs ± 0% -4.48% (p=0.000 n=12+11)
SerializeDescriptor_Proto2 5.36µs ± 5% 5.36µs ± 3% ~ (p=0.880 n=12+11)
SerializeDescriptor_Upb 11.9µs ± 0% 12.0µs ± 0% +0.81% (p=0.000 n=12+12)
FILE SIZE VM SIZE
-------------- --------------
+23% +1.11Ki +24% +1.06Ki upb/decode.c
+15% +560 +15% +560 decode_msg
+140% +240 +188% +240 decode_longvarint64
[NEW] +174 [NEW] +128 decode_isdonefallback
+56% +160 +65% +160 upb_decode
-49.7% -1.06Ki [ = ] 0 [Unmapped]
+0.0% +48 +0.9% +1.06Ki TOTAL
4 years ago
Joshua Haberman
48689df72e
Eliminated bounds checks inside parsing a field.
...
Each field parser gets 16 bytes of slop. This requires using a patch
buffer at end-of-buffer.
This addes 80% of what is needed to support a pull parser with a data
callback, since the main parser is now tolerant to buffer flips.
There is a ~4% performance regression and 12% code size regression in
upb/decode.c:
name old time/op new time/op delta
ArenaOneAlloc 21.0ns ± 0% 21.6ns ± 0% +2.87% (p=0.000 n=12+12)
ArenaInitialBlockOneAlloc 6.02ns ± 0% 6.02ns ± 0% +0.09% (p=0.001 n=11+12)
LoadDescriptor_Upb 114µs ± 1% 115µs ± 1% +0.96% (p=0.000 n=11+12)
LoadDescriptor_Proto2 260µs ± 1% 261µs ± 1% +0.55% (p=0.033 n=12+12)
Parse_Upb_FileDesc_WithArena 10.8µs ± 0% 11.2µs ± 0% +3.43% (p=0.000 n=11+11)
Parse_Upb_FileDesc_WithInitialBlock 10.5µs ± 0% 10.9µs ± 0% +3.68% (p=0.000 n=12+12)
SerializeDescriptor_Proto2 5.25µs ± 3% 5.42µs ± 5% +3.39% (p=0.007 n=12+12)
SerializeDescriptor_Upb 12.0µs ± 0% 12.5µs ± 0% +4.14% (p=0.000 n=12+11)
FILE SIZE VM SIZE
-------------- --------------
+12% +606 +12% +560 upb/decode.c
+7.9% +288 +7.9% +288 decode_msg
[NEW] +174 [NEW] +128 decode_isdonefallback
+56% +160 +65% +160 upb_decode
-9.3% -16 -12.5% -16 decode_longvarint64
-25.5% -558 [ = ] 0 [Unmapped]
+0.0% +48 +0.4% +560 TOTAL
4 years ago
Joshua Haberman
4ea81ab107
Fixed pedantic warning.
4 years ago
Joshua Haberman
6399b31f4b
Removed ULL constants in json_decode.c.
4 years ago
Joshua Haberman
c8ae197e64
Removed "U" suffixes, they are not necessary.
4 years ago
Joshua Haberman
bc1e0b314f
Fixed some strict C89 errors.
4 years ago
Joshua Haberman
2c1664906a
Removed license comments and upb_amalgamation for google3.
4 years ago
Joshua Haberman
b7dc77415a
Added licenses() to all BUILD files.
4 years ago
Joshua Haberman
e3f41de6c7
Split monolithic BUILD file into many build files.
4 years ago
gerben-s
9e68ec033f
Add repeated varints and fixed parsers
4 years ago
Joshua Haberman
bf8e08074c
Added a few more comments.
4 years ago
Joshua Haberman
3238821315
Gave fast table entry a nicer name.
4 years ago
Joshua Haberman
2a574d3d01
Added a bunch of comments for readability.
4 years ago
Joshua Haberman
c2901eeee1
Added missing #includes (caught by Blaze).
4 years ago
Joshua Haberman
1c8c16b9b1
Use quoted include.
4 years ago
Joshua Haberman
6f59f1256e
Optimizations to descriptor loading.
4 years ago
Joshua Haberman
c10b24ffb2
Simplified switch().
4 years ago
Joshua Haberman
ded2e657a7
Added compatibility with old generated code.
...
Until everyone can regenerate their code, we need to provide
compatible semantics with the old generated code.
Also fixed a bug where enums were allocated 8 bytes instead
of 4.
4 years ago
Joshua Haberman
5b0c5c7d4a
Dispatch inline.
4 years ago
Joshua Haberman
75edd3e59c
Changed to use table pairs, seems to ever-so-slightly regress.
4 years ago
Joshua Haberman
bca7edac8c
Cleaned up table compression a bit.
4 years ago
Joshua Haberman
b95f217996
A little speed boost, now hitting 2.51GB/s.
4 years ago
Joshua Haberman
8ed6b2fe85
Stored mask in the table pointer.
4 years ago
Joshua Haberman
a6dc88556d
Tables are compressed, but perf goes down to 2.44GB/s.
4 years ago
Joshua Haberman
91eb09b1bc
Add a few comments.
4 years ago
Joshua Haberman
7ccf5650c7
If we encounter "null" for a non-NullValue enum, throw an error.
4 years ago
Joshua Haberman
0a3a94a12f
Updated to a new version of protobuf and fixed a few conformance tests.
...
When `google.protobuf.NullValue` appears on its own, we need to parse
and serialize it as plain JSON "null" value.
4 years ago
Joshua Haberman
504e105420
undef UPB_ASAN.
4 years ago
Joshua Haberman
ab96d1ec41
Removed extraneous C++-style comment.
4 years ago
Joshua Haberman
d5096f9ee8
Fixed bug in addunknown and added ASAN poisoning.
4 years ago
Joshua Haberman
f01efe8b64
Removed another C99-ism.
4 years ago
Joshua Haberman
1749082bbb
Removed C99-ism.
4 years ago
Joshua Haberman
f2ddc15d76
Bugfix: initialize fastlimit and fastend.
4 years ago
Gerben Stavenga
4053805759
Bugfixes
4 years ago
Joshua Haberman
2339fc779c
Updated obsolete comment.
4 years ago
Joshua Haberman
b393849bbd
Updated obsolete comment.
4 years ago
Joshua Haberman
ebe53f8590
Fixed compile error.
4 years ago
Joshua Haberman
b37f82b58b
Fixed compile error.
4 years ago
Joshua Haberman
71749b7caf
Implemented inline array allocation, and moved type->lg2 map to reflection.
4 years ago
Joshua Haberman
9557b97acc
Implemented inline array allocation, and moved type->lg2 map to reflection.
4 years ago
Joshua Haberman
b58d2a0ee6
Shrink overhead of message representation.
4 years ago
Joshua Haberman
0bf063a2ca
Shrink overhead of message representation.
4 years ago
Joshua Haberman
d87ceeacab
Shave off one more store.
4 years ago
Joshua Haberman
ddc52ab9d6
Shave off one more store.
4 years ago
Joshua Haberman
c25d895adf
Shrunk the arena state that needs to be synced.
4 years ago
Joshua Haberman
7f67f68c1c
Shrunk the arena state that needs to be synced.
4 years ago