Joshua Haberman
6fe84526be
Mark _is_bazel to be replaced in google3.
4 years ago
Joshua Haberman
f01efe8b64
Removed another C99-ism.
4 years ago
Joshua Haberman
1749082bbb
Removed C99-ism.
4 years ago
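These messages don't say which constructs were removed. A typical C99-ism that breaks a strict C89 build (for example `gcc -std=c89`) is a declaration inside a `for` header. The sketch below is a hypothetical before/after; `sum_bytes` is an illustrative function, not upb code.

```c
#include <assert.h>
#include <stddef.h>

/* C99 style, rejected when compiling as C89:
 *
 *   size_t sum_bytes(const unsigned char *p, size_t n) {
 *     size_t total = 0;
 *     for (size_t i = 0; i < n; i++) total += p[i];
 *     return total;
 *   }
 */

/* C89-compatible form: every declaration sits at the top of the block,
 * before any statement, and only block comments are used. */
static size_t sum_bytes(const unsigned char *p, size_t n) {
  size_t total = 0;
  size_t i;
  for (i = 0; i < n; i++) {
    total += p[i];
  }
  return total;
}
```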
Joshua Haberman
147e363f56
Merge branch 'master' into fast-table
4 years ago
Joshua Haberman
f2ddc15d76
Bugfix: initialize fastlimit and fastend.
4 years ago
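The commit only names the two fields. The sketch below is a hypothetical illustration of why bounds like these must be set before the first read. If a fast path compares the read pointer against them, an uninitialized field makes that comparison use an indeterminate value. `decstate`, `FAST_SLOP`, and the meaning assigned to each field are assumptions, not upb's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: bytes the fast path may read past a bounds check. */
#define FAST_SLOP 16

/* Hypothetical decoder state. The field names follow the commit message,
 * but the meanings given here are assumptions. */
typedef struct {
  const char *ptr;       /* current read position */
  const char *end;       /* true end of the input */
  const char *fastend;   /* fast path must stop before this point */
  const char *fastlimit; /* end of the current delimited region */
} decstate;

/* Set every bound up front so no comparison reads garbage. */
static void decstate_init(decstate *d, const char *buf, size_t size) {
  d->ptr = buf;
  d->end = buf + size;
  d->fastend = size > FAST_SLOP ? d->end - FAST_SLOP : buf;
  d->fastlimit = d->end;
}

/* The fast path may run only while ptr is before both bounds. */
static int decstate_can_fast(const decstate *d) {
  return d->ptr < d->fastend && d->ptr < d->fastlimit;
}
```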
Joshua Haberman
65e49b694b
Merge branch 'gerben-fast-table' into fast-table
4 years ago
Joshua Haberman
1abf7d418d
Added generated files.
4 years ago
Gerben Stavenga
3f719fa6b2
Bugfix: offsetting hasbits by 16 introduced a bug in calculating
hasmasks. Removed the extra <<16 shift in the hasmask calculation and masked
out the first 16 bits. This makes messages without hasbits work as well.
4 years ago
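The sketch below shows one reading of this fix, under stated assumptions. Hasbits are assumed to accumulate in a 64-bit word whose low 16 bits are reserved, so each field's stored hasbit index already includes the offset of 16. Shifting the mask by another 16 applies the offset twice. The fix uses the stored index directly and clears the reserved low bits before committing. `HASBIT_OFFSET`, `hasmask_buggy`, `hasmask`, and `hasbits_commit` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: bits 0..15 of the accumulator are reserved, and a field's
 * stored hasbit index already includes this offset. */
#define HASBIT_OFFSET 16

/* Buggy variant: the offset is applied a second time. */
static uint64_t hasmask_buggy(unsigned stored_idx) {
  return (UINT64_C(1) << stored_idx) << HASBIT_OFFSET;
}

/* Fixed variant: the stored index is used as-is. */
static uint64_t hasmask(unsigned stored_idx) {
  return UINT64_C(1) << stored_idx;
}

/* Clear the reserved low 16 bits before writing hasbits to the message.
 * Assumption: fields without a hasbit use an index below 16. Their dummy
 * bits are dropped here, so messages without hasbits are unaffected. */
static uint64_t hasbits_commit(uint64_t acc) {
  return acc & ~UINT64_C(0xFFFF);
}
```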
Joshua Haberman
aec762e405
Merge branch 'master' into fast-table
Tests are failing.
4 years ago
Joshua Haberman
4f77aaafd8
Merge pull request #2 from gerben-s/fast-table
Refactor some code. I extracted some common code from all message field
4 years ago
Gerben Stavenga
4053805759
Bugfixes
4 years ago
Joshua Haberman
d1cd80385b
Merge pull request #313 from haberman/inline-arena
Inline arena for the duration of the decode.
4 years ago
Joshua Haberman
ad21083623
Merge pull request #313 from haberman/inline-arena
Inline arena for the duration of the decode.
4 years ago
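The commit titles describe keeping the arena inline in the decoder for the duration of a decode. One way to read this is sketched below, under assumptions. At the start, the decoder copies the arena's bump-pointer state into its own struct. It then allocates by bumping that private copy. At the end, it writes the state back once, so only a small amount of state needs syncing. `arena`, `decoder`, `dec_begin`, `dec_alloc`, and `dec_end` are hypothetical names. A real arena would also allocate a new block when space runs out, rather than returning NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical arena: one fixed block with a bump pointer. */
typedef struct {
  char *ptr; /* next free byte */
  char *end; /* one past the last usable byte */
} arena;

/* Hypothetical decoder holding an inline copy of the arena state. */
typedef struct {
  char *arena_ptr;
  char *arena_end;
  arena *owner; /* where the state is written back */
} decoder;

static void dec_begin(decoder *d, arena *a) {
  d->arena_ptr = a->ptr;
  d->arena_end = a->end;
  d->owner = a;
}

/* Bump-allocate from the decoder's copy; NULL when the block is full.
 * Sizes are rounded up to a multiple of 8. */
static void *dec_alloc(decoder *d, size_t size) {
  char *p;
  size = (size + 7) & ~(size_t)7;
  if ((size_t)(d->arena_end - d->arena_ptr) < size) return NULL;
  p = d->arena_ptr;
  d->arena_ptr += size;
  return p;
}

/* Sync the inline state back to the arena once decoding finishes. */
static void dec_end(decoder *d) {
  d->owner->ptr = d->arena_ptr;
}
```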
Joshua Haberman
2339fc779c
Updated obsolete comment.
4 years ago
Joshua Haberman
b393849bbd
Updated obsolete comment.
4 years ago
Joshua Haberman
ebe53f8590
Fixed compile error.
4 years ago
Joshua Haberman
b37f82b58b
Fixed compile error.
4 years ago
Joshua Haberman
71749b7caf
Implemented inline array allocation, and moved type->lg2 map to reflection.
4 years ago
Joshua Haberman
9557b97acc
Implemented inline array allocation, and moved type->lg2 map to reflection.
4 years ago
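"lg2" here presumably means the base-2 logarithm of an element's size. With a table mapping each field type to its lg2, the element size is `1 << lg2`, and byte counts become shifts. "Inline array allocation" likely means placing the array header and its elements in a single allocation. The sketch below is a hypothetical version of both ideas. The `ftype` enum, the `elem_lg2` table, and the `array` layout are illustrative, not upb's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical field types. */
typedef enum { FT_BOOL, FT_INT32, FT_FLOAT, FT_INT64, FT_DOUBLE, FT_COUNT } ftype;

/* lg2 of each element size in bytes: 1, 4, 4, 8, 8. */
static const unsigned char elem_lg2[FT_COUNT] = {0, 2, 2, 3, 3};

/* Hypothetical array header; the elements follow it in the same block. */
typedef struct {
  void *data;
  size_t len;
  size_t cap;
} array;

/* One malloc holds the header plus room for `cap` elements. */
static array *array_new_inline(ftype t, size_t cap) {
  /* Round the header up to 8 bytes so 8-byte elements stay aligned. */
  size_t hdr = (sizeof(array) + 7) & ~(size_t)7;
  size_t bytes = cap << elem_lg2[t];
  array *arr = (array *)malloc(hdr + bytes);
  if (arr == NULL) return NULL;
  arr->data = (char *)arr + hdr;
  arr->len = 0;
  arr->cap = cap;
  return arr;
}
```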
Joshua Haberman
b58d2a0ee6
Shrink overhead of message representation.
4 years ago
Joshua Haberman
0bf063a2ca
Shrink overhead of message representation.
4 years ago
Joshua Haberman
d87ceeacab
Shave off one more store.
4 years ago
Joshua Haberman
ddc52ab9d6
Shave off one more store.
4 years ago
Joshua Haberman
c25d895adf
Shrunk the arena state that needs to be synced.
4 years ago
Joshua Haberman
7f67f68c1c
Shrunk the arena state that needs to be synced.
4 years ago
Joshua Haberman
ff40dd6ea9
Added new internal header.
4 years ago
Joshua Haberman
85a43e5461
Added new internal header.
4 years ago
Gerben Stavenga
36662b3735
Refactor some code. I extracted some common code from all message field
parsers into a tail-recursive function. Removed the varint jmp table in favor of
a simple varint parse loop, which removes the stack frames. Also took care
not to lose information in the repeated-message tag check: written
carefully, the checks and loads done there can be reused for tag dispatch
when the tag is not the expected one.
4 years ago
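The refactor replaces a jump table with "a simple varint parse loop". The varint wire format itself is standard protobuf:

- each byte carries 7 payload bits, least-significant group first;
- the high bit is set on every byte except the last;
- a 64-bit value takes at most 10 bytes.

The sketch below is a minimal loop in that style. `decode_varint` is a hypothetical name, and this is not the fast-table code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one varint from [p, end). On success, stores the value in *out
 * and returns a pointer just past it. Returns NULL if the input is
 * truncated or longer than 10 bytes. */
static const char *decode_varint(const char *p, const char *end, uint64_t *out) {
  uint64_t val = 0;
  int i;
  for (i = 0; i < 10 && p < end; i++) {
    uint8_t byte = (uint8_t)*p++;
    val |= (uint64_t)(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) {
      *out = val;
      return p;
    }
  }
  return NULL;
}
```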
Joshua Haberman
cbcd635917
Fixed memory leak.
4 years ago
Joshua Haberman
bcbcdadbd2
Fixed memory leak.
4 years ago
Joshua Haberman
e5264bd794
Merge pull request #312 from haberman/defiter
Added simple offset-based accessors for defs, and deprecated old iterators
4 years ago
Joshua Haberman
52957fa984
Merge pull request #312 from haberman/defiter
Added simple offset-based accessors for defs, and deprecated old iterators
4 years ago
Joshua Haberman
746f64692c
Moved arena inline for decoder.
4 years ago
Joshua Haberman
7363b91ac3
Moved arena inline for decoder.
4 years ago
Joshua Haberman
b8ef1dcc57
Removed C++-style comments.
4 years ago
Joshua Haberman
575acd85bd
Re-added const for all of the pointer wrapper types.
4 years ago
Joshua Haberman
5aa5b77b41
Added simple offset-based accessors for defs, and deprecated old iterators.
4 years ago
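Offset-based accessors replace an iterator object with a count and an index. The caller loops from 0 to the count and fetches the i-th def directly. The sketch below is hypothetical: `msgdef`, `fielddef`, `msgdef_fieldcount`, and `msgdef_field` are illustrative names, not necessarily the accessors added in this commit.

```c
#include <assert.h>

/* Hypothetical def structures with fields stored contiguously. */
typedef struct {
  const char *name;
  int number;
} fielddef;

typedef struct {
  const fielddef *fields;
  int field_count;
} msgdef;

static int msgdef_fieldcount(const msgdef *m) { return m->field_count; }

/* Returns the i-th field; i must be in [0, msgdef_fieldcount(m)). */
static const fielddef *msgdef_field(const msgdef *m, int i) {
  assert(i >= 0 && i < m->field_count);
  return &m->fields[i];
}
```

Usage is a plain indexed loop, e.g. `for (i = 0; i < msgdef_fieldcount(m); i++) use(msgdef_field(m, i));`, with no iterator state to initialize or advance.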
Joshua Haberman
33384301e2
Merge pull request #311 from haberman/proto2-benchmark2
Benchmarks vs. proto2
4 years ago
Joshua Haberman
578e7c1f4c
Merge pull request #311 from haberman/proto2-benchmark2
Benchmarks vs. proto2
4 years ago
Joshua Haberman
bc301e7da4
Use merge/partial variants to give proto2 benchmark the fairest hearing.
4 years ago
Joshua Haberman
30f01afa83
Added LargeInitialBlock test for proto2.
4 years ago
Joshua Haberman
5d23fd99af
Used shorter protobuf:: namespace alias.
4 years ago
Joshua Haberman
9938cf8f27
Put submsg_index directly in table data. Drop oneof support for now to focus.
4 years ago
Joshua Haberman
d87179501d
Another build fix.
4 years ago
Joshua Haberman
89bd8b87e1
Fixed a few more C89 compat issues.
4 years ago
Joshua Haberman
64d293894a
Fixed bug introduced by last optimization.
4 years ago
Joshua Haberman
ff957b996c
Fixed C89 compat issues.
4 years ago
Joshua Haberman
537b6f42c2
A few updates to the benchmark and minor implementation changes.
4 years ago
Joshua Haberman
0dcc5641eb
Replicated dispatch and implemented array resizing logic. Up to 2.67GB/s.
4 years ago
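The commit doesn't describe the resizing policy. A common choice for repeated-field arrays is geometric growth: doubling the capacity keeps appends amortized O(1). The sketch below uses `realloc` and is hypothetical. `arr_t` and `arr_reserve` are illustrative names, and the starting capacity of 4 is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
  char *data;
  size_t len;      /* elements in use */
  size_t cap;      /* elements allocated */
  size_t elemsize; /* bytes per element */
} arr_t;

/* Ensures room for at least `need` elements, doubling the capacity as
 * needed. Returns 0 on allocation failure and leaves the array unchanged.
 * Overflow checks are omitted for brevity. */
static int arr_reserve(arr_t *a, size_t need) {
  size_t newcap;
  char *p;
  if (need <= a->cap) return 1;
  newcap = a->cap ? a->cap : 4;
  while (newcap < need) newcap *= 2;
  p = (char *)realloc(a->data, newcap * a->elemsize);
  if (p == NULL) return 0;
  a->data = p;
  a->cap = newcap;
  return 1;
}
```

With doubling, the number of reallocations grows only logarithmically with the final array length.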
Joshua Haberman
526e430794
I think this may have reached the optimization limit.
-------------------------------------------------------------------------
Benchmark                             Time         CPU  Iterations
-------------------------------------------------------------------------
BM_ArenaOneAlloc                     21 ns       21 ns    32994231
BM_ArenaInitialBlockOneAlloc          6 ns        6 ns   116318005
BM_ParseDescriptorNoHeap           3028 ns     3028 ns      231138 2.34354GB/s
BM_ParseDescriptor                 3557 ns     3557 ns      196583 1.99498GB/s
BM_ParseDescriptorProto2NoArena   33228 ns    33226 ns       21196 218.688MB/s
BM_ParseDescriptorProto2WithArena 22863 ns    22861 ns       30666 317.831MB/s
BM_SerializeDescriptorProto2       5444 ns     5444 ns      127368 1.30348GB/s
BM_SerializeDescriptor            12509 ns    12508 ns       55816 580.914MB/s
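The table's headline ratios can be recomputed from its Time column. By that column, upb's heap-free parse is about 7.55x faster than proto2 parsing with an arena and about 11x faster than proto2 without one. Going the other way, proto2 serialization is about 2.3x faster than upb's. The snippet below only repeats that division; the variable names are illustrative.

```c
#include <assert.h>

/* Wall-clock time per iteration, in ns, copied from the table above. */
static const double upb_parse_noheap = 3028.0;
static const double proto2_parse_arena = 22863.0;
static const double proto2_parse_noarena = 33228.0;
static const double upb_serialize = 12509.0;
static const double proto2_serialize = 5444.0;

/* How many times faster the second run is than the first. */
static double speedup(double slow_ns, double fast_ns) {
  return slow_ns / fast_ns;
}
```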
$ perf stat bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap
2020-10-08 14:07:06
Running bazel-bin/benchmark
Run on (72 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x36)
  L1 Instruction 32K (x36)
  L2 Unified 1024K (x36)
  L3 Unified 25344K (x2)
----------------------------------------------------------------
Benchmark                     Time         CPU  Iterations
----------------------------------------------------------------
BM_ParseDescriptorNoHeap   3071 ns     3071 ns      227743 2.31094GB/s

Performance counter stats for 'bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap':
      1,050.22 msec task-clock   # 0.978 CPUs utilized
             4 context-switches  # 0.004 K/sec
             0 cpu-migrations    # 0.000 K/sec
           179 page-faults       # 0.170 K/sec
 3,875,796,334 cycles            # 3.690 GHz
13,282,835,967 instructions      # 3.43 insn per cycle
 2,887,725,848 branches          # 2749.627 M/sec
     8,324,912 branch-misses     # 0.29% of all branches

   1.073924364 seconds time elapsed
   1.042806000 seconds user
   0.008021000 seconds sys
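The derived figures that perf prints after `#` follow from the raw counters:

- instructions per cycle is instructions / cycles;
- the effective clock rate is cycles / task-clock;
- the branch miss rate is branch-misses / branches.

The snippet below recomputes the 3.43 IPC, 3.690 GHz, and 0.29% values from the counts above.

```c
#include <assert.h>

/* Raw counters from the perf stat output above. */
static const double task_clock_ms = 1050.22;
static const double cycles = 3875796334.0;
static const double instructions = 13282835967.0;
static const double branches = 2887725848.0;
static const double branch_misses = 8324912.0;

/* Instructions retired per CPU cycle. */
static double ipc(void) { return instructions / cycles; }

/* Cycles per nanosecond of task-clock, which is the clock rate in GHz. */
static double ghz(void) { return cycles / (task_clock_ms * 1e6); }

/* Mispredicted branches as a percentage of all branches. */
static double miss_pct(void) { return 100.0 * branch_misses / branches; }
```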

Profile:
23.96%  benchmark  benchmark          [.] upb_prm_1bt_max192b
22.44%  benchmark  benchmark          [.] fastdecode_dispatch
18.96%  benchmark  benchmark          [.] upb_pss_1bt
14.20%  benchmark  benchmark          [.] upb_psv4_1bt
 8.33%  benchmark  benchmark          [.] upb_prm_1bt_max64b
 6.66%  benchmark  benchmark          [.] upb_prm_1bt_max128b
 1.29%  benchmark  benchmark          [.] upb_psm_1bt_max64b
 0.77%  benchmark  benchmark          [.] fastdecode_generic
 0.55%  benchmark  [kernel.kallsyms]  [k] smp_call_function_single
 0.42%  benchmark  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
 0.42%  benchmark  benchmark          [.] upb_psm_1bt_max256b
 0.31%  benchmark  benchmark          [.] upb_psb1_1bt
 0.21%  benchmark  benchmark          [.] upb_plv4_5bv
 0.14%  benchmark  benchmark          [.] upb_psb1_2bt
 0.12%  benchmark  benchmark          [.] decode_longvarint64
 0.08%  benchmark  [kernel.kallsyms]  [k] vsnprintf
 0.07%  benchmark  [kernel.kallsyms]  [k] _raw_spin_lock
 0.07%  benchmark  benchmark          [.] _upb_msg_new
 0.06%  benchmark  ld-2.31.so         [.] check_match
4 years ago