protobuf

Commit Graph

Author	SHA1	Message	Date
Joshua Haberman	021db6fcd5	Allow larger tags into the table if they are unique mod 31. Also fixed a bug with fixed packed in decode_fast.c.	4 years ago
Joshua Haberman	86d9908c55	Fastdecode support for packed fields. This is not very optimized yet. There is a lot of room to optimize it further.	4 years ago
Joshua Haberman	e3e797b680	Added fasttable support for oneofs.	4 years ago
Joshua Haberman	7ffa9c181a	Fixed some small bugs and performance problems in string copying. Before this CL, with alias=false: ------------------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------------------ BM_Parse_Upb_FileDesc_WithInitialBlock 3715 ns 3715 ns 188916 1.88206GB/s Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock': 1,122.92 msec task-clock # 0.979 CPUs utilized 3 context-switches # 0.003 K/sec 0 cpu-migrations # 0.000 K/sec 196 page-faults # 0.175 K/sec 4,144,746,717 cycles # 3.691 GHz 15,351,966,804 instructions # 3.70 insn per cycle 2,590,281,905 branches # 2306.728 M/sec 2,996,157 branch-misses # 0.12% of all branches 1.146615328 seconds time elapsed 1.115578000 seconds user 0.008025000 seconds sys After this CL: ------------------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------------------ BM_Parse_Upb_FileDesc_WithInitialBlock 3554 ns 3554 ns 197527 1.9674GB/s Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock': 1,105.34 msec task-clock # 0.982 CPUs utilized 3 context-switches # 0.003 K/sec 0 cpu-migrations # 0.000 K/sec 197 page-faults # 0.178 K/sec 4,077,736,892 cycles # 3.689 GHz 15,442,709,352 instructions # 3.79 insn per cycle 2,435,131,301 branches # 2203.068 M/sec 2,643,775 branch-misses # 0.11% of all branches 1.125393845 seconds time elapsed 1.097770000 seconds user 0.008012000 seconds sys	4 years ago
Joshua Haberman	e2c709e047	Repeated string and primitive support. Much of the code was adapted from Gerben's code in: `6333031195`	4 years ago
Joshua Haberman	d81ba58215	Optimized short string copying. This sped up the alias=false case: Before: ------------------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------------------ BM_Parse_Upb_FileDesc_WithInitialBlock 4562 ns 4562 ns 153251 1.53276GB/s Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock': 1,216.65 msec task-clock # 0.936 CPUs utilized 6 context-switches # 0.005 K/sec 0 cpu-migrations # 0.000 K/sec 200 page-faults # 0.164 K/sec 4,490,925,650 cycles # 3.691 GHz 16,516,403,731 instructions # 3.68 insn per cycle 2,828,536,650 branches # 2324.861 M/sec 5,425,830 branch-misses # 0.19% of all branches 1.300178903 seconds time elapsed 1.211475000 seconds user 0.072207000 seconds sys After: ------------------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------------------ BM_Parse_Upb_FileDesc_WithInitialBlock 3587 ns 3587 ns 195749 1.94935GB/s Performance counter stats for 'bazel-bin/benchmarks/benchmark --benchmark_filter=BM_Parse_Upb_FileDesc_WithInitialBlock': 1,109.69 msec task-clock # 0.930 CPUs utilized 5 context-switches # 0.005 K/sec 0 cpu-migrations # 0.000 K/sec 198 page-faults # 0.178 K/sec 4,094,010,257 cycles # 3.689 GHz 15,672,677,812 instructions # 3.83 insn per cycle 2,589,291,160 branches # 2333.346 M/sec 3,306,386 branch-misses # 0.13% of all branches 1.193221789 seconds time elapsed 1.102538000 seconds user 0.072166000 seconds sys	4 years ago
Joshua Haberman	f3a2a79349	More optimization, back up to 2.56GB/s.	4 years ago
Joshua Haberman	199c914295	Simplify push/pop when msg fits in the current buffer.	4 years ago
Joshua Haberman	d5f5db2729	Put string-copying field parser into a separate function. This helps to regain a bit of lost perf. Now at 2.3GB/s.	4 years ago
Joshua Haberman	2a574d3d01	Added a bunch of comments for readability.	4 years ago
Joshua Haberman	5b0c5c7d4a	Dispatch inline.	4 years ago
Joshua Haberman	75edd3e59c	Changed to use table pairs, seems to ever-so-slightly regress.	4 years ago
Joshua Haberman	f01efe8b64	Removed another C99-ism.	4 years ago
Joshua Haberman	1749082bbb	Removed C99-ism.	4 years ago
Gerben Stavenga	4053805759	Bugfixes	4 years ago
Gerben Stavenga	36662b3735	Refactor some code. I extracted some common code from all message field parsers into a tail-recursive function. Replaced the varint jump table with a simple varint parse loop, which removes the stack frames. Also took care not to lose information in the repeated-message tag check: when written carefully, the checks and loads that happen can be reused for tag dispatch if the tag is not the expected one.	4 years ago
Joshua Haberman	9938cf8f27	Put submsg_index directly in table data. Drop oneof support for now to focus.	4 years ago
Joshua Haberman	89bd8b87e1	Fixed a few more C89 compat issues.	4 years ago
Joshua Haberman	64d293894a	Fixed bug introduced by last optimization.	4 years ago
Joshua Haberman	ff957b996c	Fixed C89 compat issues.	4 years ago
Joshua Haberman	537b6f42c2	A few updates to the benchmark and minor implementation changes.	4 years ago
Joshua Haberman	0dcc5641eb	Replicated dispatch and implemented array resizing logic. Up to 2.67GB/s.	4 years ago
Joshua Haberman	526e430794	I think this may have reached the optimization limit. ------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------- BM_ArenaOneAlloc 21 ns 21 ns 32994231 BM_ArenaInitialBlockOneAlloc 6 ns 6 ns 116318005 BM_ParseDescriptorNoHeap 3028 ns 3028 ns 231138 2.34354GB/s BM_ParseDescriptor 3557 ns 3557 ns 196583 1.99498GB/s BM_ParseDescriptorProto2NoArena 33228 ns 33226 ns 21196 218.688MB/s BM_ParseDescriptorProto2WithArena 22863 ns 22861 ns 30666 317.831MB/s BM_SerializeDescriptorProto2 5444 ns 5444 ns 127368 1.30348GB/s BM_SerializeDescriptor 12509 ns 12508 ns 55816 580.914MB/s $ perf stat bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap 2020-10-08 14:07:06 Running bazel-bin/benchmark Run on (72 X 3700 MHz CPU s) CPU Caches: L1 Data 32K (x36) L1 Instruction 32K (x36) L2 Unified 1024K (x36) L3 Unified 25344K (x2) ---------------------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------------------- BM_ParseDescriptorNoHeap 3071 ns 3071 ns 227743 2.31094GB/s Performance counter stats for 'bazel-bin/benchmark --benchmark_filter=BM_ParseDescriptorNoHeap': 1,050.22 msec task-clock # 0.978 CPUs utilized 4 context-switches # 0.004 K/sec 0 cpu-migrations # 0.000 K/sec 179 page-faults # 0.170 K/sec 3,875,796,334 cycles # 3.690 GHz 13,282,835,967 instructions # 3.43 insn per cycle 2,887,725,848 branches # 2749.627 M/sec 8,324,912 branch-misses # 0.29% of all branches 1.073924364 seconds time elapsed 1.042806000 seconds user 0.008021000 seconds sys Profile: 23.96% benchmark benchmark [.] upb_prm_1bt_max192b 22.44% benchmark benchmark [.] fastdecode_dispatch 18.96% benchmark benchmark [.] upb_pss_1bt 14.20% benchmark benchmark [.] upb_psv4_1bt 8.33% benchmark benchmark [.] upb_prm_1bt_max64b 6.66% benchmark benchmark [.] upb_prm_1bt_max128b 1.29% benchmark benchmark [.] upb_psm_1bt_max64b 0.77% benchmark benchmark [.] fastdecode_generic 0.55% benchmark [kernel.kallsyms] [k] smp_call_function_single 0.42% benchmark [kernel.kallsyms] [k] _raw_spin_lock_irqsave 0.42% benchmark benchmark [.] upb_psm_1bt_max256b 0.31% benchmark benchmark [.] upb_psb1_1bt 0.21% benchmark benchmark [.] upb_plv4_5bv 0.14% benchmark benchmark [.] upb_psb1_2bt 0.12% benchmark benchmark [.] decode_longvarint64 0.08% benchmark [kernel.kallsyms] [k] vsnprintf 0.07% benchmark [kernel.kallsyms] [k] _raw_spin_lock 0.07% benchmark benchmark [.] _upb_msg_new 0.06% benchmark ld-2.31.so [.] check_match	4 years ago
Joshua Haberman	4c65b25daf	Handle long varints, now 2GB/s!	4 years ago
Joshua Haberman	e39ec95ca2	Hoisted updates to limits and depth out of the loop.	4 years ago
Joshua Haberman	388b6f64eb	A small optimization: don't increment array length every iteration.	4 years ago
Joshua Haberman	9e5c5ce089	Optimized memset() with cutoff and fixed group & unknown message bugs.	4 years ago
Joshua Haberman	8dd7b5a2ca	A bunch more optimization.	4 years ago
Joshua Haberman	405e7934b1	Handle 2-byte submessage lengths.	4 years ago
Joshua Haberman	88b1ec7784	Table-driven supports repeated sub-messages.	4 years ago
Joshua Haberman	f173642db4	Handle non-repeated submessages.	4 years ago
Joshua Haberman	fac992db83	Cleanup for showing.	4 years ago
Joshua Haberman	438ecaeb5a	Give all field parsers a generic table entry.	4 years ago
Joshua Haberman	383ae5293e	WIP.	4 years ago
Joshua Haberman	26abaa2345	WIP.	4 years ago
Joshua Haberman	34b98bc030	Avoid passing too many params to fallback.	4 years ago
Joshua Haberman	763a3f6293	WIP.	4 years ago

41 Commits (3eba47914bdb980a6d244fbfb85c9569a8bbc493)