boringssl

Commit Graph

Author	SHA1	Message	Date
Adam Langley	15596efa5f	Include hopefully all ARM instructions with condition codes. We need to know which ARM instructions take a condition code because otherwise the conditions look like symbols. This change includes all instructions beginning with 'c' from [1] that include a `cond` argument. Also sort them for easier comparison. [1] https://developer.arm.com/documentation/dui0802/a/A64-General-Instructions/CBNZ Change-Id: Iea07aa4afe171d684135ff6655c52374d86529ce Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53745 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	3 years ago
Adam Langley	9a836f7840	Update delocate tests I broke the delocate tests with `27ffcc6e19` because that change switched the integrity check hash function in the tested configuration to SHA-256, but didn't update the expectation files. Change-Id: I05f61eda795c833847981c5b21287fd0d2b33064 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52405 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	3 years ago
Nevine Ebeid	fa3fbda07b	P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at `19e277dd19/crypto/ec/asm/ecp_nistz256-armv8.pl` (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. 
Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	3 years ago
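The speedups implied by the Before/After listings in this commit can be worked out with a few lines of Go. This is only an arithmetic sketch: the `speedup` helper and the row labels are illustrative names, and the ops/sec figures are copied from the message above.

```go
package main

import "fmt"

// speedup returns the ratio after/before for two ops/sec figures.
func speedup(before, after float64) float64 {
	return after / before
}

func main() {
	// ops/sec figures copied from the Before/After listings in the commit message.
	rows := []struct {
		name          string
		before, after float64
	}{
		{"ECDH P-256", 2373.0, 6136.4},
		{"ECDSA P-256 sign", 6697.1, 19744.4},
		{"ECDSA P-256 verify", 2737.7, 6651.9},
	}
	for _, r := range rows {
		fmt.Printf("%-20s %.2fx\n", r.name, speedup(r.before, r.after))
	}
}
```

All three operations come out somewhere between roughly 2.4x and 3x faster on the platform the author measured.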
Adam Langley	27ffcc6e19	Use SHA-256 for the FIPS integrity check everywhere. There are paperwork reasons why it's useful to use the same hash function in all cases. Thus unify on SHA-256, because the contexts where SHA-512 is faster are faster overall and thus less sensitive. Change-Id: I7a782a3adba4ace3257313a24dc8bc213b9d64ec Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52165 Reviewed-by: David Benjamin <davidben@google.com>	3 years ago
Adam Langley	b9c6d67c2c	delocate: handle a new output form in Clang 13. Clang 13 will put a “-1” inside a DWARF expression that's the difference between two labels. We just need to pass it onto the output. Change-Id: Ib58d245157a44ae9f1839c2af123bfe01791abf1 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51445 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com>	3 years ago
Adam Langley	b90cdddcdc	swtb is another AArch64 magic tweak. Change-Id: I25dd24d82be3dad4314a350cd32edc06fe9b59c9 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48245 Reviewed-by: David Benjamin <davidben@google.com>	4 years ago
Adam Langley	2e54edf323	A couple of Aarch64 FIPS delocate fixes. Clang 12 in opt mode produces a couple of assembly patterns that were not handled by delocate. Firstly, two-digit vector indexes were just a simple omission. Fixed. Secondly, Clang puts symbol deltas in .byte directives, and bit-shifts them. The .byte directive was not considered to be a symbol-containing directive because it's too small, but it could store deltas. Additionally, bit-shifting of symbol expressions was not supported. Fixed. Change-Id: I796299821f5ac7d3639fa6243c5d9bd5342bbddf Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/47064 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com>	4 years ago
David Benjamin	1c919724d3	Support MOVLPS and MOVHPS in delocate. GCC 10.2.1 seems to be emitting code like this: movq gcm_gmult_clmul@GOTPCREL(%rip), %xmm0 movhps gcm_ghash_clmul@GOTPCREL(%rip), %xmm0 movaps %xmm0, (%rsp) This is assembling a pair of function pointers in %xmm0 and writing the two out together. I've not observed the compiler output movlps, but supporting movhps and movlps are about as tricky. The main complication is that these instructions preserve the unwritten half of the destination, and they do not support register sources, only memory. This CL supports them by loading in a general-purpose register as we usually do, pushing the register on the stack, and then running the instruction on (%rsp). Some alternatives I considered: - Save/restore a temporary XMM register and then use MOVHLPS and MOVLHPS. This would work but require another saveRegister-like wrapper. - Take advantage of loadFromGOT ending in a memory mov and swap out the final instruction. This would be more efficient, but we downgrade GOT-based accesses to local symbols to a plain LEA. The compiler will only do this when we write a pair of function pointers in a row, so trying to optimize the non-local symbols seems not worth the trouble. (Really the compiler should not be emitting GOT-relative loads at all, but the compiler doesn't know these symbols will be private and in the same module, so it has a habit of pessimally using GOT-based loads.) This option seemed the simplest. Change-Id: I8c4915a6a0d72aa4c5f4d581081b99b3a6ab64c2 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/45244 Reviewed-by: Adam Langley <agl@google.com>	4 years ago
Adam Langley	c5e2cf3c07	delocate: support Aarch64 Add Aarch64 support to delocate. Since it's a modern ISA, it's actually not too bad once I understood the behaviour of the assembler. Change-Id: I105fede43b5196b7ff7bdbf1ee71c6cfa2fc1aab Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/44848 Reviewed-by: David Benjamin <davidben@google.com>	4 years ago
Adam Langley	5d54832f1a	delocate: handle Aarch64 assembly in parser. Aarch64 assembly is quite different from x86-64 or POWER. But the system of directives is the same so there's quite a lot of utility from being able to use the same delocate framework. Unfortunately, with peg, there's no obvious way to be able to parse instructions differently without breaking the parsing into two stages. Thus the parser is extended here to support all three ISAs. This seems to work ok without breaking either of the other two. Change-Id: Iced0f651e556e6ffae3eb35f2edfc0bf84167967 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/44846 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	4 years ago
Adam Langley	e4843750e5	delocate: support alternative comment indicators aarch64 assembly files use "//" as the comment indicator because '#' indicates a constant value. Change-Id: I53b18cbb3498522b0924716238abf55e6627d216 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/44844 Commit-Queue: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	4 years ago
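Why the comment indicator has to be configurable can be seen in a small Go sketch. It is not how delocate does it (delocate parses assembly with a peg grammar); `stripComment` is a hypothetical helper that naively cuts a line at the first occurrence of the indicator and ignores string literals.

```go
package main

import (
	"fmt"
	"strings"
)

// stripComment removes everything from the first occurrence of indicator
// to the end of the line. Hypothetical helper for illustration only: it
// does not understand quoted strings or any other assembler syntax.
func stripComment(line, indicator string) string {
	if i := strings.Index(line, indicator); i >= 0 {
		line = line[:i]
	}
	return strings.TrimSpace(line)
}

func main() {
	aarch64 := "add x0, x0, #16 // bump"

	// With "//" the instruction survives intact.
	fmt.Printf("%q\n", stripComment(aarch64, "//"))

	// With "#" the immediate operand is mistaken for a comment and lost.
	fmt.Printf("%q\n", stripComment(aarch64, "#"))
}
```

Treating `#` as the comment indicator on AArch64 input would cut off the `#16` immediate, which is the collision this commit avoids.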
Adam Langley	56308910f3	delocate: use 64-bit GOT offsets in the large memory model. I tried to save space and use 32-bit GOT offsets since a GOT > 2GiB is crazy. However, Clang's linker emits 64-bit relocations even for .long, thus the four bytes following each offset get stomped. It mostly works because the relocations are applied in order, thus the following relocation gets stomped but is then processed and fixed. But there's four bytes of stomp at the end which hits the module integrity hash, which is fatal. This could be fixed by adding four bytes of padding after the list of offsets, but that's piling a hack on a hack. So this change just switches to 64-bit offsets. Change-Id: I227eec67c481d93a414fbed19aa99471f9df0f0e Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42484 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	5 years ago
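The "stomping" described in this commit can be reproduced with a byte slice. The Go sketch below is a simulation under stated assumptions, not linker code: `applyRelocs` and `readSlot` are hypothetical helpers, the offsets are made-up negative values, and the final four bytes of the buffer stand in for whatever follows the table.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// applyRelocs writes each value as a 64-bit little-endian quantity at
// consecutive 4-byte slots, mimicking 64-bit relocations being applied,
// in order, to entries that were only sized as .long.
// buf must have room for the final 8-byte write.
func applyRelocs(buf []byte, values []int64) {
	for i, v := range values {
		binary.LittleEndian.PutUint64(buf[4*i:], uint64(v))
	}
}

// readSlot reads the i-th 32-bit slot back as a signed offset.
func readSlot(buf []byte, i int) int32 {
	return int32(binary.LittleEndian.Uint32(buf[4*i:]))
}

func main() {
	values := []int64{-8, -16, -24} // illustrative GOT offsets
	// Three 4-byte slots plus four bytes standing in for what follows.
	buf := make([]byte, 4*len(values)+4)

	applyRelocs(buf, values)

	for i := range values {
		// Each slot ends up correct because the next write repairs it.
		fmt.Println("slot", i, "=", readSlot(buf, i))
	}
	// Nothing repairs the bytes after the last slot.
	fmt.Printf("trailing bytes: % x\n", buf[4*len(values):])
}
```

Every slot reads back correctly because the next relocation overwrites the damage, but the four bytes past the last slot keep the upper half of the final write. In the real module those bytes are covered by the integrity hash.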
Adam Langley	0cd846f24f	delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq (%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq (%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq (%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. 
But the offset “into” the table is -0x28, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to set up that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. 
As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	5 years ago
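The `.boringssl_got_delta` identity from this commit message can be checked numerically. The Go sketch below models the two rewritten instructions as plain integer arithmetic. The addresses 0x1120 (for `.L0$pb`) and 0x4000 (for `_GLOBAL_OFFSET_TABLE_`) come from the disassembly in the message, while the address of `.boringssl_got_delta` (0x3000) and both function names are assumptions made for illustration.

```go
package main

import "fmt"

// originalMovabs is the value the compiler's
//   movabsq $_GLOBAL_OFFSET_TABLE_-.Lfoo, %reg
// would load.
func originalMovabs(got, lfoo uint64) uint64 {
	return got - lfoo
}

// rewrittenMovabs models the replacement pair:
//   movq .boringssl_got_delta(%rip), %destreg   // loads got - gotDelta
//   addq $.boringssl_got_delta-.Lfoo, %destreg  // adds gotDelta - lfoo
// gotDelta is the (hypothetical) address of the .boringssl_got_delta symbol.
func rewrittenMovabs(got, gotDelta, lfoo uint64) uint64 {
	stored := got - gotDelta // contents of the .quad, fixed up outside the hashed region
	// The immediate is an intra-module distance, so 32 bits suffice.
	imm := int32(int64(gotDelta) - int64(lfoo))
	return stored + uint64(int64(imm))
}

func main() {
	const (
		got      = 0x4000 // _GLOBAL_OFFSET_TABLE_, from the linked disassembly
		lfoo     = 0x1120 // .L0$pb, from the linked disassembly
		gotDelta = 0x3000 // assumed address of .boringssl_got_delta
	)
	orig := originalMovabs(got, lfoo)
	rewritten := rewrittenMovabs(got, gotDelta, lfoo)
	fmt.Printf("original  = %#x\n", orig)
	fmt.Printf("rewritten = %#x\n", rewritten)
	// Adding .Lfoo back, as the original code does, recovers the GOT address.
	fmt.Printf("GOT       = %#x\n", rewritten+lfoo)
}
```

With these inputs both paths produce 0x2ee0, the same immediate that appears in the linked `movabs`, and adding `.Lfoo` back gives 0x4000. The choice of `gotDelta` cancels out, which is why the symbol can live anywhere in the segment outside the hashed region.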
Adam Langley	fb0c05cac2	acvp: add CMAC-AES support. Change by Dan Janni. Change-Id: I3f059e7b1a822c6f97128ca92a693499a3f7fa8f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/41984 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	5 years ago

14 Commits (a61e7475d3145836cc11469f328ad7c98e7cf527)