# Copyright (c) 2017, Google Inc.
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
# SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
# OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
# CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
# This is a rough parser for x86-64, ppc64le, and AArch64 assembly designed to
# work with https://github.com/pointlander/peg. delocate.go has a go:generate
# line for rebuilding delocate.peg.go from this file.
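# For reference, regenerating the parser looks roughly like the following
# (a sketch, assuming the peg tool from the repository above is installed;
# the go:generate line in delocate.go is the authoritative invocation):
#
#   go install github.com/pointlander/peg@latest
#   peg delocate.peg        # writes delocate.peg.go next to this file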
package main
type Asm Peg {}
AsmFile <- Statement* !.
Statement <- WS? (Label / ((GlobalDirective /
                            LocationDirective /
                            LabelContainingDirective /
                            Instruction /
                            Directive /
                            Comment / ) WS? ((Comment? '\n') / ';')))
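# Illustrative examples only (assumed inputs, not taken from any particular
# file): each of the following lines parses as a single Statement.
#
#   .Lfoo:                  Label
#   .globl  foo             GlobalDirective
#   movq    %rax, %rbx      Instruction
#   # some comment          Comment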
GlobalDirective <- (".global" / ".globl") WS SymbolName
Directive <- '.' DirectiveName (WS Args)?
DirectiveName <- [[A-Z0-9_]]+
LocationDirective <- FileDirective / LocDirective
FileDirective <- ".file" WS [^#\n]+
LocDirective <- ".loc" WS [^#/\n]+
Args <- Arg ((WS? ',' WS?) Arg)*
Arg <- QuotedArg / [[0-9a-z%+\-*_@.]]*
QuotedArg <- '"' QuotedText '"'
QuotedText <- (EscapedChar / [^"])*
LabelContainingDirective <- LabelContainingDirectiveName WS SymbolArgs
LabelContainingDirectiveName <- ".xword" / ".word" / ".long" / ".set" / ".byte" / ".8byte" / ".4byte" / ".quad" / ".tc" / ".localentry" / ".size" / ".type" / ".uleb128" / ".sleb128"
SymbolArgs <- SymbolArg ((WS? ',' WS?) SymbolArg)*
SymbolShift <- ('<<' / '>>') WS? [0-9]+
SymbolArg <- (OpenParen WS?)? (
               Offset /
               SymbolType /
               (Offset / LocalSymbol / SymbolName / Dot) (WS? Operator WS? (Offset / LocalSymbol / SymbolName))* /
               LocalSymbol TCMarker? /
               SymbolName Offset /
               SymbolName TCMarker?)
             (WS? CloseParen)? (WS? SymbolShift)?
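# Illustrative (assumed) examples of LabelContainingDirective lines and the
# SymbolArgs they carry:
#
#   .quad   .Lfoo-.Lbar
#   .size   foo, .-foo
#   .type   foo, @function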
OpenParen <- '('
CloseParen <- ')'
SymbolType <- [@%] ('function' / 'object')
Dot <- '.'
TCMarker <- '[TC]'
EscapedChar <- '\\' .
WS <- [ \t]+
Comment <- ("//" / '#') [^\n]*
Label <- (LocalSymbol / LocalLabel / SymbolName) ':'
SymbolName <- [[A-Z._]][[A-Z.0-9$_]]*
LocalSymbol <- '.L' [[A-Za-z.0-9$_]]+
LocalLabel <- [0-9][0-9$]*
LocalLabelRef <- [0-9][0-9$]*[bf]
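# Illustrative (assumed) examples:
#
#   foo, OPENSSL_ia32cap_P      SymbolName
#   .LBB0_1, .Lfoo$local        LocalSymbol
#   1:                          Label built from a LocalLabel
#   1b, 1f                      LocalLabelRef (backward / forward reference)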
Instruction <- InstructionName (WS InstructionArg ((WS? ',' WS?) InstructionArg)*)?
InstructionName <- [[A-Z]][[A-Z.0-9]]* [.+\-]?
InstructionArg <- IndirectionIndicator? (ARMConstantTweak / RegisterOrConstant / LocalLabelRef / TOCRefHigh / TOCRefLow / GOTLocation / GOTSymbolOffset / MemoryRef) AVX512Token*
GOTLocation <- '$_GLOBAL_OFFSET_TABLE_-' LocalSymbol
GOTSymbolOffset <- ('$' SymbolName '@GOT' 'OFF'?) / (":got:" SymbolName)
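# Illustrative examples, as emitted for the x86-64 large memory model
# (symbol names are placeholders):
#
#   movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx     GOTLocation operand
#   movabsq $getpid@GOT, %rdx                       GOTSymbolOffset operand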
AVX512Token <- WS? '{' '%'? [0-9a-z]* '}'
TOCRefHigh <- '.TOC.-' ('0b' / ('.L' [a-zA-Z_0-9]+)) "@ha"
TOCRefLow <- '.TOC.-' ('0b' / ('.L' [a-zA-Z_0-9]+)) "@l"
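# Illustrative (assumed) ppc64le TOC-pointer setup, roughly as emitted at
# function entry under the ELFv2 ABI:
#
#   addis 2, 12, .TOC.-0b@ha        TOCRefHigh operand
#   addi  2, 2, .TOC.-0b@l          TOCRefLow operand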
IndirectionIndicator <- '*'
RegisterOrConstant <- (('%'[[A-Z]][[A-Z0-9]]*) /
                       ('$'? ((Offset Offset) / Offset)) /
                       ('#' Offset ('*' [0-9]+ ('-' [0-9] [0-9]*)?)? ) /
                       ('#' '~'? '(' [0-9] WS? "<<" WS? [0-9] ')' ) /
                       ARMRegister)
                      ![fb:(+\-]
ARMConstantTweak <- ("lsl" / "sxtw" / "sxtb" / "uxtw" / "uxtb" / "lsr" / "ror" / "asr") (WS '#' Offset)?
ARMRegister <- "sp" / ([xwdqs] [0-9] [0-9]?) / "xzr" / "wzr" / ARMVectorRegister / ('{' WS? ARMVectorRegister (',' WS? ARMVectorRegister)* WS? '}' ('[' [0-9] [0-9]? ']')? )
ARMVectorRegister <- "v" [0-9] [0-9]? ('.' [0-9]* [bsdhq] ('[' [0-9] [0-9]? ']')? )?
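# Illustrative (assumed) AArch64 operands:
#
#   x0, w1, sp, xzr             ARMRegister (general-purpose forms)
#   v2.4s, v3.16b               ARMVectorRegister
#   {v0.16b, v1.16b}[2]         vector register list with a lane index
#   lsl #12                     ARMConstantTweak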
# Compilers only output a very limited number of expression forms. Rather than
# implement a full expression parser, this enumerates those forms plus a few
# that appear in our hand-written assembly; see the illustrative examples after
# the MemoryRef production below.
MemoryRef <- (SymbolRef BaseIndexScale /
              SymbolRef /
              Low12BitsSymbolRef /
              Offset* BaseIndexScale /
              SegmentRegister Offset BaseIndexScale /
              SegmentRegister BaseIndexScale /
              SegmentRegister Offset /
              ARMBaseIndexScale /
              BaseIndexScale)
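# Illustrative (assumed) examples of the forms above:
#
#   8(%rsp)                     Offset BaseIndexScale
#   foo@GOTPCREL(%rip)          SymbolRef BaseIndexScale
#   %gs:20                      SegmentRegister Offset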
SymbolRef <- (Offset* '+')? (LocalSymbol / SymbolName) Offset* ('@' Section Offset*)?
Low12BitsSymbolRef <- ":lo12:" (LocalSymbol / SymbolName) Offset?
ARMBaseIndexScale <- '[' ARMRegister (',' WS? (('#' Offset (('*' [0-9]+) / ('*' '(' [0-9]+ Operator [0-9]+ ')') / (('+' [0-9]+)*))? ) / ARMGOTLow12 / Low12BitsSymbolRef / ARMRegister) (',' WS? ARMConstantTweak)?)? ']' ARMPostincrement?
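# Illustrative (assumed) AArch64 addressing forms:
#
#   [sp, #32]                   base register plus immediate offset
#   [x0, :lo12:foo]             base register plus a :lo12: relocation
#   [sp, #16]!                  pre-indexed form with writeback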
ARMGOTLow12 <- ":got_lo12:" SymbolName
ARMPostincrement <- '!'
BaseIndexScale <- '(' RegisterOrConstant? WS? (',' WS? RegisterOrConstant WS? (',' [0-9]+)? )? ')'
Operator <- [+\-]
Offset <- '+'? '-'? (("0b" [01]+) / ("0x" [[0-9A-F]]+) / [0-9]+)
Section <- [[A-Z@]]+
SegmentRegister <- '%' [c-gs] 's:'