Mirror of BoringSSL (grpc依赖) https://boringssl.googlesource.com/boringssl
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

6524 lines
146 KiB

package main
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
// Code generated by ./peg/peg delocate.peg DO NOT EDIT.
import (
"fmt"
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
"io"
"os"
"sort"
"strconv"
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
"strings"
)
const endSymbol rune = 1114112
/* The rule types inferred from the grammar are below. */
type pegRule uint8
const (
ruleUnknown pegRule = iota
ruleAsmFile
ruleStatement
ruleGlobalDirective
ruleDirective
ruleDirectiveName
ruleLocationDirective
ruleFileDirective
ruleLocDirective
ruleArgs
ruleArg
ruleQuotedArg
ruleQuotedText
ruleLabelContainingDirective
ruleLabelContainingDirectiveName
ruleSymbolArgs
ruleSymbolShift
ruleSymbolArg
ruleOpenParen
ruleCloseParen
ruleSymbolType
ruleDot
ruleTCMarker
ruleEscapedChar
ruleWS
ruleComment
ruleLabel
ruleSymbolName
ruleLocalSymbol
ruleLocalLabel
ruleLocalLabelRef
ruleInstruction
ruleInstructionName
ruleInstructionArg
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
ruleGOTLocation
ruleGOTSymbolOffset
ruleAVX512Token
ruleTOCRefHigh
ruleTOCRefLow
ruleIndirectionIndicator
ruleRegisterOrConstant
ruleARMConstantTweak
ruleARMRegister
ruleARMVectorRegister
ruleMemoryRef
ruleSymbolRef
ruleLow12BitsSymbolRef
ruleARMBaseIndexScale
ruleARMGOTLow12
ruleARMPostincrement
ruleBaseIndexScale
ruleOperator
ruleOffset
ruleSection
ruleSegmentRegister
)
var rul3s = [...]string{
"Unknown",
"AsmFile",
"Statement",
"GlobalDirective",
"Directive",
"DirectiveName",
"LocationDirective",
"FileDirective",
"LocDirective",
"Args",
"Arg",
"QuotedArg",
"QuotedText",
"LabelContainingDirective",
"LabelContainingDirectiveName",
"SymbolArgs",
"SymbolShift",
"SymbolArg",
"OpenParen",
"CloseParen",
"SymbolType",
"Dot",
"TCMarker",
"EscapedChar",
"WS",
"Comment",
"Label",
"SymbolName",
"LocalSymbol",
"LocalLabel",
"LocalLabelRef",
"Instruction",
"InstructionName",
"InstructionArg",
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
"GOTLocation",
"GOTSymbolOffset",
"AVX512Token",
"TOCRefHigh",
"TOCRefLow",
"IndirectionIndicator",
"RegisterOrConstant",
"ARMConstantTweak",
"ARMRegister",
"ARMVectorRegister",
"MemoryRef",
"SymbolRef",
"Low12BitsSymbolRef",
"ARMBaseIndexScale",
"ARMGOTLow12",
"ARMPostincrement",
"BaseIndexScale",
"Operator",
"Offset",
"Section",
"SegmentRegister",
}
type token32 struct {
pegRule
begin, end uint32
}
func (t *token32) String() string {
return fmt.Sprintf("\x1B[34m%v\x1B[m %v %v", rul3s[t.pegRule], t.begin, t.end)
}
type node32 struct {
token32
up, next *node32
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
func (node *node32) print(w io.Writer, pretty bool, buffer string) {
var print func(node *node32, depth int)
print = func(node *node32, depth int) {
for node != nil {
for c := 0; c < depth; c++ {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
fmt.Fprintf(w, " ")
}
rule := rul3s[node.pegRule]
quote := strconv.Quote(string(([]rune(buffer)[node.begin:node.end])))
if !pretty {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
fmt.Fprintf(w, "%v %v\n", rule, quote)
} else {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
fmt.Fprintf(w, "\x1B[36m%v\x1B[m %v\n", rule, quote)
}
if node.up != nil {
print(node.up, depth+1)
}
node = node.next
}
}
print(node, 0)
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
func (node *node32) Print(w io.Writer, buffer string) {
node.print(w, false, buffer)
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
func (node *node32) PrettyPrint(w io.Writer, buffer string) {
node.print(w, true, buffer)
}
type tokens32 struct {
tree []token32
}
func (t *tokens32) Trim(length uint32) {
t.tree = t.tree[:length]
}
func (t *tokens32) Print() {
for _, token := range t.tree {
fmt.Println(token.String())
}
}
func (t *tokens32) AST() *node32 {
type element struct {
node *node32
down *element
}
tokens := t.Tokens()
var stack *element
for _, token := range tokens {
if token.begin == token.end {
continue
}
node := &node32{token32: token}
for stack != nil && stack.node.begin >= token.begin && stack.node.end <= token.end {
stack.node.next = node.up
node.up = stack.node
stack = stack.down
}
stack = &element{node: node, down: stack}
}
if stack != nil {
return stack.node
}
return nil
}
func (t *tokens32) PrintSyntaxTree(buffer string) {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
t.AST().Print(os.Stdout, buffer)
}
func (t *tokens32) WriteSyntaxTree(w io.Writer, buffer string) {
t.AST().Print(w, buffer)
}
func (t *tokens32) PrettyPrintSyntaxTree(buffer string) {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
t.AST().PrettyPrint(os.Stdout, buffer)
}
func (t *tokens32) Add(rule pegRule, begin, end, index uint32) {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
tree, i := t.tree, int(index)
if i >= len(tree) {
t.tree = append(tree, token32{pegRule: rule, begin: begin, end: end})
return
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
tree[i] = token32{pegRule: rule, begin: begin, end: end}
}
func (t *tokens32) Tokens() []token32 {
return t.tree
}
type Asm struct {
Buffer string
buffer []rune
rules [55]func() bool
parse func(rule ...int) error
reset func()
Pretty bool
tokens32
}
func (p *Asm) Parse(rule ...int) error {
return p.parse(rule...)
}
func (p *Asm) Reset() {
p.reset()
}
type textPosition struct {
line, symbol int
}
type textPositionMap map[int]textPosition
func translatePositions(buffer []rune, positions []int) textPositionMap {
length, translations, j, line, symbol := len(positions), make(textPositionMap, len(positions)), 0, 1, 0
sort.Ints(positions)
search:
for i, c := range buffer {
if c == '\n' {
line, symbol = line+1, 0
} else {
symbol++
}
if i == positions[j] {
translations[positions[j]] = textPosition{line, symbol}
for j++; j < length; j++ {
if i != positions[j] {
continue search
}
}
break search
}
}
return translations
}
type parseError struct {
p *Asm
max token32
}
func (e *parseError) Error() string {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
tokens, err := []token32{e.max}, "\n"
positions, p := make([]int, 2*len(tokens)), 0
for _, token := range tokens {
positions[p], p = int(token.begin), p+1
positions[p], p = int(token.end), p+1
}
translations := translatePositions(e.p.buffer, positions)
format := "parse error near %v (line %v symbol %v - line %v symbol %v):\n%v\n"
if e.p.Pretty {
format = "parse error near \x1B[34m%v\x1B[m (line %v symbol %v - line %v symbol %v):\n%v\n"
}
for _, token := range tokens {
begin, end := int(token.begin), int(token.end)
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
err += fmt.Sprintf(format,
rul3s[token.pegRule],
translations[begin].line, translations[begin].symbol,
translations[end].line, translations[end].symbol,
strconv.Quote(string(e.p.buffer[begin:end])))
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
return err
}
func (p *Asm) PrintSyntaxTree() {
if p.Pretty {
p.tokens32.PrettyPrintSyntaxTree(p.Buffer)
} else {
p.tokens32.PrintSyntaxTree(p.Buffer)
}
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
func (p *Asm) WriteSyntaxTree(w io.Writer) {
p.tokens32.WriteSyntaxTree(w, p.Buffer)
}
func (p *Asm) SprintSyntaxTree() string {
var bldr strings.Builder
p.WriteSyntaxTree(&bldr)
return bldr.String()
}
func Pretty(pretty bool) func(*Asm) error {
return func(p *Asm) error {
p.Pretty = pretty
return nil
}
}
func Size(size int) func(*Asm) error {
return func(p *Asm) error {
p.tokens32 = tokens32{tree: make([]token32, 0, size)}
return nil
}
}
func (p *Asm) Init(options ...func(*Asm) error) error {
var (
max token32
position, tokenIndex uint32
buffer []rune
)
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
for _, option := range options {
err := option(p)
if err != nil {
return err
}
}
p.reset = func() {
max = token32{}
position, tokenIndex = 0, 0
p.buffer = []rune(p.Buffer)
if len(p.buffer) == 0 || p.buffer[len(p.buffer)-1] != endSymbol {
p.buffer = append(p.buffer, endSymbol)
}
buffer = p.buffer
}
p.reset()
_rules := p.rules
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
tree := p.tokens32
p.parse = func(rule ...int) error {
r := 1
if len(rule) > 0 {
r = rule[0]
}
matches := p.rules[r]()
p.tokens32 = tree
if matches {
p.Trim(tokenIndex)
return nil
}
return &parseError{p, max}
}
add := func(rule pegRule, begin uint32) {
tree.Add(rule, begin, position, tokenIndex)
tokenIndex++
if begin != position && position > max.end {
max = token32{rule, begin, position}
}
}
matchDot := func() bool {
if buffer[position] != endSymbol {
position++
return true
}
return false
}
/*matchChar := func(c byte) bool {
if buffer[position] == c {
position++
return true
}
return false
}*/
/*matchRange := func(lower byte, upper byte) bool {
if c := buffer[position]; c >= lower && c <= upper {
position++
return true
}
return false
}*/
_rules = [...]func() bool{
nil,
/* 0 AsmFile <- <(Statement* !.)> */
func() bool {
position0, tokenIndex0 := position, tokenIndex
{
position1 := position
l2:
{
position3, tokenIndex3 := position, tokenIndex
if !_rules[ruleStatement]() {
goto l3
}
goto l2
l3:
position, tokenIndex = position3, tokenIndex3
}
{
position4, tokenIndex4 := position, tokenIndex
if !matchDot() {
goto l4
}
goto l0
l4:
position, tokenIndex = position4, tokenIndex4
}
add(ruleAsmFile, position1)
}
return true
l0:
position, tokenIndex = position0, tokenIndex0
return false
},
/* 1 Statement <- <(WS? (Label / ((GlobalDirective / LocationDirective / LabelContainingDirective / Instruction / Directive / Comment / ) WS? ((Comment? '\n') / ';'))))> */
func() bool {
position5, tokenIndex5 := position, tokenIndex
{
position6 := position
{
position7, tokenIndex7 := position, tokenIndex
if !_rules[ruleWS]() {
goto l7
}
goto l8
l7:
position, tokenIndex = position7, tokenIndex7
}
l8:
{
position9, tokenIndex9 := position, tokenIndex
if !_rules[ruleLabel]() {
goto l10
}
goto l9
l10:
position, tokenIndex = position9, tokenIndex9
{
position11, tokenIndex11 := position, tokenIndex
if !_rules[ruleGlobalDirective]() {
goto l12
}
goto l11
l12:
position, tokenIndex = position11, tokenIndex11
if !_rules[ruleLocationDirective]() {
goto l13
}
goto l11
l13:
position, tokenIndex = position11, tokenIndex11
if !_rules[ruleLabelContainingDirective]() {
goto l14
}
goto l11
l14:
position, tokenIndex = position11, tokenIndex11
if !_rules[ruleInstruction]() {
goto l15
}
goto l11
l15:
position, tokenIndex = position11, tokenIndex11
if !_rules[ruleDirective]() {
goto l16
}
goto l11
l16:
position, tokenIndex = position11, tokenIndex11
if !_rules[ruleComment]() {
goto l17
}
goto l11
l17:
position, tokenIndex = position11, tokenIndex11
}
l11:
{
position18, tokenIndex18 := position, tokenIndex
if !_rules[ruleWS]() {
goto l18
}
goto l19
l18:
position, tokenIndex = position18, tokenIndex18
}
l19:
{
position20, tokenIndex20 := position, tokenIndex
{
position22, tokenIndex22 := position, tokenIndex
if !_rules[ruleComment]() {
goto l22
}
goto l23
l22:
position, tokenIndex = position22, tokenIndex22
}
l23:
if buffer[position] != rune('\n') {
goto l21
}
position++
goto l20
l21:
position, tokenIndex = position20, tokenIndex20
if buffer[position] != rune(';') {
goto l5
}
position++
}
l20:
}
l9:
add(ruleStatement, position6)
}
return true
l5:
position, tokenIndex = position5, tokenIndex5
return false
},
/* 2 GlobalDirective <- <((('.' ('g' / 'G') ('l' / 'L') ('o' / 'O') ('b' / 'B') ('a' / 'A') ('l' / 'L')) / ('.' ('g' / 'G') ('l' / 'L') ('o' / 'O') ('b' / 'B') ('l' / 'L'))) WS SymbolName)> */
func() bool {
position24, tokenIndex24 := position, tokenIndex
{
position25 := position
{
position26, tokenIndex26 := position, tokenIndex
if buffer[position] != rune('.') {
goto l27
}
position++
{
position28, tokenIndex28 := position, tokenIndex
if buffer[position] != rune('g') {
goto l29
}
position++
goto l28
l29:
position, tokenIndex = position28, tokenIndex28
if buffer[position] != rune('G') {
goto l27
}
position++
}
l28:
{
position30, tokenIndex30 := position, tokenIndex
if buffer[position] != rune('l') {
goto l31
}
position++
goto l30
l31:
position, tokenIndex = position30, tokenIndex30
if buffer[position] != rune('L') {
goto l27
}
position++
}
l30:
{
position32, tokenIndex32 := position, tokenIndex
if buffer[position] != rune('o') {
goto l33
}
position++
goto l32
l33:
position, tokenIndex = position32, tokenIndex32
if buffer[position] != rune('O') {
goto l27
}
position++
}
l32:
{
position34, tokenIndex34 := position, tokenIndex
if buffer[position] != rune('b') {
goto l35
}
position++
goto l34
l35:
position, tokenIndex = position34, tokenIndex34
if buffer[position] != rune('B') {
goto l27
}
position++
}
l34:
{
position36, tokenIndex36 := position, tokenIndex
if buffer[position] != rune('a') {
goto l37
}
position++
goto l36
l37:
position, tokenIndex = position36, tokenIndex36
if buffer[position] != rune('A') {
goto l27
}
position++
}
l36:
{
position38, tokenIndex38 := position, tokenIndex
if buffer[position] != rune('l') {
goto l39
}
position++
goto l38
l39:
position, tokenIndex = position38, tokenIndex38
if buffer[position] != rune('L') {
goto l27
}
position++
}
l38:
goto l26
l27:
position, tokenIndex = position26, tokenIndex26
if buffer[position] != rune('.') {
goto l24
}
position++
{
position40, tokenIndex40 := position, tokenIndex
if buffer[position] != rune('g') {
goto l41
}
position++
goto l40
l41:
position, tokenIndex = position40, tokenIndex40
if buffer[position] != rune('G') {
goto l24
}
position++
}
l40:
{
position42, tokenIndex42 := position, tokenIndex
if buffer[position] != rune('l') {
goto l43
}
position++
goto l42
l43:
position, tokenIndex = position42, tokenIndex42
if buffer[position] != rune('L') {
goto l24
}
position++
}
l42:
{
position44, tokenIndex44 := position, tokenIndex
if buffer[position] != rune('o') {
goto l45
}
position++
goto l44
l45:
position, tokenIndex = position44, tokenIndex44
if buffer[position] != rune('O') {
goto l24
}
position++
}
l44:
{
position46, tokenIndex46 := position, tokenIndex
if buffer[position] != rune('b') {
goto l47
}
position++
goto l46
l47:
position, tokenIndex = position46, tokenIndex46
if buffer[position] != rune('B') {
goto l24
}
position++
}
l46:
{
position48, tokenIndex48 := position, tokenIndex
if buffer[position] != rune('l') {
goto l49
}
position++
goto l48
l49:
position, tokenIndex = position48, tokenIndex48
if buffer[position] != rune('L') {
goto l24
}
position++
}
l48:
}
l26:
if !_rules[ruleWS]() {
goto l24
}
if !_rules[ruleSymbolName]() {
goto l24
}
add(ruleGlobalDirective, position25)
}
return true
l24:
position, tokenIndex = position24, tokenIndex24
return false
},
/* 3 Directive <- <('.' DirectiveName (WS Args)?)> */
func() bool {
position50, tokenIndex50 := position, tokenIndex
{
position51 := position
if buffer[position] != rune('.') {
goto l50
}
position++
if !_rules[ruleDirectiveName]() {
goto l50
}
{
position52, tokenIndex52 := position, tokenIndex
if !_rules[ruleWS]() {
goto l52
}
if !_rules[ruleArgs]() {
goto l52
}
goto l53
l52:
position, tokenIndex = position52, tokenIndex52
}
l53:
add(ruleDirective, position51)
}
return true
l50:
position, tokenIndex = position50, tokenIndex50
return false
},
/* 4 DirectiveName <- <([a-z] / [A-Z] / ([0-9] / [0-9]) / '_')+> */
func() bool {
position54, tokenIndex54 := position, tokenIndex
{
position55 := position
{
position58, tokenIndex58 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l59
}
position++
goto l58
l59:
position, tokenIndex = position58, tokenIndex58
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l60
}
position++
goto l58
l60:
position, tokenIndex = position58, tokenIndex58
{
position62, tokenIndex62 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l63
}
position++
goto l62
l63:
position, tokenIndex = position62, tokenIndex62
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l61
}
position++
}
l62:
goto l58
l61:
position, tokenIndex = position58, tokenIndex58
if buffer[position] != rune('_') {
goto l54
}
position++
}
l58:
l56:
{
position57, tokenIndex57 := position, tokenIndex
{
position64, tokenIndex64 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l65
}
position++
goto l64
l65:
position, tokenIndex = position64, tokenIndex64
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l66
}
position++
goto l64
l66:
position, tokenIndex = position64, tokenIndex64
{
position68, tokenIndex68 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l69
}
position++
goto l68
l69:
position, tokenIndex = position68, tokenIndex68
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l67
}
position++
}
l68:
goto l64
l67:
position, tokenIndex = position64, tokenIndex64
if buffer[position] != rune('_') {
goto l57
}
position++
}
l64:
goto l56
l57:
position, tokenIndex = position57, tokenIndex57
}
add(ruleDirectiveName, position55)
}
return true
l54:
position, tokenIndex = position54, tokenIndex54
return false
},
/* 5 LocationDirective <- <(FileDirective / LocDirective)> */
func() bool {
position70, tokenIndex70 := position, tokenIndex
{
position71 := position
{
position72, tokenIndex72 := position, tokenIndex
if !_rules[ruleFileDirective]() {
goto l73
}
goto l72
l73:
position, tokenIndex = position72, tokenIndex72
if !_rules[ruleLocDirective]() {
goto l70
}
}
l72:
add(ruleLocationDirective, position71)
}
return true
l70:
position, tokenIndex = position70, tokenIndex70
return false
},
/* 6 FileDirective <- <('.' ('f' / 'F') ('i' / 'I') ('l' / 'L') ('e' / 'E') WS (!('#' / '\n') .)+)> */
func() bool {
position74, tokenIndex74 := position, tokenIndex
{
position75 := position
if buffer[position] != rune('.') {
goto l74
}
position++
{
position76, tokenIndex76 := position, tokenIndex
if buffer[position] != rune('f') {
goto l77
}
position++
goto l76
l77:
position, tokenIndex = position76, tokenIndex76
if buffer[position] != rune('F') {
goto l74
}
position++
}
l76:
{
position78, tokenIndex78 := position, tokenIndex
if buffer[position] != rune('i') {
goto l79
}
position++
goto l78
l79:
position, tokenIndex = position78, tokenIndex78
if buffer[position] != rune('I') {
goto l74
}
position++
}
l78:
{
position80, tokenIndex80 := position, tokenIndex
if buffer[position] != rune('l') {
goto l81
}
position++
goto l80
l81:
position, tokenIndex = position80, tokenIndex80
if buffer[position] != rune('L') {
goto l74
}
position++
}
l80:
{
position82, tokenIndex82 := position, tokenIndex
if buffer[position] != rune('e') {
goto l83
}
position++
goto l82
l83:
position, tokenIndex = position82, tokenIndex82
if buffer[position] != rune('E') {
goto l74
}
position++
}
l82:
if !_rules[ruleWS]() {
goto l74
}
{
position86, tokenIndex86 := position, tokenIndex
{
position87, tokenIndex87 := position, tokenIndex
if buffer[position] != rune('#') {
goto l88
}
position++
goto l87
l88:
position, tokenIndex = position87, tokenIndex87
if buffer[position] != rune('\n') {
goto l86
}
position++
}
l87:
goto l74
l86:
position, tokenIndex = position86, tokenIndex86
}
if !matchDot() {
goto l74
}
l84:
{
position85, tokenIndex85 := position, tokenIndex
{
position89, tokenIndex89 := position, tokenIndex
{
position90, tokenIndex90 := position, tokenIndex
if buffer[position] != rune('#') {
goto l91
}
position++
goto l90
l91:
position, tokenIndex = position90, tokenIndex90
if buffer[position] != rune('\n') {
goto l89
}
position++
}
l90:
goto l85
l89:
position, tokenIndex = position89, tokenIndex89
}
if !matchDot() {
goto l85
}
goto l84
l85:
position, tokenIndex = position85, tokenIndex85
}
add(ruleFileDirective, position75)
}
return true
l74:
position, tokenIndex = position74, tokenIndex74
return false
},
/* 7 LocDirective <- <('.' ('l' / 'L') ('o' / 'O') ('c' / 'C') WS (!('#' / '/' / '\n') .)+)> */
func() bool {
position92, tokenIndex92 := position, tokenIndex
{
position93 := position
if buffer[position] != rune('.') {
goto l92
}
position++
{
position94, tokenIndex94 := position, tokenIndex
if buffer[position] != rune('l') {
goto l95
}
position++
goto l94
l95:
position, tokenIndex = position94, tokenIndex94
if buffer[position] != rune('L') {
goto l92
}
position++
}
l94:
{
position96, tokenIndex96 := position, tokenIndex
if buffer[position] != rune('o') {
goto l97
}
position++
goto l96
l97:
position, tokenIndex = position96, tokenIndex96
if buffer[position] != rune('O') {
goto l92
}
position++
}
l96:
{
position98, tokenIndex98 := position, tokenIndex
if buffer[position] != rune('c') {
goto l99
}
position++
goto l98
l99:
position, tokenIndex = position98, tokenIndex98
if buffer[position] != rune('C') {
goto l92
}
position++
}
l98:
if !_rules[ruleWS]() {
goto l92
}
{
position102, tokenIndex102 := position, tokenIndex
{
position103, tokenIndex103 := position, tokenIndex
if buffer[position] != rune('#') {
goto l104
}
position++
goto l103
l104:
position, tokenIndex = position103, tokenIndex103
if buffer[position] != rune('/') {
goto l105
}
position++
goto l103
l105:
position, tokenIndex = position103, tokenIndex103
if buffer[position] != rune('\n') {
goto l102
}
position++
}
l103:
goto l92
l102:
position, tokenIndex = position102, tokenIndex102
}
if !matchDot() {
goto l92
}
l100:
{
position101, tokenIndex101 := position, tokenIndex
{
position106, tokenIndex106 := position, tokenIndex
{
position107, tokenIndex107 := position, tokenIndex
if buffer[position] != rune('#') {
goto l108
}
position++
goto l107
l108:
position, tokenIndex = position107, tokenIndex107
if buffer[position] != rune('/') {
goto l109
}
position++
goto l107
l109:
position, tokenIndex = position107, tokenIndex107
if buffer[position] != rune('\n') {
goto l106
}
position++
}
l107:
goto l101
l106:
position, tokenIndex = position106, tokenIndex106
}
if !matchDot() {
goto l101
}
goto l100
l101:
position, tokenIndex = position101, tokenIndex101
}
add(ruleLocDirective, position93)
}
return true
l92:
position, tokenIndex = position92, tokenIndex92
return false
},
/* 8 Args <- <(Arg (WS? ',' WS? Arg)*)> */
func() bool {
position110, tokenIndex110 := position, tokenIndex
{
position111 := position
if !_rules[ruleArg]() {
goto l110
}
l112:
{
position113, tokenIndex113 := position, tokenIndex
{
position114, tokenIndex114 := position, tokenIndex
if !_rules[ruleWS]() {
goto l114
}
goto l115
l114:
position, tokenIndex = position114, tokenIndex114
}
l115:
if buffer[position] != rune(',') {
goto l113
}
position++
{
position116, tokenIndex116 := position, tokenIndex
if !_rules[ruleWS]() {
goto l116
}
goto l117
l116:
position, tokenIndex = position116, tokenIndex116
}
l117:
if !_rules[ruleArg]() {
goto l113
}
goto l112
l113:
position, tokenIndex = position113, tokenIndex113
}
add(ruleArgs, position111)
}
return true
l110:
position, tokenIndex = position110, tokenIndex110
return false
},
/* 9 Arg <- <(QuotedArg / ([0-9] / [0-9] / ([a-z] / [A-Z]) / '%' / '+' / '-' / '*' / '_' / '@' / '.')*)> */
func() bool {
{
position119 := position
{
position120, tokenIndex120 := position, tokenIndex
if !_rules[ruleQuotedArg]() {
goto l121
}
goto l120
l121:
position, tokenIndex = position120, tokenIndex120
l122:
{
position123, tokenIndex123 := position, tokenIndex
{
position124, tokenIndex124 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l125
}
position++
goto l124
l125:
position, tokenIndex = position124, tokenIndex124
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l126
}
position++
goto l124
l126:
position, tokenIndex = position124, tokenIndex124
{
position128, tokenIndex128 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l129
}
position++
goto l128
l129:
position, tokenIndex = position128, tokenIndex128
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l127
}
position++
}
l128:
goto l124
l127:
position, tokenIndex = position124, tokenIndex124
if buffer[position] != rune('%') {
goto l130
}
position++
goto l124
l130:
position, tokenIndex = position124, tokenIndex124
if buffer[position] != rune('+') {
goto l131
}
position++
goto l124
l131:
position, tokenIndex = position124, tokenIndex124
if buffer[position] != rune('-') {
goto l132
}
position++
goto l124
l132:
position, tokenIndex = position124, tokenIndex124
if buffer[position] != rune('*') {
goto l133
}
position++
goto l124
l133:
position, tokenIndex = position124, tokenIndex124
if buffer[position] != rune('_') {
goto l134
}
position++
goto l124
l134:
position, tokenIndex = position124, tokenIndex124
if buffer[position] != rune('@') {
goto l135
}
position++
goto l124
l135:
position, tokenIndex = position124, tokenIndex124
if buffer[position] != rune('.') {
goto l123
}
position++
}
l124:
goto l122
l123:
position, tokenIndex = position123, tokenIndex123
}
}
l120:
add(ruleArg, position119)
}
return true
},
/* 10 QuotedArg <- <('"' QuotedText '"')> */
func() bool {
position136, tokenIndex136 := position, tokenIndex
{
position137 := position
if buffer[position] != rune('"') {
goto l136
}
position++
if !_rules[ruleQuotedText]() {
goto l136
}
if buffer[position] != rune('"') {
goto l136
}
position++
add(ruleQuotedArg, position137)
}
return true
l136:
position, tokenIndex = position136, tokenIndex136
return false
},
/* 11 QuotedText <- <(EscapedChar / (!'"' .))*> */
func() bool {
{
position139 := position
l140:
{
position141, tokenIndex141 := position, tokenIndex
{
position142, tokenIndex142 := position, tokenIndex
if !_rules[ruleEscapedChar]() {
goto l143
}
goto l142
l143:
position, tokenIndex = position142, tokenIndex142
{
position144, tokenIndex144 := position, tokenIndex
if buffer[position] != rune('"') {
goto l144
}
position++
goto l141
l144:
position, tokenIndex = position144, tokenIndex144
}
if !matchDot() {
goto l141
}
}
l142:
goto l140
l141:
position, tokenIndex = position141, tokenIndex141
}
add(ruleQuotedText, position139)
}
return true
},
/* 12 LabelContainingDirective <- <(LabelContainingDirectiveName WS SymbolArgs)> */
func() bool {
position145, tokenIndex145 := position, tokenIndex
{
position146 := position
if !_rules[ruleLabelContainingDirectiveName]() {
goto l145
}
if !_rules[ruleWS]() {
goto l145
}
if !_rules[ruleSymbolArgs]() {
goto l145
}
add(ruleLabelContainingDirective, position146)
}
return true
l145:
position, tokenIndex = position145, tokenIndex145
return false
},
/* 13 LabelContainingDirectiveName <- <(('.' ('x' / 'X') ('w' / 'W') ('o' / 'O') ('r' / 'R') ('d' / 'D')) / ('.' ('w' / 'W') ('o' / 'O') ('r' / 'R') ('d' / 'D')) / ('.' ('l' / 'L') ('o' / 'O') ('n' / 'N') ('g' / 'G')) / ('.' ('s' / 'S') ('e' / 'E') ('t' / 'T')) / ('.' ('b' / 'B') ('y' / 'Y') ('t' / 'T') ('e' / 'E')) / ('.' '8' ('b' / 'B') ('y' / 'Y') ('t' / 'T') ('e' / 'E')) / ('.' '4' ('b' / 'B') ('y' / 'Y') ('t' / 'T') ('e' / 'E')) / ('.' ('q' / 'Q') ('u' / 'U') ('a' / 'A') ('d' / 'D')) / ('.' ('t' / 'T') ('c' / 'C')) / ('.' ('l' / 'L') ('o' / 'O') ('c' / 'C') ('a' / 'A') ('l' / 'L') ('e' / 'E') ('n' / 'N') ('t' / 'T') ('r' / 'R') ('y' / 'Y')) / ('.' ('s' / 'S') ('i' / 'I') ('z' / 'Z') ('e' / 'E')) / ('.' ('t' / 'T') ('y' / 'Y') ('p' / 'P') ('e' / 'E')) / ('.' ('u' / 'U') ('l' / 'L') ('e' / 'E') ('b' / 'B') '1' '2' '8') / ('.' ('s' / 'S') ('l' / 'L') ('e' / 'E') ('b' / 'B') '1' '2' '8'))> */
func() bool {
position147, tokenIndex147 := position, tokenIndex
{
position148 := position
{
position149, tokenIndex149 := position, tokenIndex
if buffer[position] != rune('.') {
goto l150
}
position++
{
position151, tokenIndex151 := position, tokenIndex
if buffer[position] != rune('x') {
goto l152
}
position++
goto l151
l152:
position, tokenIndex = position151, tokenIndex151
if buffer[position] != rune('X') {
goto l150
}
position++
}
l151:
{
position153, tokenIndex153 := position, tokenIndex
if buffer[position] != rune('w') {
goto l154
}
position++
goto l153
l154:
position, tokenIndex = position153, tokenIndex153
if buffer[position] != rune('W') {
goto l150
}
position++
}
l153:
{
position155, tokenIndex155 := position, tokenIndex
if buffer[position] != rune('o') {
goto l156
}
position++
goto l155
l156:
position, tokenIndex = position155, tokenIndex155
if buffer[position] != rune('O') {
goto l150
}
position++
}
l155:
{
position157, tokenIndex157 := position, tokenIndex
if buffer[position] != rune('r') {
goto l158
}
position++
goto l157
l158:
position, tokenIndex = position157, tokenIndex157
if buffer[position] != rune('R') {
goto l150
}
position++
}
l157:
{
position159, tokenIndex159 := position, tokenIndex
if buffer[position] != rune('d') {
goto l160
}
position++
goto l159
l160:
position, tokenIndex = position159, tokenIndex159
if buffer[position] != rune('D') {
goto l150
}
position++
}
l159:
goto l149
l150:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l161
}
position++
{
position162, tokenIndex162 := position, tokenIndex
if buffer[position] != rune('w') {
goto l163
}
position++
goto l162
l163:
position, tokenIndex = position162, tokenIndex162
if buffer[position] != rune('W') {
goto l161
}
position++
}
l162:
{
position164, tokenIndex164 := position, tokenIndex
if buffer[position] != rune('o') {
goto l165
}
position++
goto l164
l165:
position, tokenIndex = position164, tokenIndex164
if buffer[position] != rune('O') {
goto l161
}
position++
}
l164:
{
position166, tokenIndex166 := position, tokenIndex
if buffer[position] != rune('r') {
goto l167
}
position++
goto l166
l167:
position, tokenIndex = position166, tokenIndex166
if buffer[position] != rune('R') {
goto l161
}
position++
}
l166:
{
position168, tokenIndex168 := position, tokenIndex
if buffer[position] != rune('d') {
goto l169
}
position++
goto l168
l169:
position, tokenIndex = position168, tokenIndex168
if buffer[position] != rune('D') {
goto l161
}
position++
}
l168:
goto l149
l161:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l170
}
position++
{
position171, tokenIndex171 := position, tokenIndex
if buffer[position] != rune('l') {
goto l172
}
position++
goto l171
l172:
position, tokenIndex = position171, tokenIndex171
if buffer[position] != rune('L') {
goto l170
}
position++
}
l171:
{
position173, tokenIndex173 := position, tokenIndex
if buffer[position] != rune('o') {
goto l174
}
position++
goto l173
l174:
position, tokenIndex = position173, tokenIndex173
if buffer[position] != rune('O') {
goto l170
}
position++
}
l173:
{
position175, tokenIndex175 := position, tokenIndex
if buffer[position] != rune('n') {
goto l176
}
position++
goto l175
l176:
position, tokenIndex = position175, tokenIndex175
if buffer[position] != rune('N') {
goto l170
}
position++
}
l175:
{
position177, tokenIndex177 := position, tokenIndex
if buffer[position] != rune('g') {
goto l178
}
position++
goto l177
l178:
position, tokenIndex = position177, tokenIndex177
if buffer[position] != rune('G') {
goto l170
}
position++
}
l177:
goto l149
l170:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l179
}
position++
{
position180, tokenIndex180 := position, tokenIndex
if buffer[position] != rune('s') {
goto l181
}
position++
goto l180
l181:
position, tokenIndex = position180, tokenIndex180
if buffer[position] != rune('S') {
goto l179
}
position++
}
l180:
{
position182, tokenIndex182 := position, tokenIndex
if buffer[position] != rune('e') {
goto l183
}
position++
goto l182
l183:
position, tokenIndex = position182, tokenIndex182
if buffer[position] != rune('E') {
goto l179
}
position++
}
l182:
{
position184, tokenIndex184 := position, tokenIndex
if buffer[position] != rune('t') {
goto l185
}
position++
goto l184
l185:
position, tokenIndex = position184, tokenIndex184
if buffer[position] != rune('T') {
goto l179
}
position++
}
l184:
goto l149
l179:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l186
}
position++
{
position187, tokenIndex187 := position, tokenIndex
if buffer[position] != rune('b') {
goto l188
}
position++
goto l187
l188:
position, tokenIndex = position187, tokenIndex187
if buffer[position] != rune('B') {
goto l186
}
position++
}
l187:
{
position189, tokenIndex189 := position, tokenIndex
if buffer[position] != rune('y') {
goto l190
}
position++
goto l189
l190:
position, tokenIndex = position189, tokenIndex189
if buffer[position] != rune('Y') {
goto l186
}
position++
}
l189:
{
position191, tokenIndex191 := position, tokenIndex
if buffer[position] != rune('t') {
goto l192
}
position++
goto l191
l192:
position, tokenIndex = position191, tokenIndex191
if buffer[position] != rune('T') {
goto l186
}
position++
}
l191:
{
position193, tokenIndex193 := position, tokenIndex
if buffer[position] != rune('e') {
goto l194
}
position++
goto l193
l194:
position, tokenIndex = position193, tokenIndex193
if buffer[position] != rune('E') {
goto l186
}
position++
}
l193:
goto l149
l186:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l195
}
position++
if buffer[position] != rune('8') {
goto l195
}
position++
{
position196, tokenIndex196 := position, tokenIndex
if buffer[position] != rune('b') {
goto l197
}
position++
goto l196
l197:
position, tokenIndex = position196, tokenIndex196
if buffer[position] != rune('B') {
goto l195
}
position++
}
l196:
{
position198, tokenIndex198 := position, tokenIndex
if buffer[position] != rune('y') {
goto l199
}
position++
goto l198
l199:
position, tokenIndex = position198, tokenIndex198
if buffer[position] != rune('Y') {
goto l195
}
position++
}
l198:
{
position200, tokenIndex200 := position, tokenIndex
if buffer[position] != rune('t') {
goto l201
}
position++
goto l200
l201:
position, tokenIndex = position200, tokenIndex200
if buffer[position] != rune('T') {
goto l195
}
position++
}
l200:
{
position202, tokenIndex202 := position, tokenIndex
if buffer[position] != rune('e') {
goto l203
}
position++
goto l202
l203:
position, tokenIndex = position202, tokenIndex202
if buffer[position] != rune('E') {
goto l195
}
position++
}
l202:
goto l149
l195:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l204
}
position++
if buffer[position] != rune('4') {
goto l204
}
position++
{
position205, tokenIndex205 := position, tokenIndex
if buffer[position] != rune('b') {
goto l206
}
position++
goto l205
l206:
position, tokenIndex = position205, tokenIndex205
if buffer[position] != rune('B') {
goto l204
}
position++
}
l205:
{
position207, tokenIndex207 := position, tokenIndex
if buffer[position] != rune('y') {
goto l208
}
position++
goto l207
l208:
position, tokenIndex = position207, tokenIndex207
if buffer[position] != rune('Y') {
goto l204
}
position++
}
l207:
{
position209, tokenIndex209 := position, tokenIndex
if buffer[position] != rune('t') {
goto l210
}
position++
goto l209
l210:
position, tokenIndex = position209, tokenIndex209
if buffer[position] != rune('T') {
goto l204
}
position++
}
l209:
{
position211, tokenIndex211 := position, tokenIndex
if buffer[position] != rune('e') {
goto l212
}
position++
goto l211
l212:
position, tokenIndex = position211, tokenIndex211
if buffer[position] != rune('E') {
goto l204
}
position++
}
l211:
goto l149
l204:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l213
}
position++
{
position214, tokenIndex214 := position, tokenIndex
if buffer[position] != rune('q') {
goto l215
}
position++
goto l214
l215:
position, tokenIndex = position214, tokenIndex214
if buffer[position] != rune('Q') {
goto l213
}
position++
}
l214:
{
position216, tokenIndex216 := position, tokenIndex
if buffer[position] != rune('u') {
goto l217
}
position++
goto l216
l217:
position, tokenIndex = position216, tokenIndex216
if buffer[position] != rune('U') {
goto l213
}
position++
}
l216:
{
position218, tokenIndex218 := position, tokenIndex
if buffer[position] != rune('a') {
goto l219
}
position++
goto l218
l219:
position, tokenIndex = position218, tokenIndex218
if buffer[position] != rune('A') {
goto l213
}
position++
}
l218:
{
position220, tokenIndex220 := position, tokenIndex
if buffer[position] != rune('d') {
goto l221
}
position++
goto l220
l221:
position, tokenIndex = position220, tokenIndex220
if buffer[position] != rune('D') {
goto l213
}
position++
}
l220:
goto l149
l213:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l222
}
position++
{
position223, tokenIndex223 := position, tokenIndex
if buffer[position] != rune('t') {
goto l224
}
position++
goto l223
l224:
position, tokenIndex = position223, tokenIndex223
if buffer[position] != rune('T') {
goto l222
}
position++
}
l223:
{
position225, tokenIndex225 := position, tokenIndex
if buffer[position] != rune('c') {
goto l226
}
position++
goto l225
l226:
position, tokenIndex = position225, tokenIndex225
if buffer[position] != rune('C') {
goto l222
}
position++
}
l225:
goto l149
l222:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l227
}
position++
{
position228, tokenIndex228 := position, tokenIndex
if buffer[position] != rune('l') {
goto l229
}
position++
goto l228
l229:
position, tokenIndex = position228, tokenIndex228
if buffer[position] != rune('L') {
goto l227
}
position++
}
l228:
{
position230, tokenIndex230 := position, tokenIndex
if buffer[position] != rune('o') {
goto l231
}
position++
goto l230
l231:
position, tokenIndex = position230, tokenIndex230
if buffer[position] != rune('O') {
goto l227
}
position++
}
l230:
{
position232, tokenIndex232 := position, tokenIndex
if buffer[position] != rune('c') {
goto l233
}
position++
goto l232
l233:
position, tokenIndex = position232, tokenIndex232
if buffer[position] != rune('C') {
goto l227
}
position++
}
l232:
{
position234, tokenIndex234 := position, tokenIndex
if buffer[position] != rune('a') {
goto l235
}
position++
goto l234
l235:
position, tokenIndex = position234, tokenIndex234
if buffer[position] != rune('A') {
goto l227
}
position++
}
l234:
{
position236, tokenIndex236 := position, tokenIndex
if buffer[position] != rune('l') {
goto l237
}
position++
goto l236
l237:
position, tokenIndex = position236, tokenIndex236
if buffer[position] != rune('L') {
goto l227
}
position++
}
l236:
{
position238, tokenIndex238 := position, tokenIndex
if buffer[position] != rune('e') {
goto l239
}
position++
goto l238
l239:
position, tokenIndex = position238, tokenIndex238
if buffer[position] != rune('E') {
goto l227
}
position++
}
l238:
{
position240, tokenIndex240 := position, tokenIndex
if buffer[position] != rune('n') {
goto l241
}
position++
goto l240
l241:
position, tokenIndex = position240, tokenIndex240
if buffer[position] != rune('N') {
goto l227
}
position++
}
l240:
{
position242, tokenIndex242 := position, tokenIndex
if buffer[position] != rune('t') {
goto l243
}
position++
goto l242
l243:
position, tokenIndex = position242, tokenIndex242
if buffer[position] != rune('T') {
goto l227
}
position++
}
l242:
{
position244, tokenIndex244 := position, tokenIndex
if buffer[position] != rune('r') {
goto l245
}
position++
goto l244
l245:
position, tokenIndex = position244, tokenIndex244
if buffer[position] != rune('R') {
goto l227
}
position++
}
l244:
{
position246, tokenIndex246 := position, tokenIndex
if buffer[position] != rune('y') {
goto l247
}
position++
goto l246
l247:
position, tokenIndex = position246, tokenIndex246
if buffer[position] != rune('Y') {
goto l227
}
position++
}
l246:
goto l149
l227:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l248
}
position++
{
position249, tokenIndex249 := position, tokenIndex
if buffer[position] != rune('s') {
goto l250
}
position++
goto l249
l250:
position, tokenIndex = position249, tokenIndex249
if buffer[position] != rune('S') {
goto l248
}
position++
}
l249:
{
position251, tokenIndex251 := position, tokenIndex
if buffer[position] != rune('i') {
goto l252
}
position++
goto l251
l252:
position, tokenIndex = position251, tokenIndex251
if buffer[position] != rune('I') {
goto l248
}
position++
}
l251:
{
position253, tokenIndex253 := position, tokenIndex
if buffer[position] != rune('z') {
goto l254
}
position++
goto l253
l254:
position, tokenIndex = position253, tokenIndex253
if buffer[position] != rune('Z') {
goto l248
}
position++
}
l253:
{
position255, tokenIndex255 := position, tokenIndex
if buffer[position] != rune('e') {
goto l256
}
position++
goto l255
l256:
position, tokenIndex = position255, tokenIndex255
if buffer[position] != rune('E') {
goto l248
}
position++
}
l255:
goto l149
l248:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l257
}
position++
{
position258, tokenIndex258 := position, tokenIndex
if buffer[position] != rune('t') {
goto l259
}
position++
goto l258
l259:
position, tokenIndex = position258, tokenIndex258
if buffer[position] != rune('T') {
goto l257
}
position++
}
l258:
{
position260, tokenIndex260 := position, tokenIndex
if buffer[position] != rune('y') {
goto l261
}
position++
goto l260
l261:
position, tokenIndex = position260, tokenIndex260
if buffer[position] != rune('Y') {
goto l257
}
position++
}
l260:
{
position262, tokenIndex262 := position, tokenIndex
if buffer[position] != rune('p') {
goto l263
}
position++
goto l262
l263:
position, tokenIndex = position262, tokenIndex262
if buffer[position] != rune('P') {
goto l257
}
position++
}
l262:
{
position264, tokenIndex264 := position, tokenIndex
if buffer[position] != rune('e') {
goto l265
}
position++
goto l264
l265:
position, tokenIndex = position264, tokenIndex264
if buffer[position] != rune('E') {
goto l257
}
position++
}
l264:
goto l149
l257:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l266
}
position++
{
position267, tokenIndex267 := position, tokenIndex
if buffer[position] != rune('u') {
goto l268
}
position++
goto l267
l268:
position, tokenIndex = position267, tokenIndex267
if buffer[position] != rune('U') {
goto l266
}
position++
}
l267:
{
position269, tokenIndex269 := position, tokenIndex
if buffer[position] != rune('l') {
goto l270
}
position++
goto l269
l270:
position, tokenIndex = position269, tokenIndex269
if buffer[position] != rune('L') {
goto l266
}
position++
}
l269:
{
position271, tokenIndex271 := position, tokenIndex
if buffer[position] != rune('e') {
goto l272
}
position++
goto l271
l272:
position, tokenIndex = position271, tokenIndex271
if buffer[position] != rune('E') {
goto l266
}
position++
}
l271:
{
position273, tokenIndex273 := position, tokenIndex
if buffer[position] != rune('b') {
goto l274
}
position++
goto l273
l274:
position, tokenIndex = position273, tokenIndex273
if buffer[position] != rune('B') {
goto l266
}
position++
}
l273:
if buffer[position] != rune('1') {
goto l266
}
position++
if buffer[position] != rune('2') {
goto l266
}
position++
if buffer[position] != rune('8') {
goto l266
}
position++
goto l149
l266:
position, tokenIndex = position149, tokenIndex149
if buffer[position] != rune('.') {
goto l147
}
position++
{
position275, tokenIndex275 := position, tokenIndex
if buffer[position] != rune('s') {
goto l276
}
position++
goto l275
l276:
position, tokenIndex = position275, tokenIndex275
if buffer[position] != rune('S') {
goto l147
}
position++
}
l275:
{
position277, tokenIndex277 := position, tokenIndex
if buffer[position] != rune('l') {
goto l278
}
position++
goto l277
l278:
position, tokenIndex = position277, tokenIndex277
if buffer[position] != rune('L') {
goto l147
}
position++
}
l277:
{
position279, tokenIndex279 := position, tokenIndex
if buffer[position] != rune('e') {
goto l280
}
position++
goto l279
l280:
position, tokenIndex = position279, tokenIndex279
if buffer[position] != rune('E') {
goto l147
}
position++
}
l279:
{
position281, tokenIndex281 := position, tokenIndex
if buffer[position] != rune('b') {
goto l282
}
position++
goto l281
l282:
position, tokenIndex = position281, tokenIndex281
if buffer[position] != rune('B') {
goto l147
}
position++
}
l281:
if buffer[position] != rune('1') {
goto l147
}
position++
if buffer[position] != rune('2') {
goto l147
}
position++
if buffer[position] != rune('8') {
goto l147
}
position++
}
l149:
add(ruleLabelContainingDirectiveName, position148)
}
return true
l147:
position, tokenIndex = position147, tokenIndex147
return false
},
/* 14 SymbolArgs <- <(SymbolArg (WS? ',' WS? SymbolArg)*)> */
func() bool {
position283, tokenIndex283 := position, tokenIndex
{
position284 := position
if !_rules[ruleSymbolArg]() {
goto l283
}
l285:
{
position286, tokenIndex286 := position, tokenIndex
{
position287, tokenIndex287 := position, tokenIndex
if !_rules[ruleWS]() {
goto l287
}
goto l288
l287:
position, tokenIndex = position287, tokenIndex287
}
l288:
if buffer[position] != rune(',') {
goto l286
}
position++
{
position289, tokenIndex289 := position, tokenIndex
if !_rules[ruleWS]() {
goto l289
}
goto l290
l289:
position, tokenIndex = position289, tokenIndex289
}
l290:
if !_rules[ruleSymbolArg]() {
goto l286
}
goto l285
l286:
position, tokenIndex = position286, tokenIndex286
}
add(ruleSymbolArgs, position284)
}
return true
l283:
position, tokenIndex = position283, tokenIndex283
return false
},
/* 15 SymbolShift <- <((('<' '<') / ('>' '>')) WS? [0-9]+)> */
func() bool {
position291, tokenIndex291 := position, tokenIndex
{
position292 := position
{
position293, tokenIndex293 := position, tokenIndex
if buffer[position] != rune('<') {
goto l294
}
position++
if buffer[position] != rune('<') {
goto l294
}
position++
goto l293
l294:
position, tokenIndex = position293, tokenIndex293
if buffer[position] != rune('>') {
goto l291
}
position++
if buffer[position] != rune('>') {
goto l291
}
position++
}
l293:
{
position295, tokenIndex295 := position, tokenIndex
if !_rules[ruleWS]() {
goto l295
}
goto l296
l295:
position, tokenIndex = position295, tokenIndex295
}
l296:
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l291
}
position++
l297:
{
position298, tokenIndex298 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l298
}
position++
goto l297
l298:
position, tokenIndex = position298, tokenIndex298
}
add(ruleSymbolShift, position292)
}
return true
l291:
position, tokenIndex = position291, tokenIndex291
return false
},
/* 16 SymbolArg <- <((OpenParen WS?)? (Offset / SymbolType / ((Offset / LocalSymbol / SymbolName / Dot) (WS? Operator WS? (Offset / LocalSymbol / SymbolName))*) / (LocalSymbol TCMarker?) / (SymbolName Offset) / (SymbolName TCMarker?)) (WS? CloseParen)? (WS? SymbolShift)?)> */
func() bool {
position299, tokenIndex299 := position, tokenIndex
{
position300 := position
{
position301, tokenIndex301 := position, tokenIndex
if !_rules[ruleOpenParen]() {
goto l301
}
{
position303, tokenIndex303 := position, tokenIndex
if !_rules[ruleWS]() {
goto l303
}
goto l304
l303:
position, tokenIndex = position303, tokenIndex303
}
l304:
goto l302
l301:
position, tokenIndex = position301, tokenIndex301
}
l302:
{
position305, tokenIndex305 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l306
}
goto l305
l306:
position, tokenIndex = position305, tokenIndex305
if !_rules[ruleSymbolType]() {
goto l307
}
goto l305
l307:
position, tokenIndex = position305, tokenIndex305
{
position309, tokenIndex309 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l310
}
goto l309
l310:
position, tokenIndex = position309, tokenIndex309
if !_rules[ruleLocalSymbol]() {
goto l311
}
goto l309
l311:
position, tokenIndex = position309, tokenIndex309
if !_rules[ruleSymbolName]() {
goto l312
}
goto l309
l312:
position, tokenIndex = position309, tokenIndex309
if !_rules[ruleDot]() {
goto l308
}
}
l309:
l313:
{
position314, tokenIndex314 := position, tokenIndex
{
position315, tokenIndex315 := position, tokenIndex
if !_rules[ruleWS]() {
goto l315
}
goto l316
l315:
position, tokenIndex = position315, tokenIndex315
}
l316:
if !_rules[ruleOperator]() {
goto l314
}
{
position317, tokenIndex317 := position, tokenIndex
if !_rules[ruleWS]() {
goto l317
}
goto l318
l317:
position, tokenIndex = position317, tokenIndex317
}
l318:
{
position319, tokenIndex319 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l320
}
goto l319
l320:
position, tokenIndex = position319, tokenIndex319
if !_rules[ruleLocalSymbol]() {
goto l321
}
goto l319
l321:
position, tokenIndex = position319, tokenIndex319
if !_rules[ruleSymbolName]() {
goto l314
}
}
l319:
goto l313
l314:
position, tokenIndex = position314, tokenIndex314
}
goto l305
l308:
position, tokenIndex = position305, tokenIndex305
if !_rules[ruleLocalSymbol]() {
goto l322
}
{
position323, tokenIndex323 := position, tokenIndex
if !_rules[ruleTCMarker]() {
goto l323
}
goto l324
l323:
position, tokenIndex = position323, tokenIndex323
}
l324:
goto l305
l322:
position, tokenIndex = position305, tokenIndex305
if !_rules[ruleSymbolName]() {
goto l325
}
if !_rules[ruleOffset]() {
goto l325
}
goto l305
l325:
position, tokenIndex = position305, tokenIndex305
if !_rules[ruleSymbolName]() {
goto l299
}
{
position326, tokenIndex326 := position, tokenIndex
if !_rules[ruleTCMarker]() {
goto l326
}
goto l327
l326:
position, tokenIndex = position326, tokenIndex326
}
l327:
}
l305:
{
position328, tokenIndex328 := position, tokenIndex
{
position330, tokenIndex330 := position, tokenIndex
if !_rules[ruleWS]() {
goto l330
}
goto l331
l330:
position, tokenIndex = position330, tokenIndex330
}
l331:
if !_rules[ruleCloseParen]() {
goto l328
}
goto l329
l328:
position, tokenIndex = position328, tokenIndex328
}
l329:
{
position332, tokenIndex332 := position, tokenIndex
{
position334, tokenIndex334 := position, tokenIndex
if !_rules[ruleWS]() {
goto l334
}
goto l335
l334:
position, tokenIndex = position334, tokenIndex334
}
l335:
if !_rules[ruleSymbolShift]() {
goto l332
}
goto l333
l332:
position, tokenIndex = position332, tokenIndex332
}
l333:
add(ruleSymbolArg, position300)
}
return true
l299:
position, tokenIndex = position299, tokenIndex299
return false
},
/* 17 OpenParen <- <'('> */
func() bool {
position336, tokenIndex336 := position, tokenIndex
{
position337 := position
if buffer[position] != rune('(') {
goto l336
}
position++
add(ruleOpenParen, position337)
}
return true
l336:
position, tokenIndex = position336, tokenIndex336
return false
},
/* 18 CloseParen <- <')'> */
func() bool {
position338, tokenIndex338 := position, tokenIndex
{
position339 := position
if buffer[position] != rune(')') {
goto l338
}
position++
add(ruleCloseParen, position339)
}
return true
l338:
position, tokenIndex = position338, tokenIndex338
return false
},
/* 19 SymbolType <- <(('@' / '%') (('f' 'u' 'n' 'c' 't' 'i' 'o' 'n') / ('o' 'b' 'j' 'e' 'c' 't')))> */
func() bool {
position340, tokenIndex340 := position, tokenIndex
{
position341 := position
{
position342, tokenIndex342 := position, tokenIndex
if buffer[position] != rune('@') {
goto l343
}
position++
goto l342
l343:
position, tokenIndex = position342, tokenIndex342
if buffer[position] != rune('%') {
goto l340
}
position++
}
l342:
{
position344, tokenIndex344 := position, tokenIndex
if buffer[position] != rune('f') {
goto l345
}
position++
if buffer[position] != rune('u') {
goto l345
}
position++
if buffer[position] != rune('n') {
goto l345
}
position++
if buffer[position] != rune('c') {
goto l345
}
position++
if buffer[position] != rune('t') {
goto l345
}
position++
if buffer[position] != rune('i') {
goto l345
}
position++
if buffer[position] != rune('o') {
goto l345
}
position++
if buffer[position] != rune('n') {
goto l345
}
position++
goto l344
l345:
position, tokenIndex = position344, tokenIndex344
if buffer[position] != rune('o') {
goto l340
}
position++
if buffer[position] != rune('b') {
goto l340
}
position++
if buffer[position] != rune('j') {
goto l340
}
position++
if buffer[position] != rune('e') {
goto l340
}
position++
if buffer[position] != rune('c') {
goto l340
}
position++
if buffer[position] != rune('t') {
goto l340
}
position++
}
l344:
add(ruleSymbolType, position341)
}
return true
l340:
position, tokenIndex = position340, tokenIndex340
return false
},
/* 20 Dot <- <'.'> */
func() bool {
position346, tokenIndex346 := position, tokenIndex
{
position347 := position
if buffer[position] != rune('.') {
goto l346
}
position++
add(ruleDot, position347)
}
return true
l346:
position, tokenIndex = position346, tokenIndex346
return false
},
/* 21 TCMarker <- <('[' 'T' 'C' ']')> */
func() bool {
position348, tokenIndex348 := position, tokenIndex
{
position349 := position
if buffer[position] != rune('[') {
goto l348
}
position++
if buffer[position] != rune('T') {
goto l348
}
position++
if buffer[position] != rune('C') {
goto l348
}
position++
if buffer[position] != rune(']') {
goto l348
}
position++
add(ruleTCMarker, position349)
}
return true
l348:
position, tokenIndex = position348, tokenIndex348
return false
},
/* 22 EscapedChar <- <('\\' .)> */
func() bool {
position350, tokenIndex350 := position, tokenIndex
{
position351 := position
if buffer[position] != rune('\\') {
goto l350
}
position++
if !matchDot() {
goto l350
}
add(ruleEscapedChar, position351)
}
return true
l350:
position, tokenIndex = position350, tokenIndex350
return false
},
/* 23 WS <- <(' ' / '\t')+> */
func() bool {
position352, tokenIndex352 := position, tokenIndex
{
position353 := position
{
position356, tokenIndex356 := position, tokenIndex
if buffer[position] != rune(' ') {
goto l357
}
position++
goto l356
l357:
position, tokenIndex = position356, tokenIndex356
if buffer[position] != rune('\t') {
goto l352
}
position++
}
l356:
l354:
{
position355, tokenIndex355 := position, tokenIndex
{
position358, tokenIndex358 := position, tokenIndex
if buffer[position] != rune(' ') {
goto l359
}
position++
goto l358
l359:
position, tokenIndex = position358, tokenIndex358
if buffer[position] != rune('\t') {
goto l355
}
position++
}
l358:
goto l354
l355:
position, tokenIndex = position355, tokenIndex355
}
add(ruleWS, position353)
}
return true
l352:
position, tokenIndex = position352, tokenIndex352
return false
},
/* 24 Comment <- <((('/' '/') / '#') (!'\n' .)*)> */
func() bool {
position360, tokenIndex360 := position, tokenIndex
{
position361 := position
{
position362, tokenIndex362 := position, tokenIndex
if buffer[position] != rune('/') {
goto l363
}
position++
if buffer[position] != rune('/') {
goto l363
}
position++
goto l362
l363:
position, tokenIndex = position362, tokenIndex362
if buffer[position] != rune('#') {
goto l360
}
position++
}
l362:
l364:
{
position365, tokenIndex365 := position, tokenIndex
{
position366, tokenIndex366 := position, tokenIndex
if buffer[position] != rune('\n') {
goto l366
}
position++
goto l365
l366:
position, tokenIndex = position366, tokenIndex366
}
if !matchDot() {
goto l365
}
goto l364
l365:
position, tokenIndex = position365, tokenIndex365
}
add(ruleComment, position361)
}
return true
l360:
position, tokenIndex = position360, tokenIndex360
return false
},
/* 25 Label <- <((LocalSymbol / LocalLabel / SymbolName) ':')> */
func() bool {
position367, tokenIndex367 := position, tokenIndex
{
position368 := position
{
position369, tokenIndex369 := position, tokenIndex
if !_rules[ruleLocalSymbol]() {
goto l370
}
goto l369
l370:
position, tokenIndex = position369, tokenIndex369
if !_rules[ruleLocalLabel]() {
goto l371
}
goto l369
l371:
position, tokenIndex = position369, tokenIndex369
if !_rules[ruleSymbolName]() {
goto l367
}
}
l369:
if buffer[position] != rune(':') {
goto l367
}
position++
add(ruleLabel, position368)
}
return true
l367:
position, tokenIndex = position367, tokenIndex367
return false
},
/* 26 SymbolName <- <(([a-z] / [A-Z] / '.' / '_') ([a-z] / [A-Z] / '.' / ([0-9] / [0-9]) / '$' / '_')*)> */
func() bool {
position372, tokenIndex372 := position, tokenIndex
{
position373 := position
{
position374, tokenIndex374 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l375
}
position++
goto l374
l375:
position, tokenIndex = position374, tokenIndex374
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l376
}
position++
goto l374
l376:
position, tokenIndex = position374, tokenIndex374
if buffer[position] != rune('.') {
goto l377
}
position++
goto l374
l377:
position, tokenIndex = position374, tokenIndex374
if buffer[position] != rune('_') {
goto l372
}
position++
}
l374:
l378:
{
position379, tokenIndex379 := position, tokenIndex
{
position380, tokenIndex380 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l381
}
position++
goto l380
l381:
position, tokenIndex = position380, tokenIndex380
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l382
}
position++
goto l380
l382:
position, tokenIndex = position380, tokenIndex380
if buffer[position] != rune('.') {
goto l383
}
position++
goto l380
l383:
position, tokenIndex = position380, tokenIndex380
{
position385, tokenIndex385 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l386
}
position++
goto l385
l386:
position, tokenIndex = position385, tokenIndex385
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l384
}
position++
}
l385:
goto l380
l384:
position, tokenIndex = position380, tokenIndex380
if buffer[position] != rune('$') {
goto l387
}
position++
goto l380
l387:
position, tokenIndex = position380, tokenIndex380
if buffer[position] != rune('_') {
goto l379
}
position++
}
l380:
goto l378
l379:
position, tokenIndex = position379, tokenIndex379
}
add(ruleSymbolName, position373)
}
return true
l372:
position, tokenIndex = position372, tokenIndex372
return false
},
/* 27 LocalSymbol <- <('.' 'L' ([a-z] / [A-Z] / ([a-z] / [A-Z]) / '.' / ([0-9] / [0-9]) / '$' / '_')+)> */
func() bool {
position388, tokenIndex388 := position, tokenIndex
{
position389 := position
if buffer[position] != rune('.') {
goto l388
}
position++
if buffer[position] != rune('L') {
goto l388
}
position++
{
position392, tokenIndex392 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l393
}
position++
goto l392
l393:
position, tokenIndex = position392, tokenIndex392
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l394
}
position++
goto l392
l394:
position, tokenIndex = position392, tokenIndex392
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
{
position396, tokenIndex396 := position, tokenIndex
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l397
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
goto l396
l397:
position, tokenIndex = position396, tokenIndex396
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l395
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
}
l396:
goto l392
l395:
position, tokenIndex = position392, tokenIndex392
if buffer[position] != rune('.') {
goto l398
}
position++
goto l392
l398:
position, tokenIndex = position392, tokenIndex392
{
position400, tokenIndex400 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l401
}
position++
goto l400
l401:
position, tokenIndex = position400, tokenIndex400
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l399
}
position++
}
l400:
goto l392
l399:
position, tokenIndex = position392, tokenIndex392
if buffer[position] != rune('$') {
goto l402
}
position++
goto l392
l402:
position, tokenIndex = position392, tokenIndex392
if buffer[position] != rune('_') {
goto l388
}
position++
}
l392:
l390:
{
position391, tokenIndex391 := position, tokenIndex
{
position403, tokenIndex403 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l404
}
position++
goto l403
l404:
position, tokenIndex = position403, tokenIndex403
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l405
}
position++
goto l403
l405:
position, tokenIndex = position403, tokenIndex403
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
{
position407, tokenIndex407 := position, tokenIndex
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l408
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
goto l407
l408:
position, tokenIndex = position407, tokenIndex407
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l406
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
}
l407:
goto l403
l406:
position, tokenIndex = position403, tokenIndex403
if buffer[position] != rune('.') {
goto l409
}
position++
goto l403
l409:
position, tokenIndex = position403, tokenIndex403
{
position411, tokenIndex411 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l412
}
position++
goto l411
l412:
position, tokenIndex = position411, tokenIndex411
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l410
}
position++
}
l411:
goto l403
l410:
position, tokenIndex = position403, tokenIndex403
if buffer[position] != rune('$') {
goto l413
}
position++
goto l403
l413:
position, tokenIndex = position403, tokenIndex403
if buffer[position] != rune('_') {
goto l391
}
position++
}
l403:
goto l390
l391:
position, tokenIndex = position391, tokenIndex391
}
add(ruleLocalSymbol, position389)
}
return true
l388:
position, tokenIndex = position388, tokenIndex388
return false
},
/* 28 LocalLabel <- <([0-9] ([0-9] / '$')*)> */
func() bool {
position414, tokenIndex414 := position, tokenIndex
{
position415 := position
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l414
}
position++
l416:
{
position417, tokenIndex417 := position, tokenIndex
{
position418, tokenIndex418 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l419
}
position++
goto l418
l419:
position, tokenIndex = position418, tokenIndex418
if buffer[position] != rune('$') {
goto l417
}
position++
}
l418:
goto l416
l417:
position, tokenIndex = position417, tokenIndex417
}
add(ruleLocalLabel, position415)
}
return true
l414:
position, tokenIndex = position414, tokenIndex414
return false
},
/* 29 LocalLabelRef <- <([0-9] ([0-9] / '$')* ('b' / 'f'))> */
func() bool {
position420, tokenIndex420 := position, tokenIndex
{
position421 := position
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l420
}
position++
l422:
{
position423, tokenIndex423 := position, tokenIndex
{
position424, tokenIndex424 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l425
}
position++
goto l424
l425:
position, tokenIndex = position424, tokenIndex424
if buffer[position] != rune('$') {
goto l423
}
position++
}
l424:
goto l422
l423:
position, tokenIndex = position423, tokenIndex423
}
{
position426, tokenIndex426 := position, tokenIndex
if buffer[position] != rune('b') {
goto l427
}
position++
goto l426
l427:
position, tokenIndex = position426, tokenIndex426
if buffer[position] != rune('f') {
goto l420
}
position++
}
l426:
add(ruleLocalLabelRef, position421)
}
return true
l420:
position, tokenIndex = position420, tokenIndex420
return false
},
/* 30 Instruction <- <(InstructionName (WS InstructionArg (WS? ',' WS? InstructionArg)*)?)> */
func() bool {
position428, tokenIndex428 := position, tokenIndex
{
position429 := position
if !_rules[ruleInstructionName]() {
goto l428
}
{
position430, tokenIndex430 := position, tokenIndex
if !_rules[ruleWS]() {
goto l430
}
if !_rules[ruleInstructionArg]() {
goto l430
}
l432:
{
position433, tokenIndex433 := position, tokenIndex
{
position434, tokenIndex434 := position, tokenIndex
if !_rules[ruleWS]() {
goto l434
}
goto l435
l434:
position, tokenIndex = position434, tokenIndex434
}
l435:
if buffer[position] != rune(',') {
goto l433
}
position++
{
position436, tokenIndex436 := position, tokenIndex
if !_rules[ruleWS]() {
goto l436
}
goto l437
l436:
position, tokenIndex = position436, tokenIndex436
}
l437:
if !_rules[ruleInstructionArg]() {
goto l433
}
goto l432
l433:
position, tokenIndex = position433, tokenIndex433
}
goto l431
l430:
position, tokenIndex = position430, tokenIndex430
}
l431:
add(ruleInstruction, position429)
}
return true
l428:
position, tokenIndex = position428, tokenIndex428
return false
},
/* 31 InstructionName <- <(([a-z] / [A-Z]) ([a-z] / [A-Z] / '.' / ([0-9] / [0-9]))* ('.' / '+' / '-')?)> */
func() bool {
position438, tokenIndex438 := position, tokenIndex
{
position439 := position
{
position440, tokenIndex440 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l441
}
position++
goto l440
l441:
position, tokenIndex = position440, tokenIndex440
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l438
}
position++
}
l440:
l442:
{
position443, tokenIndex443 := position, tokenIndex
{
position444, tokenIndex444 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l445
}
position++
goto l444
l445:
position, tokenIndex = position444, tokenIndex444
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l446
}
position++
goto l444
l446:
position, tokenIndex = position444, tokenIndex444
if buffer[position] != rune('.') {
goto l447
}
position++
goto l444
l447:
position, tokenIndex = position444, tokenIndex444
{
position448, tokenIndex448 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l449
}
position++
goto l448
l449:
position, tokenIndex = position448, tokenIndex448
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l443
}
position++
}
l448:
}
l444:
goto l442
l443:
position, tokenIndex = position443, tokenIndex443
}
{
position450, tokenIndex450 := position, tokenIndex
{
position452, tokenIndex452 := position, tokenIndex
if buffer[position] != rune('.') {
goto l453
}
position++
goto l452
l453:
position, tokenIndex = position452, tokenIndex452
if buffer[position] != rune('+') {
goto l454
}
position++
goto l452
l454:
position, tokenIndex = position452, tokenIndex452
if buffer[position] != rune('-') {
goto l450
}
position++
}
l452:
goto l451
l450:
position, tokenIndex = position450, tokenIndex450
}
l451:
add(ruleInstructionName, position439)
}
return true
l438:
position, tokenIndex = position438, tokenIndex438
return false
},
/* 32 InstructionArg <- <(IndirectionIndicator? (ARMConstantTweak / RegisterOrConstant / LocalLabelRef / TOCRefHigh / TOCRefLow / GOTLocation / GOTSymbolOffset / MemoryRef) AVX512Token*)> */
func() bool {
position455, tokenIndex455 := position, tokenIndex
{
position456 := position
{
position457, tokenIndex457 := position, tokenIndex
if !_rules[ruleIndirectionIndicator]() {
goto l457
}
goto l458
l457:
position, tokenIndex = position457, tokenIndex457
}
l458:
{
position459, tokenIndex459 := position, tokenIndex
if !_rules[ruleARMConstantTweak]() {
goto l460
}
goto l459
l460:
position, tokenIndex = position459, tokenIndex459
if !_rules[ruleRegisterOrConstant]() {
goto l461
}
goto l459
l461:
position, tokenIndex = position459, tokenIndex459
if !_rules[ruleLocalLabelRef]() {
goto l462
}
goto l459
l462:
position, tokenIndex = position459, tokenIndex459
if !_rules[ruleTOCRefHigh]() {
goto l463
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
goto l459
l463:
position, tokenIndex = position459, tokenIndex459
if !_rules[ruleTOCRefLow]() {
goto l464
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
goto l459
l464:
position, tokenIndex = position459, tokenIndex459
if !_rules[ruleGOTLocation]() {
goto l465
}
goto l459
l465:
position, tokenIndex = position459, tokenIndex459
if !_rules[ruleGOTSymbolOffset]() {
goto l466
}
goto l459
l466:
position, tokenIndex = position459, tokenIndex459
if !_rules[ruleMemoryRef]() {
goto l455
}
}
l459:
l467:
{
position468, tokenIndex468 := position, tokenIndex
if !_rules[ruleAVX512Token]() {
goto l468
}
goto l467
l468:
position, tokenIndex = position468, tokenIndex468
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
add(ruleInstructionArg, position456)
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
return true
l455:
position, tokenIndex = position455, tokenIndex455
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
return false
},
/* 33 GOTLocation <- <('$' '_' 'G' 'L' 'O' 'B' 'A' 'L' '_' 'O' 'F' 'F' 'S' 'E' 'T' '_' 'T' 'A' 'B' 'L' 'E' '_' '-' LocalSymbol)> */
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
func() bool {
position469, tokenIndex469 := position, tokenIndex
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
{
position470 := position
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
if buffer[position] != rune('$') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('_') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('G') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('L') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('O') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('B') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('A') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('L') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('_') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('O') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('F') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('F') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('S') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('E') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('T') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('_') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('T') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('A') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('B') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('L') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('E') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('_') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('-') {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if !_rules[ruleLocalSymbol]() {
goto l469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
add(ruleGOTLocation, position470)
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
return true
l469:
position, tokenIndex = position469, tokenIndex469
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
return false
},
/* 34 GOTSymbolOffset <- <(('$' SymbolName ('@' 'G' 'O' 'T') ('O' 'F' 'F')?) / (':' ('g' / 'G') ('o' / 'O') ('t' / 'T') ':' SymbolName))> */
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
func() bool {
position471, tokenIndex471 := position, tokenIndex
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
{
position472 := position
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
{
position473, tokenIndex473 := position, tokenIndex
if buffer[position] != rune('$') {
goto l474
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if !_rules[ruleSymbolName]() {
goto l474
}
if buffer[position] != rune('@') {
goto l474
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('G') {
goto l474
delocation: large memory model support. Large memory models on x86-64 allow the code/data of a shared object / executable to be larger than 2GiB. This is typically impossible because x86-64 code frequently uses int32 offsets from RIP. Consider the following program: int getpid(); int main() { return getpid(); } This is turned into the following assembly under a large memory model: .L0$pb: leaq .L0$pb(%rip), %rax movabsq $_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx addq %rax, %rcx movabsq $getpid@GOT, %rdx xorl %eax, %eax jmpq *(%rcx,%rdx) # TAILCALL And, with relocations: 0: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 0 <main> 7: 48 b9 00 00 00 00 00 movabs $0x0,%rcx e: 00 00 00 9: R_X86_64_GOTPC64 _GLOBAL_OFFSET_TABLE_+0x9 11: 48 01 c1 add %rax,%rcx 14: 48 ba 00 00 00 00 00 movabs $0x0,%rdx 1b: 00 00 00 16: R_X86_64_GOT64 getpid 1e: 31 c0 xor %eax,%eax 20: ff 24 11 jmpq *(%rcx,%rdx,1) We can see that, in the large memory model, function calls involve loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which takes a 64-bit immediate) and then indexing into it. Both cause relocations. If we link the binary and disassemble we get: 0000000000001120 <main>: 1120: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 1120 <main> 1127: 48 b9 e0 2e 00 00 00 movabs $0x2ee0,%rcx 112e: 00 00 00 1131: 48 01 c1 add %rax,%rcx 1134: 48 ba d8 ff ff ff ff movabs $0xffffffffffffffd8,%rdx 113b: ff ff ff 113e: 31 c0 xor %eax,%eax 1140: ff 24 11 jmpq *(%rcx,%rdx,1) Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000. That's the address of the .got.plt section. But the offset “into” the table is -0x40, putting it at 0x3fd8, in .got: Idx Name Size VMA LMA File off Algn 18 .got 00000030 0000000000003fd0 0000000000003fd0 00002fd0 2**3 19 .got.plt 00000018 0000000000004000 0000000000004000 00003000 2**3 And, indeed, there's a dynamic relocation to setup that address: OFFSET TYPE VALUE 0000000000003fd8 R_X86_64_GLOB_DAT getpid@GLIBC_2.2.5 Accessing data or BSS works the same: the address of the variable is stored relative to _GLOBAL_OFFSET_TABLE_. This is a bit of a pain because we want to delocate the module into a single .text segment so that it moves through linking unaltered. If we took the obvious path and built our own offset table then it would need to contain absolute addresses, but they are only available at runtime and .text segments aren't supposed to be run-time patched. (That's why .rela.dyn is a separate segment.) If we use a different segment then we have the same problem as with the original offset table: the offset to the segment is unknown when compiling the module. Trying to pattern match this two-step lookup to do extensive rewriting seems fragile: I'm sure the compilers will move things around and interleave other work in time, if they don't already. So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we define a symbol in the same segment, but outside of the hashed region of the module, that contains the offset from that position to _GLOBAL_OFFSET_TABLE_: .boringssl_got_delta: .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into: movq .boringssl_got_delta(%rip), %destreg addq $.boringssl_got_delta-.Lfoo, %destreg This works because it's calculating _GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo) When that value is added to .Lfoo, as the original code will do, the correct address results. Also it doesn't need an extra register because we know that 32-bit offsets are sufficient for offsets within the module. As for the offsets within the offset table, we have to load them from locations outside of the hashed part of the module to get the relocations out of the way. Again, no extra registers are needed. Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
5 years ago
}
position++
if buffer[position] != rune('O') {
goto l474
}
position++
if buffer[position] != rune('T') {
goto l474
}
position++
{
position475, tokenIndex475 := position, tokenIndex
if buffer[position] != rune('O') {
goto l475
}
position++
if buffer[position] != rune('F') {
goto l475
}
position++
if buffer[position] != rune('F') {
goto l475
}
position++
goto l476
l475:
position, tokenIndex = position475, tokenIndex475
}
l476:
goto l473
l474:
position, tokenIndex = position473, tokenIndex473
if buffer[position] != rune(':') {
goto l471
}
position++
{
position477, tokenIndex477 := position, tokenIndex
if buffer[position] != rune('g') {
goto l478
}
position++
goto l477
l478:
position, tokenIndex = position477, tokenIndex477
if buffer[position] != rune('G') {
goto l471
}
position++
}
l477:
{
position479, tokenIndex479 := position, tokenIndex
if buffer[position] != rune('o') {
goto l480
}
position++
goto l479
l480:
position, tokenIndex = position479, tokenIndex479
if buffer[position] != rune('O') {
goto l471
}
position++
}
l479:
{
position481, tokenIndex481 := position, tokenIndex
if buffer[position] != rune('t') {
goto l482
}
position++
goto l481
l482:
position, tokenIndex = position481, tokenIndex481
if buffer[position] != rune('T') {
goto l471
}
position++
}
l481:
if buffer[position] != rune(':') {
goto l471
}
position++
if !_rules[ruleSymbolName]() {
goto l471
}
}
l473:
add(ruleGOTSymbolOffset, position472)
}
return true
l471:
position, tokenIndex = position471, tokenIndex471
return false
},
/* 35 AVX512Token <- <(WS? '{' '%'? ([0-9] / [a-z])* '}')> */
func() bool {
position483, tokenIndex483 := position, tokenIndex
{
position484 := position
{
position485, tokenIndex485 := position, tokenIndex
if !_rules[ruleWS]() {
goto l485
}
goto l486
l485:
position, tokenIndex = position485, tokenIndex485
}
l486:
if buffer[position] != rune('{') {
goto l483
}
position++
{
position487, tokenIndex487 := position, tokenIndex
if buffer[position] != rune('%') {
goto l487
}
position++
goto l488
l487:
position, tokenIndex = position487, tokenIndex487
}
l488:
l489:
{
position490, tokenIndex490 := position, tokenIndex
{
position491, tokenIndex491 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l492
}
position++
goto l491
l492:
position, tokenIndex = position491, tokenIndex491
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l490
}
position++
}
l491:
goto l489
l490:
position, tokenIndex = position490, tokenIndex490
}
if buffer[position] != rune('}') {
goto l483
}
position++
add(ruleAVX512Token, position484)
}
return true
l483:
position, tokenIndex = position483, tokenIndex483
return false
},
/* 36 TOCRefHigh <- <('.' 'T' 'O' 'C' '.' '-' (('0' 'b') / ('.' 'L' ([a-z] / [A-Z] / '_' / [0-9])+)) ('@' ('h' / 'H') ('a' / 'A')))> */
func() bool {
position493, tokenIndex493 := position, tokenIndex
{
position494 := position
if buffer[position] != rune('.') {
goto l493
}
position++
if buffer[position] != rune('T') {
goto l493
}
position++
if buffer[position] != rune('O') {
goto l493
}
position++
if buffer[position] != rune('C') {
goto l493
}
position++
if buffer[position] != rune('.') {
goto l493
}
position++
if buffer[position] != rune('-') {
goto l493
}
position++
{
position495, tokenIndex495 := position, tokenIndex
if buffer[position] != rune('0') {
goto l496
}
position++
if buffer[position] != rune('b') {
goto l496
}
position++
goto l495
l496:
position, tokenIndex = position495, tokenIndex495
if buffer[position] != rune('.') {
goto l493
}
position++
if buffer[position] != rune('L') {
goto l493
}
position++
{
position499, tokenIndex499 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l500
}
position++
goto l499
l500:
position, tokenIndex = position499, tokenIndex499
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l501
}
position++
goto l499
l501:
position, tokenIndex = position499, tokenIndex499
if buffer[position] != rune('_') {
goto l502
}
position++
goto l499
l502:
position, tokenIndex = position499, tokenIndex499
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l493
}
position++
}
l499:
l497:
{
position498, tokenIndex498 := position, tokenIndex
{
position503, tokenIndex503 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l504
}
position++
goto l503
l504:
position, tokenIndex = position503, tokenIndex503
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l505
}
position++
goto l503
l505:
position, tokenIndex = position503, tokenIndex503
if buffer[position] != rune('_') {
goto l506
}
position++
goto l503
l506:
position, tokenIndex = position503, tokenIndex503
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l498
}
position++
}
l503:
goto l497
l498:
position, tokenIndex = position498, tokenIndex498
}
}
l495:
if buffer[position] != rune('@') {
goto l493
}
position++
{
position507, tokenIndex507 := position, tokenIndex
if buffer[position] != rune('h') {
goto l508
}
position++
goto l507
l508:
position, tokenIndex = position507, tokenIndex507
if buffer[position] != rune('H') {
goto l493
}
position++
}
l507:
{
position509, tokenIndex509 := position, tokenIndex
if buffer[position] != rune('a') {
goto l510
}
position++
goto l509
l510:
position, tokenIndex = position509, tokenIndex509
if buffer[position] != rune('A') {
goto l493
}
position++
}
l509:
add(ruleTOCRefHigh, position494)
}
return true
l493:
position, tokenIndex = position493, tokenIndex493
return false
},
/* 37 TOCRefLow <- <('.' 'T' 'O' 'C' '.' '-' (('0' 'b') / ('.' 'L' ([a-z] / [A-Z] / '_' / [0-9])+)) ('@' ('l' / 'L')))> */
func() bool {
position511, tokenIndex511 := position, tokenIndex
{
position512 := position
if buffer[position] != rune('.') {
goto l511
}
position++
if buffer[position] != rune('T') {
goto l511
}
position++
if buffer[position] != rune('O') {
goto l511
}
position++
if buffer[position] != rune('C') {
goto l511
}
position++
if buffer[position] != rune('.') {
goto l511
}
position++
if buffer[position] != rune('-') {
goto l511
}
position++
{
position513, tokenIndex513 := position, tokenIndex
if buffer[position] != rune('0') {
goto l514
}
position++
if buffer[position] != rune('b') {
goto l514
}
position++
goto l513
l514:
position, tokenIndex = position513, tokenIndex513
if buffer[position] != rune('.') {
goto l511
}
position++
if buffer[position] != rune('L') {
goto l511
}
position++
{
position517, tokenIndex517 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l518
}
position++
goto l517
l518:
position, tokenIndex = position517, tokenIndex517
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l519
}
position++
goto l517
l519:
position, tokenIndex = position517, tokenIndex517
if buffer[position] != rune('_') {
goto l520
}
position++
goto l517
l520:
position, tokenIndex = position517, tokenIndex517
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l511
}
position++
}
l517:
l515:
{
position516, tokenIndex516 := position, tokenIndex
{
position521, tokenIndex521 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l522
}
position++
goto l521
l522:
position, tokenIndex = position521, tokenIndex521
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l523
}
position++
goto l521
l523:
position, tokenIndex = position521, tokenIndex521
if buffer[position] != rune('_') {
goto l524
}
position++
goto l521
l524:
position, tokenIndex = position521, tokenIndex521
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l516
}
position++
}
l521:
goto l515
l516:
position, tokenIndex = position516, tokenIndex516
}
}
l513:
if buffer[position] != rune('@') {
goto l511
}
position++
{
position525, tokenIndex525 := position, tokenIndex
if buffer[position] != rune('l') {
goto l526
}
position++
goto l525
l526:
position, tokenIndex = position525, tokenIndex525
if buffer[position] != rune('L') {
goto l511
}
position++
}
l525:
add(ruleTOCRefLow, position512)
}
return true
l511:
position, tokenIndex = position511, tokenIndex511
return false
},
/* 38 IndirectionIndicator <- <'*'> */
func() bool {
position527, tokenIndex527 := position, tokenIndex
{
position528 := position
if buffer[position] != rune('*') {
goto l527
}
position++
add(ruleIndirectionIndicator, position528)
}
return true
l527:
position, tokenIndex = position527, tokenIndex527
return false
},
/* 39 RegisterOrConstant <- <((('%' ([a-z] / [A-Z]) ([a-z] / [A-Z] / ([0-9] / [0-9]))*) / ('$'? ((Offset Offset) / Offset)) / ('#' Offset ('*' [0-9]+ ('-' [0-9] [0-9]*)?)?) / ('#' '~'? '(' [0-9] WS? ('<' '<') WS? [0-9] ')') / ARMRegister) !('f' / 'b' / ':' / '(' / '+' / '-'))> */
func() bool {
position529, tokenIndex529 := position, tokenIndex
{
position530 := position
{
position531, tokenIndex531 := position, tokenIndex
if buffer[position] != rune('%') {
goto l532
}
position++
{
position533, tokenIndex533 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l534
}
position++
goto l533
l534:
position, tokenIndex = position533, tokenIndex533
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l532
}
position++
}
l533:
l535:
{
position536, tokenIndex536 := position, tokenIndex
{
position537, tokenIndex537 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
goto l538
}
position++
goto l537
l538:
position, tokenIndex = position537, tokenIndex537
if c := buffer[position]; c < rune('A') || c > rune('Z') {
goto l539
}
position++
goto l537
l539:
position, tokenIndex = position537, tokenIndex537
{
position540, tokenIndex540 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l541
}
position++
goto l540
l541:
position, tokenIndex = position540, tokenIndex540
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l536
}
position++
}
l540:
}
l537:
goto l535
l536:
position, tokenIndex = position536, tokenIndex536
}
goto l531
l532:
position, tokenIndex = position531, tokenIndex531
{
position543, tokenIndex543 := position, tokenIndex
if buffer[position] != rune('$') {
goto l543
}
position++
goto l544
l543:
position, tokenIndex = position543, tokenIndex543
}
l544:
{
position545, tokenIndex545 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l546
}
if !_rules[ruleOffset]() {
goto l546
}
goto l545
l546:
position, tokenIndex = position545, tokenIndex545
if !_rules[ruleOffset]() {
goto l542
}
}
l545:
goto l531
l542:
position, tokenIndex = position531, tokenIndex531
if buffer[position] != rune('#') {
goto l547
}
position++
if !_rules[ruleOffset]() {
goto l547
}
{
position548, tokenIndex548 := position, tokenIndex
if buffer[position] != rune('*') {
goto l548
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l548
}
position++
l550:
{
position551, tokenIndex551 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l551
}
position++
goto l550
l551:
position, tokenIndex = position551, tokenIndex551
}
{
position552, tokenIndex552 := position, tokenIndex
if buffer[position] != rune('-') {
goto l552
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l552
}
position++
l554:
{
position555, tokenIndex555 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l555
}
position++
goto l554
l555:
position, tokenIndex = position555, tokenIndex555
}
goto l553
l552:
position, tokenIndex = position552, tokenIndex552
}
l553:
goto l549
l548:
position, tokenIndex = position548, tokenIndex548
}
l549:
goto l531
l547:
position, tokenIndex = position531, tokenIndex531
if buffer[position] != rune('#') {
goto l556
}
position++
{
position557, tokenIndex557 := position, tokenIndex
if buffer[position] != rune('~') {
goto l557
}
position++
goto l558
l557:
position, tokenIndex = position557, tokenIndex557
}
l558:
if buffer[position] != rune('(') {
goto l556
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l556
}
position++
{
position559, tokenIndex559 := position, tokenIndex
if !_rules[ruleWS]() {
goto l559
}
goto l560
l559:
position, tokenIndex = position559, tokenIndex559
}
l560:
if buffer[position] != rune('<') {
goto l556
}
position++
if buffer[position] != rune('<') {
goto l556
}
position++
{
position561, tokenIndex561 := position, tokenIndex
if !_rules[ruleWS]() {
goto l561
}
goto l562
l561:
position, tokenIndex = position561, tokenIndex561
}
l562:
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l556
}
position++
if buffer[position] != rune(')') {
goto l556
}
position++
goto l531
l556:
position, tokenIndex = position531, tokenIndex531
if !_rules[ruleARMRegister]() {
goto l529
}
}
l531:
{
position563, tokenIndex563 := position, tokenIndex
{
position564, tokenIndex564 := position, tokenIndex
if buffer[position] != rune('f') {
goto l565
}
position++
goto l564
l565:
position, tokenIndex = position564, tokenIndex564
if buffer[position] != rune('b') {
goto l566
}
position++
goto l564
l566:
position, tokenIndex = position564, tokenIndex564
if buffer[position] != rune(':') {
goto l567
}
position++
goto l564
l567:
position, tokenIndex = position564, tokenIndex564
if buffer[position] != rune('(') {
goto l568
}
position++
goto l564
l568:
position, tokenIndex = position564, tokenIndex564
if buffer[position] != rune('+') {
goto l569
}
position++
goto l564
l569:
position, tokenIndex = position564, tokenIndex564
if buffer[position] != rune('-') {
goto l563
}
position++
}
l564:
goto l529
l563:
position, tokenIndex = position563, tokenIndex563
}
add(ruleRegisterOrConstant, position530)
}
return true
l529:
position, tokenIndex = position529, tokenIndex529
return false
},
/* 40 ARMConstantTweak <- <(((('l' / 'L') ('s' / 'S') ('l' / 'L')) / (('s' / 'S') ('x' / 'X') ('t' / 'T') ('w' / 'W')) / (('s' / 'S') ('x' / 'X') ('t' / 'T') ('b' / 'B')) / (('u' / 'U') ('x' / 'X') ('t' / 'T') ('w' / 'W')) / (('u' / 'U') ('x' / 'X') ('t' / 'T') ('b' / 'B')) / (('l' / 'L') ('s' / 'S') ('r' / 'R')) / (('r' / 'R') ('o' / 'O') ('r' / 'R')) / (('a' / 'A') ('s' / 'S') ('r' / 'R'))) (WS '#' Offset)?)> */
func() bool {
position570, tokenIndex570 := position, tokenIndex
{
position571 := position
{
position572, tokenIndex572 := position, tokenIndex
{
position574, tokenIndex574 := position, tokenIndex
if buffer[position] != rune('l') {
goto l575
}
position++
goto l574
l575:
position, tokenIndex = position574, tokenIndex574
if buffer[position] != rune('L') {
goto l573
}
position++
}
l574:
{
position576, tokenIndex576 := position, tokenIndex
if buffer[position] != rune('s') {
goto l577
}
position++
goto l576
l577:
position, tokenIndex = position576, tokenIndex576
if buffer[position] != rune('S') {
goto l573
}
position++
}
l576:
{
position578, tokenIndex578 := position, tokenIndex
if buffer[position] != rune('l') {
goto l579
}
position++
goto l578
l579:
position, tokenIndex = position578, tokenIndex578
if buffer[position] != rune('L') {
goto l573
}
position++
}
l578:
goto l572
l573:
position, tokenIndex = position572, tokenIndex572
{
position581, tokenIndex581 := position, tokenIndex
if buffer[position] != rune('s') {
goto l582
}
position++
goto l581
l582:
position, tokenIndex = position581, tokenIndex581
if buffer[position] != rune('S') {
goto l580
}
position++
}
l581:
{
position583, tokenIndex583 := position, tokenIndex
if buffer[position] != rune('x') {
goto l584
}
position++
goto l583
l584:
position, tokenIndex = position583, tokenIndex583
if buffer[position] != rune('X') {
goto l580
}
position++
}
l583:
{
position585, tokenIndex585 := position, tokenIndex
if buffer[position] != rune('t') {
goto l586
}
position++
goto l585
l586:
position, tokenIndex = position585, tokenIndex585
if buffer[position] != rune('T') {
goto l580
}
position++
}
l585:
{
position587, tokenIndex587 := position, tokenIndex
if buffer[position] != rune('w') {
goto l588
}
position++
goto l587
l588:
position, tokenIndex = position587, tokenIndex587
if buffer[position] != rune('W') {
goto l580
}
position++
}
l587:
goto l572
l580:
position, tokenIndex = position572, tokenIndex572
{
position590, tokenIndex590 := position, tokenIndex
if buffer[position] != rune('s') {
goto l591
}
position++
goto l590
l591:
position, tokenIndex = position590, tokenIndex590
if buffer[position] != rune('S') {
goto l589
}
position++
}
l590:
{
position592, tokenIndex592 := position, tokenIndex
if buffer[position] != rune('x') {
goto l593
}
position++
goto l592
l593:
position, tokenIndex = position592, tokenIndex592
if buffer[position] != rune('X') {
goto l589
}
position++
}
l592:
{
position594, tokenIndex594 := position, tokenIndex
if buffer[position] != rune('t') {
goto l595
}
position++
goto l594
l595:
position, tokenIndex = position594, tokenIndex594
if buffer[position] != rune('T') {
goto l589
}
position++
}
l594:
{
position596, tokenIndex596 := position, tokenIndex
if buffer[position] != rune('b') {
goto l597
}
position++
goto l596
l597:
position, tokenIndex = position596, tokenIndex596
if buffer[position] != rune('B') {
goto l589
}
position++
}
l596:
goto l572
l589:
position, tokenIndex = position572, tokenIndex572
{
position599, tokenIndex599 := position, tokenIndex
if buffer[position] != rune('u') {
goto l600
}
position++
goto l599
l600:
position, tokenIndex = position599, tokenIndex599
if buffer[position] != rune('U') {
goto l598
}
position++
}
l599:
{
position601, tokenIndex601 := position, tokenIndex
if buffer[position] != rune('x') {
goto l602
}
position++
goto l601
l602:
position, tokenIndex = position601, tokenIndex601
if buffer[position] != rune('X') {
goto l598
}
position++
}
l601:
{
position603, tokenIndex603 := position, tokenIndex
if buffer[position] != rune('t') {
goto l604
}
position++
goto l603
l604:
position, tokenIndex = position603, tokenIndex603
if buffer[position] != rune('T') {
goto l598
}
position++
}
l603:
{
position605, tokenIndex605 := position, tokenIndex
if buffer[position] != rune('w') {
goto l606
}
position++
goto l605
l606:
position, tokenIndex = position605, tokenIndex605
if buffer[position] != rune('W') {
goto l598
}
position++
}
l605:
goto l572
l598:
position, tokenIndex = position572, tokenIndex572
{
position608, tokenIndex608 := position, tokenIndex
if buffer[position] != rune('u') {
goto l609
}
position++
goto l608
l609:
position, tokenIndex = position608, tokenIndex608
if buffer[position] != rune('U') {
goto l607
}
position++
}
l608:
{
position610, tokenIndex610 := position, tokenIndex
if buffer[position] != rune('x') {
goto l611
}
position++
goto l610
l611:
position, tokenIndex = position610, tokenIndex610
if buffer[position] != rune('X') {
goto l607
}
position++
}
l610:
{
position612, tokenIndex612 := position, tokenIndex
if buffer[position] != rune('t') {
goto l613
}
position++
goto l612
l613:
position, tokenIndex = position612, tokenIndex612
if buffer[position] != rune('T') {
goto l607
}
position++
}
l612:
{
position614, tokenIndex614 := position, tokenIndex
if buffer[position] != rune('b') {
goto l615
}
position++
goto l614
l615:
position, tokenIndex = position614, tokenIndex614
if buffer[position] != rune('B') {
goto l607
}
position++
}
l614:
goto l572
l607:
position, tokenIndex = position572, tokenIndex572
{
position617, tokenIndex617 := position, tokenIndex
if buffer[position] != rune('l') {
goto l618
}
position++
goto l617
l618:
position, tokenIndex = position617, tokenIndex617
if buffer[position] != rune('L') {
goto l616
}
position++
}
l617:
{
position619, tokenIndex619 := position, tokenIndex
if buffer[position] != rune('s') {
goto l620
}
position++
goto l619
l620:
position, tokenIndex = position619, tokenIndex619
if buffer[position] != rune('S') {
goto l616
}
position++
}
l619:
{
position621, tokenIndex621 := position, tokenIndex
if buffer[position] != rune('r') {
goto l622
}
position++
goto l621
l622:
position, tokenIndex = position621, tokenIndex621
if buffer[position] != rune('R') {
goto l616
}
position++
}
l621:
goto l572
l616:
position, tokenIndex = position572, tokenIndex572
{
position624, tokenIndex624 := position, tokenIndex
if buffer[position] != rune('r') {
goto l625
}
position++
goto l624
l625:
position, tokenIndex = position624, tokenIndex624
if buffer[position] != rune('R') {
goto l623
}
position++
}
l624:
{
position626, tokenIndex626 := position, tokenIndex
if buffer[position] != rune('o') {
goto l627
}
position++
goto l626
l627:
position, tokenIndex = position626, tokenIndex626
if buffer[position] != rune('O') {
goto l623
}
position++
}
l626:
{
position628, tokenIndex628 := position, tokenIndex
if buffer[position] != rune('r') {
goto l629
}
position++
goto l628
l629:
position, tokenIndex = position628, tokenIndex628
if buffer[position] != rune('R') {
goto l623
}
position++
}
l628:
goto l572
l623:
position, tokenIndex = position572, tokenIndex572
{
position630, tokenIndex630 := position, tokenIndex
if buffer[position] != rune('a') {
goto l631
}
position++
goto l630
l631:
position, tokenIndex = position630, tokenIndex630
if buffer[position] != rune('A') {
goto l570
}
position++
}
l630:
{
position632, tokenIndex632 := position, tokenIndex
if buffer[position] != rune('s') {
goto l633
}
position++
goto l632
l633:
position, tokenIndex = position632, tokenIndex632
if buffer[position] != rune('S') {
goto l570
}
position++
}
l632:
{
position634, tokenIndex634 := position, tokenIndex
if buffer[position] != rune('r') {
goto l635
}
position++
goto l634
l635:
position, tokenIndex = position634, tokenIndex634
if buffer[position] != rune('R') {
goto l570
}
position++
}
l634:
}
l572:
{
position636, tokenIndex636 := position, tokenIndex
if !_rules[ruleWS]() {
goto l636
}
if buffer[position] != rune('#') {
goto l636
}
position++
if !_rules[ruleOffset]() {
goto l636
}
goto l637
l636:
position, tokenIndex = position636, tokenIndex636
}
l637:
add(ruleARMConstantTweak, position571)
}
return true
l570:
position, tokenIndex = position570, tokenIndex570
return false
},
/* 41 ARMRegister <- <((('s' / 'S') ('p' / 'P')) / (('x' / 'w' / 'd' / 'q' / 's') [0-9] [0-9]?) / (('x' / 'X') ('z' / 'Z') ('r' / 'R')) / (('w' / 'W') ('z' / 'Z') ('r' / 'R')) / ARMVectorRegister / ('{' WS? ARMVectorRegister (',' WS? ARMVectorRegister)* WS? '}' ('[' [0-9] [0-9]? ']')?))> */
func() bool {
position638, tokenIndex638 := position, tokenIndex
{
position639 := position
{
position640, tokenIndex640 := position, tokenIndex
{
position642, tokenIndex642 := position, tokenIndex
if buffer[position] != rune('s') {
goto l643
}
position++
goto l642
l643:
position, tokenIndex = position642, tokenIndex642
if buffer[position] != rune('S') {
goto l641
}
position++
}
l642:
{
position644, tokenIndex644 := position, tokenIndex
if buffer[position] != rune('p') {
goto l645
}
position++
goto l644
l645:
position, tokenIndex = position644, tokenIndex644
if buffer[position] != rune('P') {
goto l641
}
position++
}
l644:
goto l640
l641:
position, tokenIndex = position640, tokenIndex640
{
position647, tokenIndex647 := position, tokenIndex
if buffer[position] != rune('x') {
goto l648
}
position++
goto l647
l648:
position, tokenIndex = position647, tokenIndex647
if buffer[position] != rune('w') {
goto l649
}
position++
goto l647
l649:
position, tokenIndex = position647, tokenIndex647
if buffer[position] != rune('d') {
goto l650
}
position++
goto l647
l650:
position, tokenIndex = position647, tokenIndex647
if buffer[position] != rune('q') {
goto l651
}
position++
goto l647
l651:
position, tokenIndex = position647, tokenIndex647
if buffer[position] != rune('s') {
goto l646
}
position++
}
l647:
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l646
}
position++
{
position652, tokenIndex652 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l652
}
position++
goto l653
l652:
position, tokenIndex = position652, tokenIndex652
}
l653:
goto l640
l646:
position, tokenIndex = position640, tokenIndex640
{
position655, tokenIndex655 := position, tokenIndex
if buffer[position] != rune('x') {
goto l656
}
position++
goto l655
l656:
position, tokenIndex = position655, tokenIndex655
if buffer[position] != rune('X') {
goto l654
}
position++
}
l655:
{
position657, tokenIndex657 := position, tokenIndex
if buffer[position] != rune('z') {
goto l658
}
position++
goto l657
l658:
position, tokenIndex = position657, tokenIndex657
if buffer[position] != rune('Z') {
goto l654
}
position++
}
l657:
{
position659, tokenIndex659 := position, tokenIndex
if buffer[position] != rune('r') {
goto l660
}
position++
goto l659
l660:
position, tokenIndex = position659, tokenIndex659
if buffer[position] != rune('R') {
goto l654
}
position++
}
l659:
goto l640
l654:
position, tokenIndex = position640, tokenIndex640
{
position662, tokenIndex662 := position, tokenIndex
if buffer[position] != rune('w') {
goto l663
}
position++
goto l662
l663:
position, tokenIndex = position662, tokenIndex662
if buffer[position] != rune('W') {
goto l661
}
position++
}
l662:
{
position664, tokenIndex664 := position, tokenIndex
if buffer[position] != rune('z') {
goto l665
}
position++
goto l664
l665:
position, tokenIndex = position664, tokenIndex664
if buffer[position] != rune('Z') {
goto l661
}
position++
}
l664:
{
position666, tokenIndex666 := position, tokenIndex
if buffer[position] != rune('r') {
goto l667
}
position++
goto l666
l667:
position, tokenIndex = position666, tokenIndex666
if buffer[position] != rune('R') {
goto l661
}
position++
}
l666:
goto l640
l661:
position, tokenIndex = position640, tokenIndex640
if !_rules[ruleARMVectorRegister]() {
goto l668
}
goto l640
l668:
position, tokenIndex = position640, tokenIndex640
if buffer[position] != rune('{') {
goto l638
}
position++
{
position669, tokenIndex669 := position, tokenIndex
if !_rules[ruleWS]() {
goto l669
}
goto l670
l669:
position, tokenIndex = position669, tokenIndex669
}
l670:
if !_rules[ruleARMVectorRegister]() {
goto l638
}
l671:
{
position672, tokenIndex672 := position, tokenIndex
if buffer[position] != rune(',') {
goto l672
}
position++
{
position673, tokenIndex673 := position, tokenIndex
if !_rules[ruleWS]() {
goto l673
}
goto l674
l673:
position, tokenIndex = position673, tokenIndex673
}
l674:
if !_rules[ruleARMVectorRegister]() {
goto l672
}
goto l671
l672:
position, tokenIndex = position672, tokenIndex672
}
{
position675, tokenIndex675 := position, tokenIndex
if !_rules[ruleWS]() {
goto l675
}
goto l676
l675:
position, tokenIndex = position675, tokenIndex675
}
l676:
if buffer[position] != rune('}') {
goto l638
}
position++
{
position677, tokenIndex677 := position, tokenIndex
if buffer[position] != rune('[') {
goto l677
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l677
}
position++
{
position679, tokenIndex679 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l679
}
position++
goto l680
l679:
position, tokenIndex = position679, tokenIndex679
}
l680:
if buffer[position] != rune(']') {
goto l677
}
position++
goto l678
l677:
position, tokenIndex = position677, tokenIndex677
}
l678:
}
l640:
add(ruleARMRegister, position639)
}
return true
l638:
position, tokenIndex = position638, tokenIndex638
return false
},
/* 42 ARMVectorRegister <- <(('v' / 'V') [0-9] [0-9]? ('.' [0-9]* ('b' / 's' / 'd' / 'h' / 'q') ('[' [0-9] [0-9]? ']')?)?)> */
func() bool {
position681, tokenIndex681 := position, tokenIndex
{
position682 := position
{
position683, tokenIndex683 := position, tokenIndex
if buffer[position] != rune('v') {
goto l684
}
position++
goto l683
l684:
position, tokenIndex = position683, tokenIndex683
if buffer[position] != rune('V') {
goto l681
}
position++
}
l683:
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l681
}
position++
{
position685, tokenIndex685 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l685
}
position++
goto l686
l685:
position, tokenIndex = position685, tokenIndex685
}
l686:
{
position687, tokenIndex687 := position, tokenIndex
if buffer[position] != rune('.') {
goto l687
}
position++
l689:
{
position690, tokenIndex690 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l690
}
position++
goto l689
l690:
position, tokenIndex = position690, tokenIndex690
}
{
position691, tokenIndex691 := position, tokenIndex
if buffer[position] != rune('b') {
goto l692
}
position++
goto l691
l692:
position, tokenIndex = position691, tokenIndex691
if buffer[position] != rune('s') {
goto l693
}
position++
goto l691
l693:
position, tokenIndex = position691, tokenIndex691
if buffer[position] != rune('d') {
goto l694
}
position++
goto l691
l694:
position, tokenIndex = position691, tokenIndex691
if buffer[position] != rune('h') {
goto l695
}
position++
goto l691
l695:
position, tokenIndex = position691, tokenIndex691
if buffer[position] != rune('q') {
goto l687
}
position++
}
l691:
{
position696, tokenIndex696 := position, tokenIndex
if buffer[position] != rune('[') {
goto l696
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l696
}
position++
{
position698, tokenIndex698 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l698
}
position++
goto l699
l698:
position, tokenIndex = position698, tokenIndex698
}
l699:
if buffer[position] != rune(']') {
goto l696
}
position++
goto l697
l696:
position, tokenIndex = position696, tokenIndex696
}
l697:
goto l688
l687:
position, tokenIndex = position687, tokenIndex687
}
l688:
add(ruleARMVectorRegister, position682)
}
return true
l681:
position, tokenIndex = position681, tokenIndex681
return false
},
/* 43 MemoryRef <- <((SymbolRef BaseIndexScale) / SymbolRef / Low12BitsSymbolRef / (Offset* BaseIndexScale) / (SegmentRegister Offset BaseIndexScale) / (SegmentRegister BaseIndexScale) / (SegmentRegister Offset) / ARMBaseIndexScale / BaseIndexScale)> */
func() bool {
position700, tokenIndex700 := position, tokenIndex
{
position701 := position
{
position702, tokenIndex702 := position, tokenIndex
if !_rules[ruleSymbolRef]() {
goto l703
}
if !_rules[ruleBaseIndexScale]() {
goto l703
}
goto l702
l703:
position, tokenIndex = position702, tokenIndex702
if !_rules[ruleSymbolRef]() {
goto l704
}
goto l702
l704:
position, tokenIndex = position702, tokenIndex702
if !_rules[ruleLow12BitsSymbolRef]() {
goto l705
}
goto l702
l705:
position, tokenIndex = position702, tokenIndex702
l707:
{
position708, tokenIndex708 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l708
}
goto l707
l708:
position, tokenIndex = position708, tokenIndex708
}
if !_rules[ruleBaseIndexScale]() {
goto l706
}
goto l702
l706:
position, tokenIndex = position702, tokenIndex702
if !_rules[ruleSegmentRegister]() {
goto l709
}
if !_rules[ruleOffset]() {
goto l709
}
if !_rules[ruleBaseIndexScale]() {
goto l709
}
goto l702
l709:
position, tokenIndex = position702, tokenIndex702
if !_rules[ruleSegmentRegister]() {
goto l710
}
if !_rules[ruleBaseIndexScale]() {
goto l710
}
goto l702
l710:
position, tokenIndex = position702, tokenIndex702
if !_rules[ruleSegmentRegister]() {
goto l711
}
if !_rules[ruleOffset]() {
goto l711
}
goto l702
l711:
position, tokenIndex = position702, tokenIndex702
if !_rules[ruleARMBaseIndexScale]() {
goto l712
}
goto l702
l712:
position, tokenIndex = position702, tokenIndex702
if !_rules[ruleBaseIndexScale]() {
goto l700
}
}
l702:
add(ruleMemoryRef, position701)
}
return true
l700:
position, tokenIndex = position700, tokenIndex700
return false
},
/* 44 SymbolRef <- <((Offset* '+')? (LocalSymbol / SymbolName) Offset* ('@' Section Offset*)?)> */
func() bool {
position713, tokenIndex713 := position, tokenIndex
{
position714 := position
{
position715, tokenIndex715 := position, tokenIndex
l717:
{
position718, tokenIndex718 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l718
}
goto l717
l718:
position, tokenIndex = position718, tokenIndex718
}
if buffer[position] != rune('+') {
goto l715
}
position++
goto l716
l715:
position, tokenIndex = position715, tokenIndex715
}
l716:
{
position719, tokenIndex719 := position, tokenIndex
if !_rules[ruleLocalSymbol]() {
goto l720
}
goto l719
l720:
position, tokenIndex = position719, tokenIndex719
if !_rules[ruleSymbolName]() {
goto l713
}
}
l719:
l721:
{
position722, tokenIndex722 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l722
}
goto l721
l722:
position, tokenIndex = position722, tokenIndex722
}
{
position723, tokenIndex723 := position, tokenIndex
if buffer[position] != rune('@') {
goto l723
}
position++
if !_rules[ruleSection]() {
goto l723
}
l725:
{
position726, tokenIndex726 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l726
}
goto l725
l726:
position, tokenIndex = position726, tokenIndex726
}
goto l724
l723:
position, tokenIndex = position723, tokenIndex723
}
l724:
add(ruleSymbolRef, position714)
}
return true
l713:
position, tokenIndex = position713, tokenIndex713
return false
},
/* 45 Low12BitsSymbolRef <- <(':' ('l' / 'L') ('o' / 'O') '1' '2' ':' (LocalSymbol / SymbolName) Offset?)> */
func() bool {
position727, tokenIndex727 := position, tokenIndex
{
position728 := position
if buffer[position] != rune(':') {
goto l727
}
position++
{
position729, tokenIndex729 := position, tokenIndex
if buffer[position] != rune('l') {
goto l730
}
position++
goto l729
l730:
position, tokenIndex = position729, tokenIndex729
if buffer[position] != rune('L') {
goto l727
}
position++
}
l729:
{
position731, tokenIndex731 := position, tokenIndex
if buffer[position] != rune('o') {
goto l732
}
position++
goto l731
l732:
position, tokenIndex = position731, tokenIndex731
if buffer[position] != rune('O') {
goto l727
}
position++
}
l731:
if buffer[position] != rune('1') {
goto l727
}
position++
if buffer[position] != rune('2') {
goto l727
}
position++
if buffer[position] != rune(':') {
goto l727
}
position++
{
position733, tokenIndex733 := position, tokenIndex
if !_rules[ruleLocalSymbol]() {
goto l734
}
goto l733
l734:
position, tokenIndex = position733, tokenIndex733
if !_rules[ruleSymbolName]() {
goto l727
}
}
l733:
{
position735, tokenIndex735 := position, tokenIndex
if !_rules[ruleOffset]() {
goto l735
}
goto l736
l735:
position, tokenIndex = position735, tokenIndex735
}
l736:
add(ruleLow12BitsSymbolRef, position728)
}
return true
l727:
position, tokenIndex = position727, tokenIndex727
return false
},
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
/* 46 ARMBaseIndexScale <- <('[' ARMRegister (',' WS? (('#' Offset (('*' [0-9]+) / ('*' '(' [0-9]+ Operator [0-9]+ ')') / ('+' [0-9]+)*)?) / ARMGOTLow12 / Low12BitsSymbolRef / ARMRegister) (',' WS? ARMConstantTweak)?)? ']' ARMPostincrement?)> */
func() bool {
position737, tokenIndex737 := position, tokenIndex
{
position738 := position
if buffer[position] != rune('[') {
goto l737
}
position++
if !_rules[ruleARMRegister]() {
goto l737
}
{
position739, tokenIndex739 := position, tokenIndex
if buffer[position] != rune(',') {
goto l739
}
position++
{
position741, tokenIndex741 := position, tokenIndex
if !_rules[ruleWS]() {
goto l741
}
goto l742
l741:
position, tokenIndex = position741, tokenIndex741
}
l742:
{
position743, tokenIndex743 := position, tokenIndex
if buffer[position] != rune('#') {
goto l744
}
position++
if !_rules[ruleOffset]() {
goto l744
}
{
position745, tokenIndex745 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position747, tokenIndex747 := position, tokenIndex
if buffer[position] != rune('*') {
goto l748
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l748
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l749:
{
position750, tokenIndex750 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l750
}
position++
goto l749
l750:
position, tokenIndex = position750, tokenIndex750
}
goto l747
l748:
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position, tokenIndex = position747, tokenIndex747
if buffer[position] != rune('*') {
goto l751
}
position++
if buffer[position] != rune('(') {
goto l751
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l751
}
position++
l752:
{
position753, tokenIndex753 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l753
}
position++
goto l752
l753:
position, tokenIndex = position753, tokenIndex753
}
if !_rules[ruleOperator]() {
goto l751
}
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l751
}
position++
l754:
{
position755, tokenIndex755 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l755
}
position++
goto l754
l755:
position, tokenIndex = position755, tokenIndex755
}
if buffer[position] != rune(')') {
goto l751
}
position++
goto l747
l751:
position, tokenIndex = position747, tokenIndex747
l756:
{
position757, tokenIndex757 := position, tokenIndex
if buffer[position] != rune('+') {
goto l757
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l757
}
position++
l758:
{
position759, tokenIndex759 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
goto l759
}
position++
goto l758
l759:
position, tokenIndex = position759, tokenIndex759
}
goto l756
l757:
position, tokenIndex = position757, tokenIndex757
}
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l747:
goto l746
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position, tokenIndex = position745, tokenIndex745
}
l746:
goto l743
l744:
position, tokenIndex = position743, tokenIndex743
if !_rules[ruleARMGOTLow12]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l760
}
goto l743
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l760:
position, tokenIndex = position743, tokenIndex743
if !_rules[ruleLow12BitsSymbolRef]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l761
}
goto l743
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l761:
position, tokenIndex = position743, tokenIndex743
if !_rules[ruleARMRegister]() {
goto l739
}
}
l743:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position762, tokenIndex762 := position, tokenIndex
if buffer[position] != rune(',') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l762
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position764, tokenIndex764 := position, tokenIndex
if !_rules[ruleWS]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l764
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l765
l764:
position, tokenIndex = position764, tokenIndex764
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l765:
if !_rules[ruleARMConstantTweak]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l762
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l763
l762:
position, tokenIndex = position762, tokenIndex762
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l763:
goto l740
l739:
position, tokenIndex = position739, tokenIndex739
}
l740:
if buffer[position] != rune(']') {
goto l737
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position766, tokenIndex766 := position, tokenIndex
if !_rules[ruleARMPostincrement]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l766
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l767
l766:
position, tokenIndex = position766, tokenIndex766
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l767:
add(ruleARMBaseIndexScale, position738)
}
return true
l737:
position, tokenIndex = position737, tokenIndex737
return false
},
/* 47 ARMGOTLow12 <- <(':' ('g' / 'G') ('o' / 'O') ('t' / 'T') '_' ('l' / 'L') ('o' / 'O') '1' '2' ':' SymbolName)> */
func() bool {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position768, tokenIndex768 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position769 := position
if buffer[position] != rune(':') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position770, tokenIndex770 := position, tokenIndex
if buffer[position] != rune('g') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l771
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l770
l771:
position, tokenIndex = position770, tokenIndex770
if buffer[position] != rune('G') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l770:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position772, tokenIndex772 := position, tokenIndex
if buffer[position] != rune('o') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l773
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l772
l773:
position, tokenIndex = position772, tokenIndex772
if buffer[position] != rune('O') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l772:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position774, tokenIndex774 := position, tokenIndex
if buffer[position] != rune('t') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l775
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l774
l775:
position, tokenIndex = position774, tokenIndex774
if buffer[position] != rune('T') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l774:
if buffer[position] != rune('_') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position776, tokenIndex776 := position, tokenIndex
if buffer[position] != rune('l') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l777
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l776
l777:
position, tokenIndex = position776, tokenIndex776
if buffer[position] != rune('L') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l776:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position778, tokenIndex778 := position, tokenIndex
if buffer[position] != rune('o') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l779
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l778
l779:
position, tokenIndex = position778, tokenIndex778
if buffer[position] != rune('O') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l778:
if buffer[position] != rune('1') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
if buffer[position] != rune('2') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
if buffer[position] != rune(':') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
position++
if !_rules[ruleSymbolName]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l768
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
add(ruleARMGOTLow12, position769)
}
return true
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l768:
position, tokenIndex = position768, tokenIndex768
return false
},
/* 48 ARMPostincrement <- <'!'> */
func() bool {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position780, tokenIndex780 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position781 := position
if buffer[position] != rune('!') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l780
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
add(ruleARMPostincrement, position781)
}
return true
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l780:
position, tokenIndex = position780, tokenIndex780
return false
},
/* 49 BaseIndexScale <- <('(' RegisterOrConstant? WS? (',' WS? RegisterOrConstant WS? (',' [0-9]+)?)? ')')> */
func() bool {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position782, tokenIndex782 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position783 := position
if buffer[position] != rune('(') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l782
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position784, tokenIndex784 := position, tokenIndex
if !_rules[ruleRegisterOrConstant]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l784
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l785
l784:
position, tokenIndex = position784, tokenIndex784
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l785:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position786, tokenIndex786 := position, tokenIndex
if !_rules[ruleWS]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l786
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l787
l786:
position, tokenIndex = position786, tokenIndex786
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l787:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position788, tokenIndex788 := position, tokenIndex
if buffer[position] != rune(',') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l788
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position790, tokenIndex790 := position, tokenIndex
if !_rules[ruleWS]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l790
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l791
l790:
position, tokenIndex = position790, tokenIndex790
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l791:
if !_rules[ruleRegisterOrConstant]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l788
}
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position792, tokenIndex792 := position, tokenIndex
if !_rules[ruleWS]() {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l792
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l793
l792:
position, tokenIndex = position792, tokenIndex792
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l793:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position794, tokenIndex794 := position, tokenIndex
if buffer[position] != rune(',') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l794
}
position++
if c := buffer[position]; c < rune('0') || c > rune('9') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l794
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l796:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position797, tokenIndex797 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l797
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l796
l797:
position, tokenIndex = position797, tokenIndex797
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l795
l794:
position, tokenIndex = position794, tokenIndex794
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l795:
goto l789
l788:
position, tokenIndex = position788, tokenIndex788
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l789:
if buffer[position] != rune(')') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l782
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
add(ruleBaseIndexScale, position783)
}
return true
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l782:
position, tokenIndex = position782, tokenIndex782
return false
},
/* 50 Operator <- <('+' / '-')> */
func() bool {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position798, tokenIndex798 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position799 := position
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position800, tokenIndex800 := position, tokenIndex
if buffer[position] != rune('+') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l801
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l800
l801:
position, tokenIndex = position800, tokenIndex800
if buffer[position] != rune('-') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l798
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l800:
add(ruleOperator, position799)
}
return true
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l798:
position, tokenIndex = position798, tokenIndex798
return false
},
/* 51 Offset <- <('+'? '-'? (('0' ('b' / 'B') ('0' / '1')+) / ('0' ('x' / 'X') ([0-9] / [0-9] / ([a-f] / [A-F]))+) / [0-9]+))> */
func() bool {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position802, tokenIndex802 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position803 := position
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position804, tokenIndex804 := position, tokenIndex
if buffer[position] != rune('+') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l804
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l805
l804:
position, tokenIndex = position804, tokenIndex804
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l805:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position806, tokenIndex806 := position, tokenIndex
if buffer[position] != rune('-') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l806
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l807
l806:
position, tokenIndex = position806, tokenIndex806
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l807:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position808, tokenIndex808 := position, tokenIndex
if buffer[position] != rune('0') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l809
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position810, tokenIndex810 := position, tokenIndex
if buffer[position] != rune('b') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l811
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l810
l811:
position, tokenIndex = position810, tokenIndex810
if buffer[position] != rune('B') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l809
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l810:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position814, tokenIndex814 := position, tokenIndex
if buffer[position] != rune('0') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l815
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l814
l815:
position, tokenIndex = position814, tokenIndex814
if buffer[position] != rune('1') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l809
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l814:
l812:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position813, tokenIndex813 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position816, tokenIndex816 := position, tokenIndex
if buffer[position] != rune('0') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l817
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l816
l817:
position, tokenIndex = position816, tokenIndex816
if buffer[position] != rune('1') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l813
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l816:
goto l812
l813:
position, tokenIndex = position813, tokenIndex813
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l808
l809:
position, tokenIndex = position808, tokenIndex808
if buffer[position] != rune('0') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l818
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position819, tokenIndex819 := position, tokenIndex
if buffer[position] != rune('x') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l820
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l819
l820:
position, tokenIndex = position819, tokenIndex819
if buffer[position] != rune('X') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l818
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l819:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position823, tokenIndex823 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l824
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l823
l824:
position, tokenIndex = position823, tokenIndex823
if c := buffer[position]; c < rune('0') || c > rune('9') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l825
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l823
l825:
position, tokenIndex = position823, tokenIndex823
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position826, tokenIndex826 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('f') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l827
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l826
l827:
position, tokenIndex = position826, tokenIndex826
if c := buffer[position]; c < rune('A') || c > rune('F') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l818
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l826:
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l823:
l821:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position822, tokenIndex822 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position828, tokenIndex828 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l829
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l828
l829:
position, tokenIndex = position828, tokenIndex828
if c := buffer[position]; c < rune('0') || c > rune('9') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l830
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l828
l830:
position, tokenIndex = position828, tokenIndex828
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position831, tokenIndex831 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('f') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l832
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l831
l832:
position, tokenIndex = position831, tokenIndex831
if c := buffer[position]; c < rune('A') || c > rune('F') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l822
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l831:
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l828:
goto l821
l822:
position, tokenIndex = position822, tokenIndex822
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l808
l818:
position, tokenIndex = position808, tokenIndex808
if c := buffer[position]; c < rune('0') || c > rune('9') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l802
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l833:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position834, tokenIndex834 := position, tokenIndex
if c := buffer[position]; c < rune('0') || c > rune('9') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l834
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l833
l834:
position, tokenIndex = position834, tokenIndex834
}
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l808:
add(ruleOffset, position803)
}
return true
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l802:
position, tokenIndex = position802, tokenIndex802
return false
},
/* 52 Section <- <([a-z] / [A-Z] / '@')+> */
func() bool {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position835, tokenIndex835 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position836 := position
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position839, tokenIndex839 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l840
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l839
l840:
position, tokenIndex = position839, tokenIndex839
if c := buffer[position]; c < rune('A') || c > rune('Z') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l841
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l839
l841:
position, tokenIndex = position839, tokenIndex839
if buffer[position] != rune('@') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l835
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l839:
l837:
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position838, tokenIndex838 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position842, tokenIndex842 := position, tokenIndex
if c := buffer[position]; c < rune('a') || c > rune('z') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l843
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l842
l843:
position, tokenIndex = position842, tokenIndex842
if c := buffer[position]; c < rune('A') || c > rune('Z') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l844
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l842
l844:
position, tokenIndex = position842, tokenIndex842
if buffer[position] != rune('@') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l838
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l842:
goto l837
l838:
position, tokenIndex = position838, tokenIndex838
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
add(ruleSection, position836)
}
return true
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l835:
position, tokenIndex = position835, tokenIndex835
return false
},
/* 53 SegmentRegister <- <('%' ([c-g] / 's') ('s' ':'))> */
func() bool {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position845, tokenIndex845 := position, tokenIndex
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position846 := position
if buffer[position] != rune('%') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l845
}
position++
{
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
position847, tokenIndex847 := position, tokenIndex
if c := buffer[position]; c < rune('c') || c > rune('g') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l848
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l847
l848:
position, tokenIndex = position847, tokenIndex847
if buffer[position] != rune('s') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l845
}
position++
}
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l847:
if buffer[position] != rune('s') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l845
}
position++
if buffer[position] != rune(':') {
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
goto l845
}
position++
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
add(ruleSegmentRegister, position846)
}
return true
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
l845:
position, tokenIndex = position845, tokenIndex845
return false
},
}
p.rules = _rules
P-256 assembly optimisations for Aarch64. The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c. Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions: - ecp_nistz256_select_w5 - ecp_nistz256_select_w7 An implementation for these functions was added. Summary of modifications to the imported code: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`. * `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build. * l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform. Other modifications: * `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. * The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. * Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions. * Regenerated `delocate.peg.go`. Notes: 1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1 2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release: Before: ``` Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec) Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec) Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec) ``` After: ``` Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec) Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec) Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec) ``` Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>
3 years ago
return nil
}