/*
 * Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
 * Copyright (c) 2014, Intel Corporation. All Rights Reserved.
 *
 * Licensed under the OpenSSL license (the "License"). You may not use
 * this file except in compliance with the License. You can obtain a copy
 * in the file LICENSE in the source distribution or at
 * https://www.openssl.org/source/license.html
 *
 * Originally written by Shay Gueron (1, 2), and Vlad Krasnov (1)
 * (1) Intel Corporation, Israel Development Center, Haifa, Israel
 * (2) University of Haifa, Israel
 *
 * Reference:
 * S.Gueron and V.Krasnov, "Fast Prime Field Elliptic Curve Cryptography with
 *                          256 Bit Primes"
 */

#include <openssl/ec.h>

#include <assert.h>
#include <stdint.h>
#include <string.h>

#include <openssl/bn.h>
#include <openssl/crypto.h>
#include <openssl/err.h>

#include "../bn/internal.h"
#include "../delocate.h"
#include "../../internal.h"
#include "internal.h"

P-256 assembly optimisations for Aarch64.
The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c.
Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions:
- ecp_nistz256_select_w5
- ecp_nistz256_select_w7
An implementation for these functions was added.
Summary of modifications to the imported code:
* Renamed to `p256-armv8-asm.pl`
* Modified the location of `arm-xlate.pl` and `arm_arch.h`
* Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`.
* `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`.
* `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build.
* l. 924 `add sp,sp,#256`: the constant, 32*(12-4), is written out rather than left for the assembler to compute.
Other modifications:
* `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`.
* The files containing `p256-x86_64` in their name were renamed to `p256-nistz`, since the functions and tests defined in them now also run on ARMv8, if enabled.
* Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions.
* Regenerated `delocate.peg.go`.
Notes:
1- The last commit in the history of the file is in master only; the previous commits are in OpenSSL 3.0.1.
2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7.
Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release:
Before:
```
Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec)
Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec)
Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec)
```
After:
```
Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec)
Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec)
Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec)
```
Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
3 years ago
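
The `select` subroutines mentioned above look up a table entry without letting the index influence which memory is read. A minimal C sketch of that idea follows, assuming a 16-entry table of 4-word entries. The names `entry_t`, `eq_mask` and `select_ct` are hypothetical and are not part of the imported assembly; this only illustrates the masking technique and is not the ARMv8 implementation.

```c
#include <stdint.h>

#define TABLE_SIZE 16
#define WORDS 4

// Hypothetical table entry; the real code selects P256_POINT values.
typedef struct {
  uint64_t w[WORDS];
} entry_t;

// Returns all-ones if a == b and zero otherwise, without branching.
static uint64_t eq_mask(uint64_t a, uint64_t b) {
  uint64_t x = a ^ b;
  // (x | -x) has its top bit set iff x != 0.
  return ((x | (0 - x)) >> 63) - 1;
}

// Copies table[index - 1] to |out|. An index of zero yields all zeros,
// matching the convention that entry 0 is implicit and not stored.
// Every entry is read regardless of |index|.
static void select_ct(entry_t *out, const entry_t table[TABLE_SIZE],
                      uint64_t index) {
  entry_t acc = {{0}};
  for (uint64_t i = 0; i < TABLE_SIZE; i++) {
    uint64_t mask = eq_mask(i + 1, index);
    for (int j = 0; j < WORDS; j++) {
      acc.w[j] |= table[i].w[j] & mask;
    }
  }
  *out = acc;
}
```

Calling `select_ct(&out, table, 5)` copies `table[4]`; the loop touches all 16 entries either way, which is the property the scatter-gather replacement is meant to keep.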
#include "p256-nistz.h"

#if !defined(OPENSSL_NO_ASM) && \
    (defined(OPENSSL_X86_64) || defined(OPENSSL_AARCH64)) && \
    !defined(OPENSSL_SMALL)

typedef P256_POINT_AFFINE PRECOMP256_ROW[64];

// One converted into the Montgomery domain
static const BN_ULONG ONE[P256_LIMBS] = {
    TOBN(0x00000000, 0x00000001), TOBN(0xffffffff, 0x00000000),
    TOBN(0xffffffff, 0xffffffff), TOBN(0x00000000, 0xfffffffe),
};

// Precomputed tables for the default generator
#include "p256-nistz-table.h"

// Recode window to a signed digit, see |ec_GFp_nistp_recode_scalar_bits| in
// util.c for details
static crypto_word_t booth_recode_w5(crypto_word_t in) {
  crypto_word_t s, d;

  s = ~((in >> 5) - 1);
  d = (1 << 6) - in - 1;
  d = (d & s) | (in & ~s);
  d = (d >> 1) + (d & 1);

  return (d << 1) + (s & 1);
}

static crypto_word_t booth_recode_w7(crypto_word_t in) {
  crypto_word_t s, d;

  s = ~((in >> 7) - 1);
  d = (1 << 8) - in - 1;
  d = (d & s) | (in & ~s);
  d = (d >> 1) + (d & 1);

  return (d << 1) + (s & 1);
}
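
The recoding above packs a signed digit as (magnitude << 1) | sign. The following standalone sketch, under the assumption that a 6-bit input holds five window bits plus the top bit of the window below, decodes that packing and rebuilds a small scalar from its signed base-32 digits. The helper names `recode_w5`, `signed_digit_w5` and `rebuild_from_digits` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

// Same arithmetic as booth_recode_w5, on a fixed-width type.
static uint64_t recode_w5(uint64_t in) {
  uint64_t s = ~((in >> 5) - 1);
  uint64_t d = (1 << 6) - in - 1;
  d = (d & s) | (in & ~s);
  d = (d >> 1) + (d & 1);
  return (d << 1) + (s & 1);
}

// Decodes the (magnitude << 1 | sign) packing into a digit in [-16, 16].
static int64_t signed_digit_w5(uint64_t in) {
  uint64_t r = recode_w5(in);
  int64_t mag = (int64_t)(r >> 1);
  return (r & 1) ? -mag : mag;
}

// Rebuilds |k| (assumed to be below 2^30) from seven signed base-32 digits.
static int64_t rebuild_from_digits(uint32_t k) {
  // Shift left by one so that bit 0 is the implicit zero below the scalar.
  uint64_t shifted = (uint64_t)k << 1;
  int64_t sum = 0;
  int64_t weight = 1;
  for (int i = 0; i < 7; i++) {
    sum += signed_digit_w5((shifted >> (5 * i)) & 63) * weight;
    weight *= 32;
  }
  return sum;
}
```

A negative digit in one window is compensated by the extra low bit seen by the next window, so the weighted sum of digits equals the original scalar. This is why the multiplication routines only need a table of positive multiples plus a conditional negation.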

// copy_conditional copies |src| to |dst| if |move| is one and leaves it as-is
// if |move| is zero.
//
// WARNING: this breaks the usual convention of constant-time functions
// returning masks.
static void copy_conditional(BN_ULONG dst[P256_LIMBS],
                             const BN_ULONG src[P256_LIMBS], BN_ULONG move) {
  BN_ULONG mask1 = ((BN_ULONG)0) - move;
  BN_ULONG mask2 = ~mask1;

  dst[0] = (src[0] & mask1) ^ (dst[0] & mask2);
  dst[1] = (src[1] & mask1) ^ (dst[1] & mask2);
  dst[2] = (src[2] & mask1) ^ (dst[2] & mask2);
  dst[3] = (src[3] & mask1) ^ (dst[3] & mask2);
  if (P256_LIMBS == 8) {
    dst[4] = (src[4] & mask1) ^ (dst[4] & mask2);
    dst[5] = (src[5] & mask1) ^ (dst[5] & mask2);
    dst[6] = (src[6] & mask1) ^ (dst[6] & mask2);
    dst[7] = (src[7] & mask1) ^ (dst[7] & mask2);
  }
}

// is_not_zero returns one iff in != 0 and zero otherwise.
//
// WARNING: this breaks the usual convention of constant-time functions
// returning masks.
//
// (define-fun is_not_zero ((in (_ BitVec 64))) (_ BitVec 64)
//   (bvlshr (bvor in (bvsub #x0000000000000000 in)) #x000000000000003f)
// )
//
// (declare-fun x () (_ BitVec 64))
//
// (assert (and (= x #x0000000000000000) (= (is_not_zero x) #x0000000000000001)))
// (check-sat)
//
// (assert (and (not (= x #x0000000000000000)) (= (is_not_zero x) #x0000000000000000)))
// (check-sat)
//
static BN_ULONG is_not_zero(BN_ULONG in) {
  in |= (0 - in);
  in >>= BN_BITS2 - 1;
  return in;
}
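
The two helpers above are typically combined: a zero or one flag is derived without branching and then widened into a mask that selects between two buffers. A minimal standalone sketch, assuming 64-bit words and using the hypothetical names `is_nonzero_u64` and `cond_copy_u64` in place of the file's `BN_ULONG`-based helpers:

```c
#include <stddef.h>
#include <stdint.h>

// Returns 1 iff |in| is nonzero: (in | -in) has its top bit set exactly
// when some bit of |in| is set.
static uint64_t is_nonzero_u64(uint64_t in) {
  in |= (0 - in);
  return in >> 63;
}

// Copies |src| over |dst| when |move| is 1 and leaves |dst| unchanged when
// |move| is 0. |move| must be exactly 0 or 1.
static void cond_copy_u64(uint64_t *dst, const uint64_t *src, size_t n,
                          uint64_t move) {
  uint64_t mask1 = (uint64_t)0 - move;  // all-ones if move == 1, else zero
  uint64_t mask2 = ~mask1;
  for (size_t i = 0; i < n; i++) {
    dst[i] = (src[i] & mask1) ^ (dst[i] & mask2);
  }
}
```

For example, `cond_copy_u64(z, one, 4, is_nonzero_u64(digit))` sets `z` to `one` only for a nonzero digit, which mirrors how the base-point multiplication later in this file chooses between a zero Z coordinate and |ONE|.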

// ecp_nistz256_mod_inverse_sqr_mont sets |r| to (|in| * 2^-256)^-2 * 2^256 mod
// p. That is, |r| is the modular inverse square of |in| for input and output in
// the Montgomery domain.
static void ecp_nistz256_mod_inverse_sqr_mont(BN_ULONG r[P256_LIMBS],
                                              const BN_ULONG in[P256_LIMBS]) {
  // This implements the addition chain described in
  // https://briansmith.org/ecc-inversion-addition-chains-01#p256_field_inversion
  BN_ULONG x2[P256_LIMBS], x3[P256_LIMBS], x6[P256_LIMBS], x12[P256_LIMBS],
      x15[P256_LIMBS], x30[P256_LIMBS], x32[P256_LIMBS];
  ecp_nistz256_sqr_mont(x2, in);      // 2^2 - 2^1
  ecp_nistz256_mul_mont(x2, x2, in);  // 2^2 - 2^0

  ecp_nistz256_sqr_mont(x3, x2);      // 2^3 - 2^1
  ecp_nistz256_mul_mont(x3, x3, in);  // 2^3 - 2^0

  ecp_nistz256_sqr_mont(x6, x3);
  for (int i = 1; i < 3; i++) {
    ecp_nistz256_sqr_mont(x6, x6);
  }                                   // 2^6 - 2^3
  ecp_nistz256_mul_mont(x6, x6, x3);  // 2^6 - 2^0

  ecp_nistz256_sqr_mont(x12, x6);
  for (int i = 1; i < 6; i++) {
    ecp_nistz256_sqr_mont(x12, x12);
  }                                     // 2^12 - 2^6
  ecp_nistz256_mul_mont(x12, x12, x6);  // 2^12 - 2^0

  ecp_nistz256_sqr_mont(x15, x12);
  for (int i = 1; i < 3; i++) {
    ecp_nistz256_sqr_mont(x15, x15);
  }                                     // 2^15 - 2^3
  ecp_nistz256_mul_mont(x15, x15, x3);  // 2^15 - 2^0

  ecp_nistz256_sqr_mont(x30, x15);
  for (int i = 1; i < 15; i++) {
    ecp_nistz256_sqr_mont(x30, x30);
  }                                      // 2^30 - 2^15
  ecp_nistz256_mul_mont(x30, x30, x15);  // 2^30 - 2^0

  ecp_nistz256_sqr_mont(x32, x30);
  ecp_nistz256_sqr_mont(x32, x32);      // 2^32 - 2^2
  ecp_nistz256_mul_mont(x32, x32, x2);  // 2^32 - 2^0

  BN_ULONG ret[P256_LIMBS];
  ecp_nistz256_sqr_mont(ret, x32);
  for (int i = 1; i < 31 + 1; i++) {
    ecp_nistz256_sqr_mont(ret, ret);
  }                                     // 2^64 - 2^32
  ecp_nistz256_mul_mont(ret, ret, in);  // 2^64 - 2^32 + 2^0

  for (int i = 0; i < 96 + 32; i++) {
    ecp_nistz256_sqr_mont(ret, ret);
  }                                      // 2^192 - 2^160 + 2^128
  ecp_nistz256_mul_mont(ret, ret, x32);  // 2^192 - 2^160 + 2^128 + 2^32 - 2^0

  for (int i = 0; i < 32; i++) {
    ecp_nistz256_sqr_mont(ret, ret);
  }                                      // 2^224 - 2^192 + 2^160 + 2^64 - 2^32
  ecp_nistz256_mul_mont(ret, ret, x32);  // 2^224 - 2^192 + 2^160 + 2^64 - 2^0

  for (int i = 0; i < 30; i++) {
    ecp_nistz256_sqr_mont(ret, ret);
  }                                      // 2^254 - 2^222 + 2^190 + 2^94 - 2^30
  ecp_nistz256_mul_mont(ret, ret, x30);  // 2^254 - 2^222 + 2^190 + 2^94 - 2^0

  ecp_nistz256_sqr_mont(ret, ret);
  ecp_nistz256_sqr_mont(r, ret);  // 2^256 - 2^224 + 2^192 + 2^96 - 2^2
}
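
The comments in the function above track the exponent reached after each step; the final exponent, 2^256 - 2^224 + 2^192 + 2^96 - 2^2, is p - 3 for the P-256 prime, and by Fermat's little theorem a^(p-3) = a^-2 mod p. The sketch below replays the first steps of the chain (x2, x3, x6) over a toy prime and checks them against plain square-and-multiply. The prime 2^31 - 1 and the names `TOY_Q`, `mul_q`, `sqr_q`, `pow_q`, `chain_x6` and `inv_sqr_q` are assumptions made for illustration only, and no Montgomery form is used.

```c
#include <stdint.h>

#define TOY_Q 2147483647ULL  // 2^31 - 1, a toy prime, not the P-256 prime

// Operands are kept below TOY_Q, so the product fits in 64 bits.
static uint64_t mul_q(uint64_t a, uint64_t b) { return (a * b) % TOY_Q; }
static uint64_t sqr_q(uint64_t a) { return mul_q(a, a); }

// Plain square-and-multiply, used as the reference.
static uint64_t pow_q(uint64_t a, uint64_t e) {
  uint64_t r = 1;
  a %= TOY_Q;
  while (e != 0) {
    if (e & 1) {
      r = mul_q(r, a);
    }
    a = sqr_q(a);
    e >>= 1;
  }
  return r;
}

// Computes a^(2^6 - 1) with the same x2, x3, x6 steps as the chain above.
static uint64_t chain_x6(uint64_t a) {
  a %= TOY_Q;
  uint64_t x2 = mul_q(sqr_q(a), a);   // 2^2 - 2^0
  uint64_t x3 = mul_q(sqr_q(x2), a);  // 2^3 - 2^0
  uint64_t x6 = x3;
  for (int i = 0; i < 3; i++) {
    x6 = sqr_q(x6);
  }                      // 2^6 - 2^3
  return mul_q(x6, x3);  // 2^6 - 2^0
}

// Inverse square via Fermat: a^(q-3) = a^-2 mod q for a not divisible by q.
static uint64_t inv_sqr_q(uint64_t a) { return pow_q(a, TOY_Q - 3); }
```

Each "square k times, then multiply" step turns an exponent of 2^m - 1 into 2^(m+k) - 1, which is why the chain needs far fewer multiplications than a generic exponentiation with the same exponent.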

// r = p * p_scalar
static void ecp_nistz256_windowed_mul(const EC_GROUP *group, P256_POINT *r,
                                      const EC_RAW_POINT *p,
                                      const EC_SCALAR *p_scalar) {
  assert(p != NULL);
  assert(p_scalar != NULL);
  assert(group->field.width == P256_LIMBS);

  static const size_t kWindowSize = 5;
  static const crypto_word_t kMask = (1 << (5 /* kWindowSize */ + 1)) - 1;

  // A |P256_POINT| is (3 * 32) = 96 bytes, and the 64-byte alignment should
  // add no more than 63 bytes of overhead. Thus, |table| should require
  // ~1599 ((96 * 16) + 63) bytes of stack space.
  alignas(64) P256_POINT table[16];
  uint8_t p_str[33];
Remove unions in EC_SCALAR and EC_FELEM.
When introducing EC_SCALAR and EC_FELEM, I used unions as a convenience
for converting to and from the byte representation. However,
type-punning with unions is not allowed in C++ and hard to use correctly
in C. As I understand the rules, they are:
- The abstract machine knows which member of a union was last written to.
- In C, reading from an inactive member is defined to type-pun. In C++,
it is UB though some compilers promise the C behavior anyway.
- However, if you read or write from a *pointer* to a union member, the
strict aliasing rule applies. (A function passed two pointers of
different types otherwise needs to pessimally assume they came from
the same union.)
That last rule means the type-punning allowance doesn't apply if you
take a pointer to an inactive member, and it's common to abstract
otherwise direct accesses of members via pointers.
https://github.com/openssl/openssl/issues/18225 is an example where
similar union tricks have caused problems for OpenSSL. While we don't
have that code, EC_SCALAR and EC_FELEM play similar tricks.
We do get a second lifeline because our alternate view is a uint8_t,
which we require to be unsigned char. Strict aliasing always allows the
pointer type to be a character type, so pointer-indirected accesses of
EC_SCALAR.bytes aren't necessarily UB. But if we ever write to
EC_SCALAR.bytes directly (and we do), we'll switch the active arm and
then pointers to EC_SCALAR.words become strict aliasing violations!
This is all far too complicated to deal with. Ideally everyone would
build with -fno-strict-aliasing because no real C code actually follows
these rules. But we don't always control our downstream consumers'
CFLAGS, so let's just avoid the union. This also avoids a pitfall if we
ever move libcrypto to C++.
For p224-64.c, I just converted the representations directly, which
avoids worrying about the top 32 bits in p224_felem_to_generic. Most of
the rest was words vs. bytes conversions and boils down to a cast (we're
still dealing with a character type, at the end of the day). But I took
the opportunity to extract some more "words"-based helper functions out
of BIGNUM, so the casts would only be in one place. That too saves us
from the top bits problem in the bytes-to-words direction.
Bug: 301
Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
3 years ago
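
The alternative to the union described above is an explicit copy between the word view and the byte view, which is what the `OPENSSL_memcpy(p_str, ..., 32)` calls in this file do. A minimal sketch follows; `scalar_t` is a hypothetical stand-in for `EC_SCALAR`, plain `memcpy` stands in for `OPENSSL_memcpy`, and the function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for EC_SCALAR: a words-only struct with no union.
typedef struct {
  uint64_t words[4];
} scalar_t;

// Copies the 32 bytes of |s| into |out| and zero-pads one extra byte, as the
// windowing code expects. The byte order is whatever the host uses for
// |words|, exactly as with the memcpy calls in this file.
static void scalar_to_bytes(uint8_t out[33], const scalar_t *s) {
  memcpy(out, s->words, 32);
  out[32] = 0;
}

// Copies 32 bytes back into the word representation.
static void scalar_from_bytes(scalar_t *s, const uint8_t in[32]) {
  memcpy(s->words, in, 32);
}
```

Because `memcpy` copies object representations, neither function reads one type through a pointer to another, so the question of which union member is active never arises.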
  OPENSSL_memcpy(p_str, p_scalar->words, 32);
  p_str[32] = 0;

  // table[0] is implicitly (0,0,0) (the point at infinity), therefore it is
  // not stored. All other values are actually stored with an offset of -1 in
  // table.
  P256_POINT *row = table;
  assert(group->field.width == P256_LIMBS);
  OPENSSL_memcpy(row[1 - 1].X, p->X.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(row[1 - 1].Y, p->Y.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(row[1 - 1].Z, p->Z.words, P256_LIMBS * sizeof(BN_ULONG));

  ecp_nistz256_point_double(&row[2 - 1], &row[1 - 1]);
  ecp_nistz256_point_add(&row[3 - 1], &row[2 - 1], &row[1 - 1]);
  ecp_nistz256_point_double(&row[4 - 1], &row[2 - 1]);
  ecp_nistz256_point_double(&row[6 - 1], &row[3 - 1]);
  ecp_nistz256_point_double(&row[8 - 1], &row[4 - 1]);
  ecp_nistz256_point_double(&row[12 - 1], &row[6 - 1]);
  ecp_nistz256_point_add(&row[5 - 1], &row[4 - 1], &row[1 - 1]);
  ecp_nistz256_point_add(&row[7 - 1], &row[6 - 1], &row[1 - 1]);
  ecp_nistz256_point_add(&row[9 - 1], &row[8 - 1], &row[1 - 1]);
  ecp_nistz256_point_add(&row[13 - 1], &row[12 - 1], &row[1 - 1]);
  ecp_nistz256_point_double(&row[14 - 1], &row[7 - 1]);
  ecp_nistz256_point_double(&row[10 - 1], &row[5 - 1]);
  ecp_nistz256_point_add(&row[15 - 1], &row[14 - 1], &row[1 - 1]);
  ecp_nistz256_point_add(&row[11 - 1], &row[10 - 1], &row[1 - 1]);
  ecp_nistz256_point_double(&row[16 - 1], &row[8 - 1]);

  BN_ULONG tmp[P256_LIMBS];
  alignas(32) P256_POINT h;
  size_t index = 255;
  crypto_word_t wvalue = p_str[(index - 1) / 8];
  wvalue = (wvalue >> ((index - 1) % 8)) & kMask;

  ecp_nistz256_select_w5(r, table, booth_recode_w5(wvalue) >> 1);

  while (index >= 5) {
    if (index != 255) {
      size_t off = (index - 1) / 8;

      wvalue = (crypto_word_t)p_str[off] | (crypto_word_t)p_str[off + 1] << 8;
      wvalue = (wvalue >> ((index - 1) % 8)) & kMask;

      wvalue = booth_recode_w5(wvalue);

      ecp_nistz256_select_w5(&h, table, wvalue >> 1);

      ecp_nistz256_neg(tmp, h.Y);
      copy_conditional(h.Y, tmp, (wvalue & 1));

      ecp_nistz256_point_add(r, r, &h);
    }

    index -= kWindowSize;

    ecp_nistz256_point_double(r, r);
    ecp_nistz256_point_double(r, r);
    ecp_nistz256_point_double(r, r);
    ecp_nistz256_point_double(r, r);
    ecp_nistz256_point_double(r, r);
  }

  // Final window
  wvalue = p_str[0];
  wvalue = (wvalue << 1) & kMask;

  wvalue = booth_recode_w5(wvalue);

  ecp_nistz256_select_w5(&h, table, wvalue >> 1);

  ecp_nistz256_neg(tmp, h.Y);
  copy_conditional(h.Y, tmp, wvalue & 1);

  ecp_nistz256_point_add(r, r, &h);
}

typedef union {
  P256_POINT p;
  P256_POINT_AFFINE a;
} p256_point_union_t;

static crypto_word_t calc_first_wvalue(size_t *index, const uint8_t p_str[33]) {
  static const size_t kWindowSize = 7;
  static const crypto_word_t kMask = (1 << (7 /* kWindowSize */ + 1)) - 1;
  *index = kWindowSize;

  crypto_word_t wvalue = (p_str[0] << 1) & kMask;
  return booth_recode_w7(wvalue);
}

static crypto_word_t calc_wvalue(size_t *index, const uint8_t p_str[33]) {
  static const size_t kWindowSize = 7;
  static const crypto_word_t kMask = (1 << (7 /* kWindowSize */ + 1)) - 1;

  const size_t off = (*index - 1) / 8;
  crypto_word_t wvalue =
      (crypto_word_t)p_str[off] | (crypto_word_t)p_str[off + 1] << 8;
  wvalue = (wvalue >> ((*index - 1) % 8)) & kMask;
  *index += kWindowSize;

  return booth_recode_w7(wvalue);
}

static void ecp_nistz256_point_mul(const EC_GROUP *group, EC_RAW_POINT *r,
                                   const EC_RAW_POINT *p,
                                   const EC_SCALAR *scalar) {
  alignas(32) P256_POINT out;
  ecp_nistz256_windowed_mul(group, &out, p, scalar);

  assert(group->field.width == P256_LIMBS);
  OPENSSL_memcpy(r->X.words, out.X, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(r->Y.words, out.Y, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(r->Z.words, out.Z, P256_LIMBS * sizeof(BN_ULONG));
}

static void ecp_nistz256_point_mul_base(const EC_GROUP *group, EC_RAW_POINT *r,
                                        const EC_SCALAR *scalar) {
  alignas(32) p256_point_union_t t, p;

  uint8_t p_str[33];
  OPENSSL_memcpy(p_str, scalar->words, 32);
  p_str[32] = 0;

  // First window
  size_t index = 0;
  crypto_word_t wvalue = calc_first_wvalue(&index, p_str);

  ecp_nistz256_select_w7(&p.a, ecp_nistz256_precomputed[0], wvalue >> 1);
  ecp_nistz256_neg(p.p.Z, p.p.Y);
  copy_conditional(p.p.Y, p.p.Z, wvalue & 1);

  // Convert |p| from affine to Jacobian coordinates. We set Z to zero if |p|
  // is infinity and |ONE| otherwise. |p| was computed from the table, so it
  // is infinity iff |wvalue >> 1| is zero.
  OPENSSL_memset(p.p.Z, 0, sizeof(p.p.Z));
  copy_conditional(p.p.Z, ONE, is_not_zero(wvalue >> 1));

  for (int i = 1; i < 37; i++) {
    wvalue = calc_wvalue(&index, p_str);

    ecp_nistz256_select_w7(&t.a, ecp_nistz256_precomputed[i], wvalue >> 1);

    ecp_nistz256_neg(t.p.Z, t.a.Y);
    copy_conditional(t.a.Y, t.p.Z, wvalue & 1);

    // Note |ecp_nistz256_point_add_affine| does not work if |p.p| and |t.a|
    // are the same non-infinity point.
    ecp_nistz256_point_add_affine(&p.p, &p.p, &t.a);
  }

  assert(group->field.width == P256_LIMBS);
  OPENSSL_memcpy(r->X.words, p.p.X, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(r->Y.words, p.p.Y, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(r->Z.words, p.p.Z, P256_LIMBS * sizeof(BN_ULONG));
}

static void ecp_nistz256_points_mul_public(const EC_GROUP *group,
                                           EC_RAW_POINT *r,
                                           const EC_SCALAR *g_scalar,
                                           const EC_RAW_POINT *p_,
                                           const EC_SCALAR *p_scalar) {
  assert(p_ != NULL && p_scalar != NULL && g_scalar != NULL);

  alignas(32) p256_point_union_t t, p;
  uint8_t p_str[33];
OPENSSL_memcpy(p_str, g_scalar->words, 32);
|
|
|
|
p_str[32] = 0;
|
|
|
|
|
|
|
|
// First window
|
|
|
|
size_t index = 0;
|
|
|
|
size_t wvalue = calc_first_wvalue(&index, p_str);
|
|
|
|
|
|
|
|
// Convert |p| from affine to Jacobian coordinates. We set Z to zero if |p|
|
|
|
|
// is infinity and |ONE| otherwise. |p| was computed from the table, so it
|
|
|
|
// is infinity iff |wvalue >> 1| is zero.
|
|
|
|
if ((wvalue >> 1) != 0) {
|
|
|
|
OPENSSL_memcpy(&p.a, &ecp_nistz256_precomputed[0][(wvalue >> 1) - 1],
|
|
|
|
sizeof(p.a));
|
|
|
|
OPENSSL_memcpy(&p.p.Z, ONE, sizeof(p.p.Z));
|
|
|
|
} else {
|
|
|
|
OPENSSL_memset(&p.a, 0, sizeof(p.a));
|
|
|
|
OPENSSL_memset(p.p.Z, 0, sizeof(p.p.Z));
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((wvalue & 1) == 1) {
|
|
|
|
ecp_nistz256_neg(p.p.Y, p.p.Y);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (int i = 1; i < 37; i++) {
|
|
|
|
wvalue = calc_wvalue(&index, p_str);
|
|
|
|
|
|
|
|
if ((wvalue >> 1) == 0) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
OPENSSL_memcpy(&t.a, &ecp_nistz256_precomputed[i][(wvalue >> 1) - 1],
|
|
|
|
sizeof(p.a));
|
|
|
|
|
|
|
|
if ((wvalue & 1) == 1) {
|
|
|
|
ecp_nistz256_neg(t.a.Y, t.a.Y);
|
|
|
|
}
|
|
|
|
|
|
|
|
// Note |ecp_nistz256_point_add_affine| does not work if |p.p| and |t.a|
|
|
|
|
// are the same non-infinity point, so it is important that we compute the
|
|
|
|
// |g_scalar| term before the |p_scalar| term.
|
|
|
|
ecp_nistz256_point_add_affine(&p.p, &p.p, &t.a);
|
|
|
|
}
|
|
|
|
|
|
|
|
ecp_nistz256_windowed_mul(group, &t.p, p_, p_scalar);
|
|
|
|
ecp_nistz256_point_add(&p.p, &p.p, &t.p);
|
|
|
|
|
|
|
|
assert(group->field.width == P256_LIMBS);
|
|
|
|
OPENSSL_memcpy(r->X.words, p.p.X, P256_LIMBS * sizeof(BN_ULONG));
|
|
|
|
OPENSSL_memcpy(r->Y.words, p.p.Y, P256_LIMBS * sizeof(BN_ULONG));
|
|
|
|
OPENSSL_memcpy(r->Z.words, p.p.Z, P256_LIMBS * sizeof(BN_ULONG));
|
|
|
|
}

static int ecp_nistz256_get_affine(const EC_GROUP *group,
                                   const EC_RAW_POINT *point, EC_FELEM *x,
                                   EC_FELEM *y) {
  if (ec_GFp_simple_is_at_infinity(group, point)) {
    OPENSSL_PUT_ERROR(EC, EC_R_POINT_AT_INFINITY);
    return 0;
  }

  BN_ULONG z_inv2[P256_LIMBS];
  assert(group->field.width == P256_LIMBS);
  ecp_nistz256_mod_inverse_sqr_mont(z_inv2, point->Z.words);

  if (x != NULL) {
    ecp_nistz256_mul_mont(x->words, z_inv2, point->X.words);
  }

  if (y != NULL) {
    ecp_nistz256_sqr_mont(z_inv2, z_inv2);  // z^-4
    ecp_nistz256_mul_mont(y->words, point->Y.words, point->Z.words);  // y * z
    ecp_nistz256_mul_mont(y->words, y->words, z_inv2);  // y * z^-3
  }

  return 1;
}
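The Jacobian-to-affine conversion above computes x = X/Z^2 and y = Y/Z^3, obtaining y as (Y*Z) * Z^-4 so that a single inversion suffices. The toy sketch below illustrates that arithmetic only. It assumes a small prime field (p = 251) and plain, non-Montgomery arithmetic; `TOY_P`, `mul_p`, `pow_p`, and `toy_get_affine` are hypothetical names for illustration and are not part of BoringSSL.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_P = 251 };  // Toy field prime (assumption; the real field is P-256).

static uint32_t mul_p(uint32_t a, uint32_t b) { return (a * b) % TOY_P; }

// Square-and-multiply exponentiation, used here only to invert via Fermat.
static uint32_t pow_p(uint32_t a, uint32_t e) {
  uint32_t r = 1;
  for (; e > 0; e >>= 1) {
    if (e & 1) {
      r = mul_p(r, a);
    }
    a = mul_p(a, a);
  }
  return r;
}

// Converts the Jacobian point (X, Y, Z), with Z != 0, to affine (x, y).
static void toy_get_affine(uint32_t X, uint32_t Y, uint32_t Z, uint32_t *x,
                           uint32_t *y) {
  assert(Z != 0 && Z < TOY_P);
  uint32_t z_inv2 = pow_p(mul_p(Z, Z), TOY_P - 2);  // Z^-2
  *x = mul_p(X, z_inv2);                            // X * Z^-2
  uint32_t z_inv4 = mul_p(z_inv2, z_inv2);          // Z^-4
  *y = mul_p(mul_p(Y, Z), z_inv4);                  // (Y * Z) * Z^-4 = Y * Z^-3
}
```

In this toy field, the Jacobian point (147, 95, 3) maps to the affine point (100, 50).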

static void ecp_nistz256_add(const EC_GROUP *group, EC_RAW_POINT *r,
                             const EC_RAW_POINT *a_, const EC_RAW_POINT *b_) {
  P256_POINT a, b;
  OPENSSL_memcpy(a.X, a_->X.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(a.Y, a_->Y.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(a.Z, a_->Z.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(b.X, b_->X.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(b.Y, b_->Y.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(b.Z, b_->Z.words, P256_LIMBS * sizeof(BN_ULONG));
  ecp_nistz256_point_add(&a, &a, &b);
  OPENSSL_memcpy(r->X.words, a.X, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(r->Y.words, a.Y, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(r->Z.words, a.Z, P256_LIMBS * sizeof(BN_ULONG));
}

static void ecp_nistz256_dbl(const EC_GROUP *group, EC_RAW_POINT *r,
                             const EC_RAW_POINT *a_) {
  P256_POINT a;
  OPENSSL_memcpy(a.X, a_->X.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(a.Y, a_->Y.words, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(a.Z, a_->Z.words, P256_LIMBS * sizeof(BN_ULONG));
  ecp_nistz256_point_double(&a, &a);
  OPENSSL_memcpy(r->X.words, a.X, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(r->Y.words, a.Y, P256_LIMBS * sizeof(BN_ULONG));
  OPENSSL_memcpy(r->Z.words, a.Z, P256_LIMBS * sizeof(BN_ULONG));
}

static void ecp_nistz256_inv0_mod_ord(const EC_GROUP *group, EC_SCALAR *out,
                                      const EC_SCALAR *in) {
  // table[i] stores a power of |in| corresponding to the matching enum value.
  enum {
    // The following indices specify the power in binary.
    i_1 = 0,
    i_10,
    i_11,
    i_101,
    i_111,
    i_1010,
    i_1111,
    i_10101,
    i_101010,
    i_101111,
    // The following indices specify 2^N-1, or N ones in a row.
    i_x6,
    i_x8,
    i_x16,
    i_x32
  };
  BN_ULONG table[15][P256_LIMBS];

  // https://briansmith.org/ecc-inversion-addition-chains-01#p256_scalar_inversion
  //
  // Even though this code path spares 12 squarings, 4.5%, and 13
  // multiplications, 25%, the overall sign operation is not that much faster,
  // not more than 2%. Most of the performance of this function comes from the
  // scalar operations.

  // Pre-calculate powers.
  OPENSSL_memcpy(table[i_1], in->words, P256_LIMBS * sizeof(BN_ULONG));

  ecp_nistz256_ord_sqr_mont(table[i_10], table[i_1], 1);

  ecp_nistz256_ord_mul_mont(table[i_11], table[i_1], table[i_10]);

  ecp_nistz256_ord_mul_mont(table[i_101], table[i_11], table[i_10]);

  ecp_nistz256_ord_mul_mont(table[i_111], table[i_101], table[i_10]);

  ecp_nistz256_ord_sqr_mont(table[i_1010], table[i_101], 1);

  ecp_nistz256_ord_mul_mont(table[i_1111], table[i_1010], table[i_101]);

  ecp_nistz256_ord_sqr_mont(table[i_10101], table[i_1010], 1);
  ecp_nistz256_ord_mul_mont(table[i_10101], table[i_10101], table[i_1]);

  ecp_nistz256_ord_sqr_mont(table[i_101010], table[i_10101], 1);

  ecp_nistz256_ord_mul_mont(table[i_101111], table[i_101010], table[i_101]);

  ecp_nistz256_ord_mul_mont(table[i_x6], table[i_101010], table[i_10101]);

  ecp_nistz256_ord_sqr_mont(table[i_x8], table[i_x6], 2);
  ecp_nistz256_ord_mul_mont(table[i_x8], table[i_x8], table[i_11]);

  ecp_nistz256_ord_sqr_mont(table[i_x16], table[i_x8], 8);
  ecp_nistz256_ord_mul_mont(table[i_x16], table[i_x16], table[i_x8]);

  ecp_nistz256_ord_sqr_mont(table[i_x32], table[i_x16], 16);
  ecp_nistz256_ord_mul_mont(table[i_x32], table[i_x32], table[i_x16]);

  // Compute |in| raised to the order-2.
  ecp_nistz256_ord_sqr_mont(out->words, table[i_x32], 64);
  ecp_nistz256_ord_mul_mont(out->words, out->words, table[i_x32]);
  static const struct {
    uint8_t p, i;
  } kChain[27] = {{32, i_x32},    {6, i_101111}, {5, i_111},    {4, i_11},
                  {5, i_1111},    {5, i_10101},  {4, i_101},    {3, i_101},
                  {3, i_101},     {5, i_111},    {9, i_101111}, {6, i_1111},
                  {2, i_1},       {5, i_1},      {6, i_1111},   {5, i_111},
                  {4, i_111},     {5, i_111},    {5, i_101},    {3, i_11},
                  {10, i_101111}, {2, i_11},     {5, i_11},     {5, i_11},
                  {3, i_1},       {7, i_10101},  {6, i_1111}};
  for (size_t i = 0; i < OPENSSL_ARRAY_SIZE(kChain); i++) {
    ecp_nistz256_ord_sqr_mont(out->words, out->words, kChain[i].p);
    ecp_nistz256_ord_mul_mont(out->words, out->words, table[kChain[i].i]);
  }
}
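The function above inverts a scalar by raising it to the power order-2 with a fixed addition chain: it precomputes a few small powers, then repeatedly squares the accumulator and multiplies in a table entry. The same "square p times, multiply by table[i]" pattern can be sketched on a toy modulus. This is an illustration under stated assumptions, not the BoringSSL code: it uses the small prime 251 in place of the group order, plain integers instead of Montgomery form, and hypothetical names (`TOY_N`, `mul_mod`, `sqr_mod_rep`, `toy_inv0`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_N = 251 };  // Toy prime modulus standing in for the group order.

static uint32_t mul_mod(uint32_t a, uint32_t b) { return (a * b) % TOY_N; }

// Squares |a| modulo TOY_N |rep| times, like the |rep| argument of
// |ecp_nistz256_ord_sqr_mont|.
static uint32_t sqr_mod_rep(uint32_t a, unsigned rep) {
  while (rep-- > 0) {
    a = mul_mod(a, a);
  }
  return a;
}

// Returns in^(TOY_N - 2) mod TOY_N: the inverse of |in|, or 0 if |in| is 0.
static uint32_t toy_inv0(uint32_t in) {
  assert(in < TOY_N);
  // Indices name the exponent in binary, as in the function above.
  enum { i_1 = 0, i_11, i_1111, kTableSize };
  uint32_t table[kTableSize];
  table[i_1] = in;
  table[i_11] = mul_mod(sqr_mod_rep(in, 1), in);  // in^3
  table[i_1111] =
      mul_mod(sqr_mod_rep(table[i_11], 2), table[i_11]);  // in^12 * in^3

  // TOY_N - 2 = 249 = 0b11111001. Start from 1111, then append "1" and "001".
  static const struct {
    uint8_t p, i;
  } kChain[] = {{1, i_1}, {3, i_1}};
  uint32_t out = table[i_1111];
  for (size_t k = 0; k < sizeof(kChain) / sizeof(kChain[0]); k++) {
    out = sqr_mod_rep(out, kChain[k].p);
    out = mul_mod(out, table[kChain[k].i]);
  }
  return out;
}
```

For instance, `toy_inv0(2)` yields 126, since 2 * 126 = 252, which is 1 modulo 251. Because the sequence of squarings and multiplications depends only on the fixed exponent, the chain performs the same operations for every input.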

static int ecp_nistz256_scalar_to_montgomery_inv_vartime(const EC_GROUP *group,
                                                         EC_SCALAR *out,
                                                         const EC_SCALAR *in) {
P-256 assembly optimisations for Aarch64.
The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at https://github.com/openssl/openssl/blob/19e277dd19f2897f6a7b7eb236abe46655e575bf/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c.
Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions:
- ecp_nistz256_select_w5
- ecp_nistz256_select_w7
An implementation for these functions was added.
Summary of modifications to the imported code:
* Renamed to `p256-armv8-asm.pl`
* Modified the location of `arm-xlate.pl` and `arm_arch.h`
* Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`.
* `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`.
* `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build.
* l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform.
Other modifications:
* `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`.
* The files containing `p256-x86_64` in their name were renamed to `p256-nistz`, since the functions and tests defined in them now also run on ARMv8, if enabled.
* Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions.
* Regenerated `delocate.peg.go`.
Notes:
1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1
2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7.
Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release:
Before:
```
Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec)
Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec)
Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec)
```
After:
```
Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec)
Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec)
Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec)
```
Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
3 years ago
#if defined(OPENSSL_X86_64)
Check static CPU capabilities on x86.
On Arm, our CRYPTO_is_*_capable functions check the corresponding
preprocessor symbol. This allows us to automatically drop dynamic checks
and fallback code when some capability is always available.
This CL does the same on x86, as well as consolidates our
OPENSSL_ia32cap_P checks in one place. Since this abstraction is
incompatible with some optimizations we do around OPENSSL_ia32cap_get()
in the FIPS module, I've marked the symbol __attribute__((const)), which
is enough to make GCC and Clang do the optimizations for us. (We already
do the same to DEFINE_BSS_GET.)
Most x86 platforms support a much wider range of capabilities, so this
is usually a no-op. But, notably, all x86_64 Mac hardware has SSSE3
available, so this allows us to statically drop an AES implementation.
(On macOS with -Wl,-dead_strip, this seems to trim 35080 bytes from the
bssl binary.) Configs like -march=native can also drop a bunch of code.
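The static-check pattern described above can be sketched as follows. This is a minimal illustration, not BoringSSL's actual code: `TOY_CAP_SSSE3`, `toy_is_ssse3_capable`, and `toy_pick_impl` are hypothetical names, the runtime capability word is passed as a parameter to keep the sketch self-contained, and `__SSSE3__` is the macro GCC and Clang predefine when SSSE3 is enabled at compile time.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical capability bit, standing in for one bit of OPENSSL_ia32cap_P.
#define TOY_CAP_SSSE3 (1u << 0)

// Returns 1 if SSSE3 may be used. |runtime_caps| is the capability word that
// would be detected at startup.
static int toy_is_ssse3_capable(uint32_t runtime_caps) {
#if defined(__SSSE3__)
  // The build already targets SSSE3, so the answer is a compile-time constant
  // and the compiler can discard any fallback branch that depends on it.
  (void)runtime_caps;
  return 1;
#else
  return (runtime_caps & TOY_CAP_SSSE3) != 0;
#endif
}

enum toy_impl { TOY_IMPL_FALLBACK = 0, TOY_IMPL_SSSE3 = 1 };

// Picks an implementation; callers never look at the capability word directly.
static enum toy_impl toy_pick_impl(uint32_t runtime_caps) {
  if (toy_is_ssse3_capable(runtime_caps)) {
    return TOY_IMPL_SSSE3;
  }
  return TOY_IMPL_FALLBACK;  // Dead code when __SSSE3__ is defined.
}
```

Routing every check through one helper is what lets a build with the capability statically enabled drop the fallback path without touching the call sites.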
Update-Note: This CL may break build environments that incorrectly mark
some instruction as statically available. This is unlikely to happen
with vector instructions like AVX, where the compiler could freely emit
them anyway. However, instructions like AES-NI might be set incorrectly.
Change-Id: I44fd715c9887d3fda7cb4519c03bee4d4f2c7ea6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51548
Reviewed-by: Adam Langley <agl@google.com>
3 years ago
  if (!CRYPTO_is_AVX_capable()) {
    // No AVX support; fall back to generic code.
    return ec_simple_scalar_to_montgomery_inv_vartime(group, out, in);
  }
#endif

  assert(group->order.width == P256_LIMBS);
  if (!beeu_mod_inverse_vartime(out->words, in->words, group->order.d)) {
    return 0;
  }

  // The result should be returned in the Montgomery domain.
  ec_scalar_to_montgomery(group, out, out);
  return 1;
}

static int ecp_nistz256_cmp_x_coordinate(const EC_GROUP *group,
                                         const EC_RAW_POINT *p,
                                         const EC_SCALAR *r) {
  if (ec_GFp_simple_is_at_infinity(group, p)) {
    return 0;
  }

  assert(group->order.width == P256_LIMBS);
  assert(group->field.width == P256_LIMBS);

  // We wish to compare X/Z^2 with r. This is equivalent to comparing X with
  // r*Z^2. Note that X and Z are represented in Montgomery form, while r is
  // not.
  BN_ULONG r_Z2[P256_LIMBS], Z2_mont[P256_LIMBS], X[P256_LIMBS];
  ecp_nistz256_mul_mont(Z2_mont, p->Z.words, p->Z.words);
  ecp_nistz256_mul_mont(r_Z2, r->words, Z2_mont);
  ecp_nistz256_from_mont(X, p->X.words);

  if (OPENSSL_memcmp(r_Z2, X, sizeof(r_Z2)) == 0) {
    return 1;
  }

  // During signing the x coefficient is reduced modulo the group order.
  // Therefore there is a small possibility, less than 1/2^128, that group_order
  // < p.x < P. In that case we need not only to compare against |r| but also to
  // compare against r+group_order.
  if (bn_less_than_words(r->words, group->field_minus_order.words,
                         P256_LIMBS)) {
    // We can ignore the carry because: r + group_order < p < 2^256.
    bn_add_words(r_Z2, r->words, group->order.d, P256_LIMBS);
    ecp_nistz256_mul_mont(r_Z2, r_Z2, Z2_mont);
    if (OPENSSL_memcmp(r_Z2, X, sizeof(r_Z2)) == 0) {
      return 1;
    }
  }

  return 0;
}
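The comparison above avoids an inversion by checking X against r*Z^2 rather than X/Z^2 against r, and retries with r + group_order when r is smaller than (field prime - group order). A toy sketch of that logic follows. It assumes plain, non-Montgomery arithmetic and arbitrary small values (p = 251, n = 239) chosen only so that n < p; they are not claimed to come from a real curve. `TOY_P`, `TOY_N`, `mul_p`, and `toy_cmp_x` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

// Toy field prime and toy "group order" (illustrative assumptions).
enum { TOY_P = 251, TOY_N = 239 };

static uint32_t mul_p(uint32_t a, uint32_t b) { return (a * b) % TOY_P; }

// Returns 1 if the affine x-coordinate X / Z^2 (mod p), reduced modulo n,
// equals |r|, and 0 otherwise. No inversion is performed.
static int toy_cmp_x(uint32_t X, uint32_t Z, uint32_t r) {
  assert(X < TOY_P && Z != 0 && Z < TOY_P && r < TOY_N);
  uint32_t Z2 = mul_p(Z, Z);

  // X / Z^2 == r  is equivalent to  X == r * Z^2.
  if (mul_p(r, Z2) == X) {
    return 1;
  }

  // The affine x may lie in [n, p). Then x mod n == r means x == r + n, which
  // is only possible when r < p - n.
  if (r < TOY_P - TOY_N) {
    if (mul_p(r + TOY_N, Z2) == X) {
      return 1;
    }
  }

  return 0;
}
```

With Z = 3, an affine x of 100 is represented by X = 147 and matches r = 100. An affine x of 245, which exceeds the toy order, is represented by X = 197 and matches r = 6 through the second comparison.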

DEFINE_METHOD_FUNCTION(EC_METHOD, EC_GFp_nistz256_method) {
  out->group_init = ec_GFp_mont_group_init;
  out->group_finish = ec_GFp_mont_group_finish;
  out->group_set_curve = ec_GFp_mont_group_set_curve;
  out->point_get_affine_coordinates = ecp_nistz256_get_affine;
  out->add = ecp_nistz256_add;
  out->dbl = ecp_nistz256_dbl;
  out->mul = ecp_nistz256_point_mul;
  out->mul_base = ecp_nistz256_point_mul_base;
  out->mul_public = ecp_nistz256_points_mul_public;
  out->felem_mul = ec_GFp_mont_felem_mul;
  out->felem_sqr = ec_GFp_mont_felem_sqr;
  out->felem_to_bytes = ec_GFp_mont_felem_to_bytes;
  out->felem_from_bytes = ec_GFp_mont_felem_from_bytes;
  out->scalar_inv0_montgomery = ecp_nistz256_inv0_mod_ord;
  out->scalar_to_montgomery_inv_vartime =
      ecp_nistz256_scalar_to_montgomery_inv_vartime;
  out->cmp_x_coordinate = ecp_nistz256_cmp_x_coordinate;
}

#endif /* !defined(OPENSSL_NO_ASM) && \
          (defined(OPENSSL_X86_64) || defined(OPENSSL_AARCH64)) && \
          !defined(OPENSSL_SMALL) */