I'm not sure what the history of this is, but it seems to work
just fine in MSVC now.
Change-Id: Iebdc365486bb30a61a1001f705aef7dcaa2a9fcd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52985
Reviewed-by: Adam Langley <agl@google.com>
bind uses this function.
Change-Id: I97ba86d9f75597bff125ae0b56952effc397e6b8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53010
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
I don't think these are all UB by C's rules, but it's easier not to
think about the pointers. Still more to go, but these were some easy
ones.
Bug: 301
Change-Id: Icdcb7fb40f85983cbf566786c5f7dbfd7bb06571
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52905
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
https://boringssl-review.googlesource.com/c/boringssl/+/52825 lost a
tmp.width = top line. Without it, tmp.width was set by
bn_one_to_montgomery. Since we always size modular arithmetic by the
modulus, tmp.width (and am.width) will actually always be top, and
there's actually no need to zero pad it.
We don't capture this in the type system or BIGNUM width convention, so
better to set the width explicitly. The original code did it at the end,
but I think doing it right when we zero pad it is better, as that's when
the size gets set.
But we can go a step further. The manual zero padding code came from
OpenSSL, which still had the bn_correct_top invariant. Our BIGNUMs are
resizable, so just call bn_resize_words, immediately after the
computation.
(bn_resize_words will not reallocate the data because the BIGNUMs have
the STATIC_DATA flag set. bn_wexpand will internally allow expanding up
to dmax, or top.)
Change-Id: I2403afa7381b8a407615c6730fba9edaa41125c6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52906
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
(It's a pain because we don't have it setup in CMake, but perhaps we
should have a builder for the configuration that doesn't have bcm.c.)
Change-Id: Ic408f0a86c9d42346244d6a7b7e9e664b58fc70c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52845
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Both implementations need to compute the first 32 powers of a. There's a
commented out naive version in rsaz_exp.c that claims to be smaller, but
1% slower. (It doesn't use squares when it otherwise could.)
Instead, we can write out the square-based strategy as a loop. (I wasn't
able to measure a difference between any of the three versions, but this
one's compact enough and does let us square more and gather5 less.)
Change-Id: I7015f2a78584cd97f29b54d0007479bdcc3a01ba
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52828
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
The unrolled loops appear to have negligible perf impact:
Before:
Did 18480 RSA 2048 signing operations in 10005085us (1847.1 ops/sec)
Did 2720 RSA 4096 signing operations in 10056337us (270.5 ops/sec)
After:
Did 18480 RSA 2048 signing operations in 10012218us (1845.7 ops/sec) [-0.1%]
Did 2700 RSA 4096 signing operations in 10003972us (269.9 ops/sec) [-0.2%]
Change-Id: I29073c373a03a9798f6e04016626e6ab910e893a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52826
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
RSAZ has a very similar bug to mont5 from
https://boringssl-review.googlesource.com/c/boringssl/+/52825 and may
return the modulus when it should return zero. As in that CL, there is
no security impact on our cryptographic primitives.
RSAZ is described in the paper "Software Implementation of Modular
Exponentiation, Using Advanced Vector Instructions Architectures".
The bug comes from RSAZ's use of "NRMM" or "Non Reduced Montgomery
Multiplication". This is like normal Montgomery multiplication, but
skips the final subtraction altogether (whereas mont5's AMM still
subtracts, but replaces MM's tigher bound with just the carry bit). This
would normally not be stable, but RSAZ picks a larger R > 4M, and
maintains looser bounds for modular arithmetic, a < 2M.
Lemma 1 from the paper proves that NRMM(a, b) preserves this 2M bound.
It also claims NRMM(a, 1) < M. That is, conversion out of Montgomery
form with NRMM is fully reduced. This second claim is wrong. The proof
shows that NRMM(a, 1) < 1/2 + M, which only implies NRMM(a, 1) <= M, not
NRMM(a, 1) < M. RSAZ relies on this to produce a reduced output (see
Figure 7 in the paper).
Thus, like mont5 with AMM, RSAZ may return the modulus when it should
return zero. Fix this by adding a bn_reduce_once_in_place call at the
end of the operation.
Change-Id: If28bc49ae8dfbfb43bea02af5ea10c4209a1c6e6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52827
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime
would sometimes return m, the modulus, when it should have returned
zero. Thanks to Guido Vranken for reporting it. It is only a partial fix
because the same bug also exists in the "rsaz" codepath. That will be
fixed in the subsequent CL. (See the commented out test.)
The bug only affects zero outputs (with non-zero inputs), so we believe
it has no security impact on our cryptographic functions. BoringSSL
calls BN_mod_exp_mont_consttime in the following cases:
- RSA private key operations
- Primality testing, raising the witness to the odd part of p-1
- DSA keygen and key import, pub = g^priv (mod p)
- DSA signing, r = g^k (mod p)
- DH keygen, pub = g^priv (mod p)
- Diffie-Hellman, secret = peer^priv (mod p)
It is not possible in the RSA private key operation, provided p and q
are primes. If using CRT, we are working modulo a prime, so zero output
with non-zero input is impossible. If not using CRT, we work mod n.
While there are nilpotent values mod n, none of them hit zero by
exponentiating. (Both p and q would need to divide the input, which
means n divides the input.)
In primality testing, this can only be hit when the input was composite.
But as the rest of the loop cannot then hit 1, we'll correctly report it
as composite anyway.
DSA and DH work modulo a prime, where this case cannot happen.
Analysis:
This bug is the result of sloppiness with the looser bounds from "almost
Montgomery multiplication", described in
https://eprint.iacr.org/2011/239. Prior to upstream's
ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl
implemented standard Montgomery reduction (the left half of figure 3 in
the paper).
Though it did not document this, ec9cc70f7245 changed it to implement
the "almost" variant (the right half of the figure.) The difference is
that, rather than subtracting if T >= m, it subtracts if T >= R. In
code, it is the difference between something like our bn_reduce_once,
vs. subtracting based only on T's carry bit. (Interestingly, the
.Lmul_enter branch of bn_mul_mont_gather5 seems to still implement
normal reduction, but the .Lmul4x_enter branch is an almost reduction.)
That means none of the intermediate values here are bounded by m. They
are only bounded by R. Accordingly, Figure 2 in the paper ends with
step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this
step. The bn_from_montgomery call only implements step 9, AMM(h, 1).
(x86_64-mont5.pl's bn_from_montgomery only implements an almost
reduction.)
The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the
paper discusses this, but is ambiguous about the scope of its 2^(n-1) <
m < 2^n precondition. The m+1 bound appears to be unconditional:
Montgomery reduction ultimately adds some 0 <= Y < m*R to T, to get a
multiple of R, and then divides by R. The output, pre-subtraction, is
thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m.
A single subtraction of m if T' >= m gives T'' < m. AMM works because
T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R
gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper,
though their formulation is more complicated to capture the word-by-word
algorithm. It's ultimately the same adjustment to T.
But in AMM(h, 1), T = h*1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That
is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced
is if it outputs m. Thus, our limited impact. Indeed, Remark 1 mentions
step 10 isn't necessary because m is a prime and the inputs are
non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime
may be called elsewhere.
Fix:
To fix this, we could add the missing step 10, but a full division would
not be constant-time. The analysis above says it could be a single
subtraction, bn_reduce_once, but then we could integrate it into
the subtraction already in plain Montgomery reduction, implemented by
uppercase BN_from_montgomery. h*1 = h < R <= m*R, so we are within
bounds.
Thus, we delete lowercase bn_from_montgomery altogether, and have the
mont5 path use the same BN_from_montgomery ending as the non-mont5 path.
This only impacts the final step of the whole exponentiation and has no
measurable perf impact.
In doing so, add comments describing these looser bounds. This includes
one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont
(MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because
MM is AMM-compatible; when passed AMM's looser inputs, it will still
produce a correct looser output.
Ideally we'd drop the "almost" reduction and stick to the more
straightforward bounds. As this only impacts the final subtraction in
each reduction, I would be surprised if it actually had a real
performance impact. But this would involve deeper change to
x86_64-mont5.pl, so I haven't tried this yet.
I believe this is basically the same bug as
https://github.com/golang/go/issues/13907 from Go.
Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
For now, it contains a call to set the service indicator so it should
live in the module. In term it would be nice to move it back out and
have the service indicator set in RSA and ECDSA functions themselves
once the ECDSA functions can take an indicator of the hash function
used.
Change-Id: I2a3c262f66b1881a96ae3e49784a0dc9fc8c4589
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52705
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
If we're to have any hope of fixing EVP_CIPHER_CTX's calling convention, we
need to be able to change the shape of its method table.
Looking back, it looks like we exported this in
https://boringssl-review.googlesource.com/4330, for OpenSSH. I don't
remember exactly what OpenSSH was doing, but I see in this commit, they
removed a bunch of custom EVP_CIPHERs which would definitely have
required an exported EVP_CIPHER struct:
cdccebdf85
That's been gone for a while now, so hopefully we can hide it again. (If
a project needs a cipher not implemented by OpenSSL, it's not strictly
necessarily to make a custom EVP_CIPHER. It might be convenient to reuse
the abstraction, but you can always just call your own APIs directly.)
Update-Note: EVP_CIPHER is now opaque. Use accessors instead.
Bug: 494
Change-Id: I9344690c3cfe7d19d6ca12fb66484ced57dbe869
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52725
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
This is cribbed, with perimssion, from AWS-LC. The FIPS service
indicator[1] signals when an approved service has been completed.
[1] FIPS 140-3 IG 2.4.C
Change-Id: Ib40210d69b3823f4d2a500b23a1606f8d6942f81
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52568
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
When introducing EC_SCALAR and EC_FELEM, I used unions as convenience
for converting to and from the byte representation. However,
type-punning with unions is not allowed in C++ and hard to use correctly
in C. As I understand the rules, they are:
- The abstract machine knows what member of union was last written to.
- In C, reading from an inactive member is defined to type-pun. In C++,
it is UB though some compilers promise the C behavior anyway.
- However, if you read or write from a *pointer* to a union member, the
strict aliasing rule applies. (A function passed two pointers of
different types otherwise needs to pessimally assume they came from
the same union.)
That last rule means the type-punning allowance doesn't apply if you
take a pointer to an inactive member, and it's common to abstract
otherwise direct accesses of members via pointers.
https://github.com/openssl/openssl/issues/18225 is an example where
similar union tricks have caused problems for OpenSSL. While we don't
have that code, EC_SCALAR and EC_FELEM play similar tricks.
We do get a second lifeline because our alternate view is a uint8_t,
which we require to be unsigned char. Strict aliasing always allows the
pointer type to be a character type, so pointer-indirected accesses of
EC_SCALAR.bytes aren't necessarily UB. But if we ever write to
EC_SCALAR.bytes directly (and we do), we'll switch the active arm and
then pointers to EC_SCALAR.words become strict aliasing violations!
This is all far too complicated to deal with. Ideally everyone would
build with -fno-strict-aliasing because no real C code actually follows
these rules. But we don't always control our downstream consumers'
CFLAGS, so let's just avoid the union. This also avoids a pitfall if we
ever move libcrypto to C++.
For p224-64.c, I just converted the representations directly, which
avoids worrying about the top 32 bits in p224_felem_to_generic. Most of
the rest was words vs. bytes conversions and boils down to a cast (we're
still dealing with a character type, at the end of the day). But I took
the opportunity to extract some more "words"-based helper functions out
of BIGNUM, so the casts would only be in one place. That too saves us
from the top bits problem in the bytes-to-words direction.
Bug: 301
Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
strcasecmp is locale-sensitive, which can cause some mishaps.
This CL should be a no-op, because this call is only used on Android,
and bionic's strcasecmp seems to be ASCII-only. But using
OPENSSL_strcasecmp everywhere is easier to reason about.
Change-Id: Iecf9bc4da1bb3a4ab87b1e8b1d7f6f6c6e44aceb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52305
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at 19e277dd19/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c.
Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions:
- ecp_nistz256_select_w5
- ecp_nistz256_select_w7
An implementation for these functions was added.
Summary of modifications to the imported code:
* Renamed to `p256-armv8-asm.pl`
* Modified the location of `arm-xlate.pl` and `arm_arch.h`
* Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`.
* `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`.
* `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build.
* l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform.
Other modifications:
* `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`.
* The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled.
* Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions.
* Regenerated `delocate.peg.go`.
Notes:
1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1
2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7.
Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release:
Before:
```
Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec)
Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec)
Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec)
```
After:
```
Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec)
Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec)
Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec)
```
Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
There are paperwork reasons why it's useful to use the same hash
function in all cases. Thus unify on SHA-256 because contexts where
SHA-512 is faster, are faster overall and thus less sensitive.
Change-Id: I7a782a3adba4ace3257313a24dc8bc213b9d64ec
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52165
Reviewed-by: David Benjamin <davidben@google.com>
The files no longer need to be patched because fiat-crypto now has its
own copy of our value barrier. It does, however, require syncing our
NO_ASM define with fiat's.
fiat-crypto is now licensed under any of MIT, BSD 1-clause, or Apache 2.
I've stuck with the MIT one as that's what we were previously importing.
No measurable perf difference before/after this CL, with GCC or Clang on
x86_64.
Change-Id: I2939fd517de37aabdea3ead49150135200a1b112
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52045
Reviewed-by: Adam Langley <agl@google.com>
Our FIPS module only claims support for RSA signing/verification, and
|RSA_generate_key_fips| already performs a sign/verify pair-wise
consistency test (PCT). For ECDSA, |EC_KEY_generate_fips| performs a
sign/verify PCT too. But when |EC_KEY_generate_fips| is used for key
agreement a sign/verify PCT may not be correct.
The FIPS IG[1], page 60, says:
> Though not a CAST, a pairwise consistency test (PCT) shall be
> conducted for every generated public and private key pair for the
> applicable approved algorithm (per ISO/IEC 19790:2012 Section
> 7.10.3.3). To further clarify, at minimum, the PCT that is required by
> the underlying algorithm standard (e.g. SP 800- 56Arev3 or SP
> 800-56Brev2) shall be performed.
SP 800-56Ar3, page 36, says:
> For an ECC key pair (d, Q): Use the private key, d, along with the
> generator G and other domain parameters associated with the key pair,
> to compute dG (according to the rules of elliptic-curve arithmetic).
> Compare the result to the public key, Q. If dG is not equal to Q, then
> the pair-wise consistency test fails
But |EC_KEY_generate_fips| has always done that via
|EC_KEY_check_key|. So I believe that |EC_KEY_generate_fips| works for
either case.
This change documents that.
[1] FIPS 140-3 IG dated 2022-03-14 and with SHA-256
2f232f7f5839e3263284d71c35771c9fdf2e505b02813be999377030c56b37e4
Change-Id: I4b4e2ed92ae3d59e2f2404c41694abeb3eb283f4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51988
Reviewed-by: David Benjamin <davidben@google.com>
We need a function that returns a version that links to a certificate.
Previously we have used the git hash as the version of our modules but
the source cannot contain its own hash. Thus this change defines a new
format for FIPS module versions which will be filled in once we're ready
to define a version.
Change-Id: Ie4641945119106bc47e8da94ed8a45a86abb6f92
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51986
Reviewed-by: David Benjamin <davidben@google.com>
BN_mod_sqrt implements the Tonelli–Shanks algorithm, which requires a
prime modulus. It was written such that, given a composite modulus, it
would sometimes loop forever. This change fixes the algorithm to always
terminate. However, callers must still pass a prime modulus for the
function to have a defined output.
In OpenSSL, this loop resulted in a DoS vulnerability, CVE-2022-0778.
BoringSSL is mostly unaffected by this. In particular, this case is not
reachable in BoringSSL from certificate and other ASN.1 elliptic curve
parsing code. Any impact in BoringSSL is limited to:
- Callers of EC_GROUP_new_curve_GFp that take untrusted curve parameters
- Callers of BN_mod_sqrt that take untrusted moduli
This CL updates documentation of those functions to clarify that callers
should not pass attacker-controlled values. Even with the infinite loop
fixed, doing so breaks preconditions and will give undefined output.
Change-Id: I64dc1220aaaaafedba02d2ac0e4232a3a0648160
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51925
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: Martin Kreichgauer <martinkr@google.com>
Commit-Queue: Adam Langley <agl@google.com>
NIST publishes the PDFs of the security policy documents (although the
latest one is still missing). We include the docx sources to help others
who might be doing a rebrand certification of BoringCrypto.
Change-Id: I5c1511d53ec1d09d257d3aab1301486c364b660b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51505
Reviewed-by: David Benjamin <davidben@google.com>
On Arm, our CRYPTO_is_*_capable functions check the corresponding
preprocessor symbol. This allows us to automatically drop dynamic checks
and fallback code when some capability is always avilable.
This CL does the same on x86, as well as consolidates our
OPENSSL_ia32cap_P checks in one place. Since this abstraction is
incompatible with some optimizations we do around OPENSSL_ia32cap_get()
in the FIPS module, I've marked the symbol __attribute__((const)), which
is enough to make GCC and Clang do the optimizations for us. (We already
do the same to DEFINE_BSS_GET.)
Most x86 platforms support a much wider range of capabilities, so this
is usually a no-op. But, notably, all x86_64 Mac hardware has SSSE3
available, so this allows us to statically drop an AES implementation.
(On macOS with -Wl,-dead_strip, this seems to trim 35080 bytes from the
bssl binary.) Configs like -march=native can also drop a bunch of code.
Update-Note: This CL may break build environments that incorrectly mark
some instruction as statically available. This is unlikely to happen
with vector instructions like AVX, where the compiler could freely emit
them anyway. However, instructions like AES-NI might be set incorrectly.
Change-Id: I44fd715c9887d3fda7cb4519c03bee4d4f2c7ea6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51548
Reviewed-by: Adam Langley <agl@google.com>
x86_64-mont5.pl checks for both BMI1 and BMI2, because the MULX path
also uses the ANDN instruction. Some history here from upstream:
a5bb5bca52f57021a4017521c55a6b3590bbba7a, dated 2013-10-03, added the
MULX path to x86_64-mont5.pl. At the time, the cpuid check was
BMI2+ADX. (MULX comes from BMI2.)
37de2b5c1e370b493932552556940eb89922b027, dated 2013-10-09, made
BN_mod_exp_mont_consttime prefer the MULX mont5 code over the AVX2 rsaz
code, with a matching BMI2+ADX cpuid check.
8fc8f486f7fa098c9fbb6a6ae399e3c6856e0d87, dated 2016-01-25, tweaked some
code to use the ANDN instruction, from BMI1. Correspondingly, it changed
the cpuid check to be BMI1+BMI2+ADX. The BN_mod_exp_mont_consttime check
was left unchanged.
This CL fixes our version of the BN_mod_exp_mont_consttime check to
match the assembly, by also checking BMI1. (This should be a no-op.
Presumably any processor with BMI2 also has BMI1.)
Change-Id: Ib0cacc7e2be840d970460eef4dd9ded7fb24231c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51547
Reviewed-by: Adam Langley <agl@google.com>
While our CI machines don't have these instructions, Intel SDE covers
them. Benchmarks on an AMD EPYC machine (VM on Google Compute Engine):
Before:
Did 13619000 SHA-256 (16 bytes) operations in 3000147us (72.6 MB/sec)
Did 3728000 SHA-256 (256 bytes) operations in 3000566us (318.1 MB/sec)
Did 920000 SHA-256 (1350 bytes) operations in 3002829us (413.6 MB/sec)
Did 161000 SHA-256 (8192 bytes) operations in 3017473us (437.1 MB/sec)
Did 81000 SHA-256 (16384 bytes) operations in 3029284us (438.1 MB/sec)
After:
Did 25442000 SHA-256 (16 bytes) operations in 3000010us (135.7 MB/sec) [+86.8%]
Did 10706000 SHA-256 (256 bytes) operations in 3000171us (913.5 MB/sec) [+187.2%]
Did 3119000 SHA-256 (1350 bytes) operations in 3000470us (1403.3 MB/sec) [+239.3%]
Did 572000 SHA-256 (8192 bytes) operations in 3001226us (1561.3 MB/sec) [+257.2%]
Did 289000 SHA-256 (16384 bytes) operations in 3006936us (1574.7 MB/sec) [+259.4%]
Although we don't currently have unwind tests in CI, I ran the unwind
tests manually on the same VM. They pass, after adding in the missing
.cfi_startproc and .cfi_endproc lines.
Change-Id: I45b91819e7dcc31e63813843129afa146d0c9d47
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51546
Reviewed-by: Adam Langley <agl@google.com>
fips_break_test.h is a bad name because generate_build_files.py thinks
that it's a test file, which it is, but one that's needed in the main
build. Thanks to Svilen Kanev for noting this.
That header doesn't particularly carry its weight. The idea was that
rebuilding the break test wouldn't need to rebuild everything if that
logic was isolated in its own header. But we only have to rebuild once
now, so whatever. There's already a block of crypto/internal.h with very
similar stuff; it can go there.
Change-Id: Ifb479eafd4df9a7aac4804cae06ba87257c77fc3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51485
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Change-Id: I6dabeb0a9090a4ddcafc88a3bc53b2c28c30f14a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51465
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Change-Id: Ief328bb2a8b6264226a89233c9fba0e4621de9d7
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51425
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
All FIPS testing is done with ACVP now. We can delete all the CAVP
stuff.
Change-Id: I459873474e40b0371f9cf760090a130ef9a90a8c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51330
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
FIPS validation requires showing that the continuous and start-up tests
are effective by breaking them. Traditionally BoringSSL used #defines
that tweaked the expected values. However, 140-3 now requires that the
inputs be changed, not the expected outputs.
Also, the number of tests is going to increase. Since slower platforms
already took too long to compile BoringSSL n times (once for each test
to break) we want something faster too.
Therefore all the known-answer tests (KATs) are changed such that a Go
program can find and replace the input value in order to break them.
Thus we only need to recompile once to disable the integrity test.
The runtime tests still need a #define to break, but that #define is now
put in a header file so that only the module need be recompiled, not
everything as in the previous system.
Change-Id: Ib621198e6ad02253e29af0ccd978e3c3830ad54c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51329
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Builds that compile the FIPS stuff separately don't get this header from
other files.
Change-Id: I8a1b30ae360b08d4f4b9f804cd234998889477bc
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51405
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
AS10.20 requires that the self-test for the integrity algorithm pass
before the integrity check itself. IG 10.3.A requires an HMAC self-test
now. Therefore run these tests before the integrity check.
Since we also need the ability to run all self-tests, both SHA
self-tests and the HMAC test are run again when running self-tests.
I'm assuming that they're so fast that it doesn't matter.
Change-Id: I6b23b6fd3cb6107edd7420bc8680780719bd41d2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51328
Reviewed-by: David Benjamin <davidben@google.com>
The word “calculated” is two letters longer than “expected” and it's
nice to line up the ouptuts.
Change-Id: Idac70e62d98fbe26c430f03f4643ba295e40853d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51327
Reviewed-by: David Benjamin <davidben@google.com>
The provision of FIPS that allowed the tests to be skipped based on a
flag-file has been removed in 140-3. Therefore we expect to run the fast
KATs on start-up, but to defer to slower ones until the functionality in
question is first used. So this change splits off the fast KATs and
removes support for skipping KATs based on a flag-file.
Change-Id: Ib24cb1739cfef93e4a1349d786a0257ee1083cfb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51326
Reviewed-by: David Benjamin <davidben@google.com>
This matches our other free functions.
Fixed: 473
Change-Id: Ie147995c2f5b429f78e95cfc9a08ed54181af94e
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51005
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
140-3 says
> the zeroisation of protected and unprotected SSPs
> shall be performed in the following scenarios:
> ...
> For temporary value(s) generated during the integrity test of the
> module’s software or firmware upon completion of the integrity test.
(IG 9.7.B)
Change-Id: I911f294860bf33b13b2c997fc633c9bda777fc48
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50945
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>