This is far from all of it, but finishes a good chunk of bcm.c.
Bug: 516
Change-Id: If764e5af1c6b62e8342554502ecc4d563e44bc50
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54207
Reviewed-by: Bob Beck <bbe@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Poison the EVP_CIPHER_CTX structure on failures, and indicate
that it is an error to re-use an EVP_CIPHER_CTX context in another
call after a failure.
Bug: 494
Change-Id: Ibcdf28b83a2e690f7aab789d908c076d844231c6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54185
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
OpenSSL added a separate "secure heap" to allocate some data in a
different heap. We don't implement this, so just act as if initializing
it always fails. Node now expects these functions to be available.
Change-Id: I4c57c807c51681b16ec3a60e9674583b193358c4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54309
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
GCC 12's -Wstringop-overflow flags issues in SHA224_Final, etc., because
it calls into generic code that might output a SHA-224 length or a
SHA-256 length, and the function prototype declares the array is only
sized for SHA-224.
This is a bit messy because OpenSSL's API for the truncated SHA-2 hashes
allows you to mix and match them. The output size is set by SHA224_Init
and then, originally, SHA256_Final and SHA224_Final were the same thing.
See how OpenSSL's own SHA224 function calls SHA224_Init + SHA256_Final:
https://github.com/openssl/openssl/blob/OpenSSL_1_1_1q/crypto/sha/sha256.c#L49-L61
To get the function prototype bounds to work out, we tightened this
slightly in
https://boringssl-review.googlesource.com/c/boringssl/+/47807 and added
an assert to SHA224_Final that ctx->md_len was the right size.
SHA256_Final does not have that assert yet. The assert says that mixing
SHA256_Init and SHA224_Final is a caller error.
This isn't good enough for GCC 12, which checks bounds assuming there is
no external invariant on ctx->md_len. This CL changes the behavior of
the shorter Final functions: they will now always output the length
implied by the function name. ctx->md_len only figures into an assert()
call. As we don't have the assert in the untruncated functions yet, I've
preserved their behavior, but the test run with cl/471617180 should tell
us whether apply this to all functions is feasible.
Update-Note: Truncated SHA-2 Final functions change behavior slightly,
but anyone affected by this behavior change would already have tripped
an assert() in debug builds.
Change-Id: I80fdcbe6ad76bc8713c0f2de329b958a2b35e8ae
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54246
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
It is now 2022. See if we can assume getrandom in this configuration.
Update-Note: The /dev/urandom fallback is no longer available in FIPS
builds. This fallback relied on RNGGETENTCNT and was quite flaky.
Change-Id: Icf6d29f6d5952fb6c5656c9039a4cfaf1de2d724
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54127
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
For various reasons, our FIPS mode build will sometimes seed from RDRAND
instead of the OS. (And, when
https://boringssl-review.googlesource.com/c/boringssl/+/52527 relands,
there'll be another non-OS source.)
To help with this,
https://boringssl-review.googlesource.com/c/boringssl/+/37664 made the
FIPS mode rand_get_seed opportunistically incorporate OS entropy when
available. Originally, it just XORed into the original entropy.
Then https://boringssl-review.googlesource.com/c/boringssl/+/44305
rearranged this so that rand_get_seed had an out_used_cpu (since renamed
to out_want_additional input) output, with the caller mixing the entropy
in instead, into the personalization input to CTR_DRBG_init.
In doing so, that change lost the OS entropy in the CTR_DRBG_reseed
calls. Add it back in, using the additional_data parameter. As part of
this, move the CRYPTO_sysrand_if_available call back to rand_get_seed,
this time as a second output which the caller is responsible for passing
into CTR_DRBG_{init,reseed} alongside the main output.
Change-Id: Ie3335c74e940c760031a28de932d6fedfe355ea0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54126
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
This adds a boringssl interface to get up to 256 bytes of system
entropy from system entropy sources without going through
RAND_bytes. It should only be used for seeding custom prng's
or where malloc() should not be called from boringssl.
Just as with RAND_bytes(), this can abort the program on failure.
Bug: chromium:1295105
Change-Id: Ia55509702970608fe09cfee9809d02f107c15c8c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54045
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Splitting this out from most of the -Wshorten-64-to-32 fixes since it
non-trivially rewrites the function. While I'm here, move variable
declarations slightly closer to their use and document how the salt
check differs from the spec.
Bug: 516
Change-Id: I2e53afecb8ba720fd8c02da504b56c829c20c93b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54206
Commit-Queue: David Benjamin <davidben@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
On aarch64 and x86_64 ABIs, the unused bits of 32-bit parameters have
unspecified value. That means if, say, the aarch64
aes_hw_set_encrypt_key accessed the 'bits' parameter as X1 rather than
W1, it could get a different value from what C passed in. To test this,
our ABI testing framework fills the upper half of the register with
garbage. However, set_encrypt_key just cleanly returns error on
unrecognized bit length. So, to check that this all worked correctly, we
need to assert that the return value was correct.
Looking at the assembly, they all handle it correctly, but now we'll
also test it.
(Note these functions break the usual convention and use zero as the
success value.)
Change-Id: Icaf65ea54564ebfe3696b42287488fe3f72ef138
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54205
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
O_CLOEXEC avoids a race condition and is less code. It was supported in
Linux starting 2.6.23. https://bugs.python.org/issue26343#msg260151 says
it's been available since macOS 10.7. Let's try using it instead of
fcntl and see if anything breaks. It's even part of POSIX these days.
Update-Note: BoringSSL's /dev/urandom code now assumes the platform
supports O_CLOEXEC.
Change-Id: I95313892b36539591685d4c83a387f77129ad3d1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54125
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CPython uses this function.
Change-Id: I03ead7f54ad19e2a0b2ea3b142298cc1e55c3c90
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53967
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
More OpenSSL compatibility functions.
Change-Id: I8e9429fcbc3e285f4c4ad9bdf4c1d9d3c73c3064
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53925
Commit-Queue: David Benjamin <davidben@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
This syscall is required by generatekey in keystore.
Signed-off-by: Liu Cunyuan <liucunyuan.lcy@linux.alibaba.com>
Signed-off-by: Mao Han <han_mao@linux.alibaba.com>
Change-Id: I4dd0534daa6cfa52429e5bf398679fccb7d67e7f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53765
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
The C11 change has survived for three months now. Let's start freely
using static_assert. In C files, we need to include <assert.h> because
it is a macro. In C++ files, it is a keyword and we can just use it. (In
MSVC C, it is actually also a keyword as in C++, but close enough.)
I moved one assert from ssl3.h to ssl_lib.cc. We haven't yet required
C11 in our public headers, just our internal files.
Change-Id: Ic59978be43b699f2c997858179a9691606784ea5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53665
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
This reverts commit 4259ae8198.
Some Android builders perhaps lack getrandom support.
Change-Id: Ic7537c07dacb31a54adb453ddd5f82a789089eaf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53625
Auto-Submit: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
When seeding a DRBG for the first time we currently make two reads: one
to start the CRNGT and a second to read the actual seed. These reads can
be merged to save I/O.
Change-Id: I2a83edf7f3c8b9d6cebcde02195845be9fde19b2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52526
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
This flag is currently set if DRBG entropy is obtained from RDRAND. It
indicates that we should add kernel entropy when seeding the DRBG. But
this might be true for methods other than RDRAND in the future so this
change renames it accordingly.
Change-Id: I91826178a806e3c6dadebbb844358a7a12e0b09b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52525
Reviewed-by: David Benjamin <davidben@google.com>
p256-armv8-asm.pl defined ecp_nistz256_[to|from]_mont as global
functions, but p256-nistz.h defined them as static inlines.
Additionally, ecp_nistz256_to_mont was never used.
This change drops the assembly versions and drops ecp_nistz256_to_mont
completely.
Change-Id: Ie2cc5bf4adc423f72f61cf227be0e93c9a6e2031
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53606
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
27ffcc6e19 switched the integrity check to using SHA-256, but the
Aarch64 FIPS build was still passing -sha256 to inject_hash.go.
Change-Id: I641de17d62205c7f127cd2a910d4e98778d492e7
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53605
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
I'm not sure what the history of this is, but it seems to work
just fine in MSVC now.
Change-Id: Iebdc365486bb30a61a1001f705aef7dcaa2a9fcd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52985
Reviewed-by: Adam Langley <agl@google.com>
bind uses this function.
Change-Id: I97ba86d9f75597bff125ae0b56952effc397e6b8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53010
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
I don't think these are all UB by C's rules, but it's easier not to
think about the pointers. Still more to go, but these were some easy
ones.
Bug: 301
Change-Id: Icdcb7fb40f85983cbf566786c5f7dbfd7bb06571
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52905
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
https://boringssl-review.googlesource.com/c/boringssl/+/52825 lost a
tmp.width = top line. Without it, tmp.width was set by
bn_one_to_montgomery. Since we always size modular arithmetic by the
modulus, tmp.width (and am.width) will actually always be top, and
there's actually no need to zero pad it.
We don't capture this in the type system or BIGNUM width convention, so
better to set the width explicitly. The original code did it at the end,
but I think doing it right when we zero pad it is better, as that's when
the size gets set.
But we can go a step further. The manual zero padding code came from
OpenSSL, which still had the bn_correct_top invariant. Our BIGNUMs are
resizable, so just call bn_resize_words, immediately after the
computation.
(bn_resize_words will not reallocate the data because the BIGNUMs have
the STATIC_DATA flag set. bn_wexpand will internally allow expanding up
to dmax, or top.)
Change-Id: I2403afa7381b8a407615c6730fba9edaa41125c6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52906
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
(It's a pain because we don't have it setup in CMake, but perhaps we
should have a builder for the configuration that doesn't have bcm.c.)
Change-Id: Ic408f0a86c9d42346244d6a7b7e9e664b58fc70c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52845
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Both implementations need to compute the first 32 powers of a. There's a
commented out naive version in rsaz_exp.c that claims to be smaller, but
1% slower. (It doesn't use squares when it otherwise could.)
Instead, we can write out the square-based strategy as a loop. (I wasn't
able to measure a difference between any of the three versions, but this
one's compact enough and does let us square more and gather5 less.)
Change-Id: I7015f2a78584cd97f29b54d0007479bdcc3a01ba
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52828
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
The unrolled loops appear to have negligible perf impact:
Before:
Did 18480 RSA 2048 signing operations in 10005085us (1847.1 ops/sec)
Did 2720 RSA 4096 signing operations in 10056337us (270.5 ops/sec)
After:
Did 18480 RSA 2048 signing operations in 10012218us (1845.7 ops/sec) [-0.1%]
Did 2700 RSA 4096 signing operations in 10003972us (269.9 ops/sec) [-0.2%]
Change-Id: I29073c373a03a9798f6e04016626e6ab910e893a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52826
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
RSAZ has a very similar bug to mont5 from
https://boringssl-review.googlesource.com/c/boringssl/+/52825 and may
return the modulus when it should return zero. As in that CL, there is
no security impact on our cryptographic primitives.
RSAZ is described in the paper "Software Implementation of Modular
Exponentiation, Using Advanced Vector Instructions Architectures".
The bug comes from RSAZ's use of "NRMM" or "Non Reduced Montgomery
Multiplication". This is like normal Montgomery multiplication, but
skips the final subtraction altogether (whereas mont5's AMM still
subtracts, but replaces MM's tigher bound with just the carry bit). This
would normally not be stable, but RSAZ picks a larger R > 4M, and
maintains looser bounds for modular arithmetic, a < 2M.
Lemma 1 from the paper proves that NRMM(a, b) preserves this 2M bound.
It also claims NRMM(a, 1) < M. That is, conversion out of Montgomery
form with NRMM is fully reduced. This second claim is wrong. The proof
shows that NRMM(a, 1) < 1/2 + M, which only implies NRMM(a, 1) <= M, not
NRMM(a, 1) < M. RSAZ relies on this to produce a reduced output (see
Figure 7 in the paper).
Thus, like mont5 with AMM, RSAZ may return the modulus when it should
return zero. Fix this by adding a bn_reduce_once_in_place call at the
end of the operation.
Change-Id: If28bc49ae8dfbfb43bea02af5ea10c4209a1c6e6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52827
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime
would sometimes return m, the modulus, when it should have returned
zero. Thanks to Guido Vranken for reporting it. It is only a partial fix
because the same bug also exists in the "rsaz" codepath. That will be
fixed in the subsequent CL. (See the commented out test.)
The bug only affects zero outputs (with non-zero inputs), so we believe
it has no security impact on our cryptographic functions. BoringSSL
calls BN_mod_exp_mont_consttime in the following cases:
- RSA private key operations
- Primality testing, raising the witness to the odd part of p-1
- DSA keygen and key import, pub = g^priv (mod p)
- DSA signing, r = g^k (mod p)
- DH keygen, pub = g^priv (mod p)
- Diffie-Hellman, secret = peer^priv (mod p)
It is not possible in the RSA private key operation, provided p and q
are primes. If using CRT, we are working modulo a prime, so zero output
with non-zero input is impossible. If not using CRT, we work mod n.
While there are nilpotent values mod n, none of them hit zero by
exponentiating. (Both p and q would need to divide the input, which
means n divides the input.)
In primality testing, this can only be hit when the input was composite.
But as the rest of the loop cannot then hit 1, we'll correctly report it
as composite anyway.
DSA and DH work modulo a prime, where this case cannot happen.
Analysis:
This bug is the result of sloppiness with the looser bounds from "almost
Montgomery multiplication", described in
https://eprint.iacr.org/2011/239. Prior to upstream's
ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl
implemented standard Montgomery reduction (the left half of figure 3 in
the paper).
Though it did not document this, ec9cc70f7245 changed it to implement
the "almost" variant (the right half of the figure.) The difference is
that, rather than subtracting if T >= m, it subtracts if T >= R. In
code, it is the difference between something like our bn_reduce_once,
vs. subtracting based only on T's carry bit. (Interestingly, the
.Lmul_enter branch of bn_mul_mont_gather5 seems to still implement
normal reduction, but the .Lmul4x_enter branch is an almost reduction.)
That means none of the intermediate values here are bounded by m. They
are only bounded by R. Accordingly, Figure 2 in the paper ends with
step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this
step. The bn_from_montgomery call only implements step 9, AMM(h, 1).
(x86_64-mont5.pl's bn_from_montgomery only implements an almost
reduction.)
The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the
paper discusses this, but is ambiguous about the scope of its 2^(n-1) <
m < 2^n precondition. The m+1 bound appears to be unconditional:
Montgomery reduction ultimately adds some 0 <= Y < m*R to T, to get a
multiple of R, and then divides by R. The output, pre-subtraction, is
thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m.
A single subtraction of m if T' >= m gives T'' < m. AMM works because
T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R
gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper,
though their formulation is more complicated to capture the word-by-word
algorithm. It's ultimately the same adjustment to T.
But in AMM(h, 1), T = h*1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That
is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced
is if it outputs m. Thus, our limited impact. Indeed, Remark 1 mentions
step 10 isn't necessary because m is a prime and the inputs are
non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime
may be called elsewhere.
Fix:
To fix this, we could add the missing step 10, but a full division would
not be constant-time. The analysis above says it could be a single
subtraction, bn_reduce_once, but then we could integrate it into
the subtraction already in plain Montgomery reduction, implemented by
uppercase BN_from_montgomery. h*1 = h < R <= m*R, so we are within
bounds.
Thus, we delete lowercase bn_from_montgomery altogether, and have the
mont5 path use the same BN_from_montgomery ending as the non-mont5 path.
This only impacts the final step of the whole exponentiation and has no
measurable perf impact.
In doing so, add comments describing these looser bounds. This includes
one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont
(MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because
MM is AMM-compatible; when passed AMM's looser inputs, it will still
produce a correct looser output.
Ideally we'd drop the "almost" reduction and stick to the more
straightforward bounds. As this only impacts the final subtraction in
each reduction, I would be surprised if it actually had a real
performance impact. But this would involve deeper change to
x86_64-mont5.pl, so I haven't tried this yet.
I believe this is basically the same bug as
https://github.com/golang/go/issues/13907 from Go.
Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
For now, it contains a call to set the service indicator so it should
live in the module. In term it would be nice to move it back out and
have the service indicator set in RSA and ECDSA functions themselves
once the ECDSA functions can take an indicator of the hash function
used.
Change-Id: I2a3c262f66b1881a96ae3e49784a0dc9fc8c4589
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52705
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
If we're to have any hope of fixing EVP_CIPHER_CTX's calling convention, we
need to be able to change the shape of its method table.
Looking back, it looks like we exported this in
https://boringssl-review.googlesource.com/4330, for OpenSSH. I don't
remember exactly what OpenSSH was doing, but I see in this commit, they
removed a bunch of custom EVP_CIPHERs which would definitely have
required an exported EVP_CIPHER struct:
cdccebdf85
That's been gone for a while now, so hopefully we can hide it again. (If
a project needs a cipher not implemented by OpenSSL, it's not strictly
necessarily to make a custom EVP_CIPHER. It might be convenient to reuse
the abstraction, but you can always just call your own APIs directly.)
Update-Note: EVP_CIPHER is now opaque. Use accessors instead.
Bug: 494
Change-Id: I9344690c3cfe7d19d6ca12fb66484ced57dbe869
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52725
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
This is cribbed, with perimssion, from AWS-LC. The FIPS service
indicator[1] signals when an approved service has been completed.
[1] FIPS 140-3 IG 2.4.C
Change-Id: Ib40210d69b3823f4d2a500b23a1606f8d6942f81
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52568
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
When introducing EC_SCALAR and EC_FELEM, I used unions as convenience
for converting to and from the byte representation. However,
type-punning with unions is not allowed in C++ and hard to use correctly
in C. As I understand the rules, they are:
- The abstract machine knows what member of union was last written to.
- In C, reading from an inactive member is defined to type-pun. In C++,
it is UB though some compilers promise the C behavior anyway.
- However, if you read or write from a *pointer* to a union member, the
strict aliasing rule applies. (A function passed two pointers of
different types otherwise needs to pessimally assume they came from
the same union.)
That last rule means the type-punning allowance doesn't apply if you
take a pointer to an inactive member, and it's common to abstract
otherwise direct accesses of members via pointers.
https://github.com/openssl/openssl/issues/18225 is an example where
similar union tricks have caused problems for OpenSSL. While we don't
have that code, EC_SCALAR and EC_FELEM play similar tricks.
We do get a second lifeline because our alternate view is a uint8_t,
which we require to be unsigned char. Strict aliasing always allows the
pointer type to be a character type, so pointer-indirected accesses of
EC_SCALAR.bytes aren't necessarily UB. But if we ever write to
EC_SCALAR.bytes directly (and we do), we'll switch the active arm and
then pointers to EC_SCALAR.words become strict aliasing violations!
This is all far too complicated to deal with. Ideally everyone would
build with -fno-strict-aliasing because no real C code actually follows
these rules. But we don't always control our downstream consumers'
CFLAGS, so let's just avoid the union. This also avoids a pitfall if we
ever move libcrypto to C++.
For p224-64.c, I just converted the representations directly, which
avoids worrying about the top 32 bits in p224_felem_to_generic. Most of
the rest was words vs. bytes conversions and boils down to a cast (we're
still dealing with a character type, at the end of the day). But I took
the opportunity to extract some more "words"-based helper functions out
of BIGNUM, so the casts would only be in one place. That too saves us
from the top bits problem in the bytes-to-words direction.
Bug: 301
Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
strcasecmp is locale-sensitive, which can cause some mishaps.
This CL should be a no-op, because this call is only used on Android,
and bionic's strcasecmp seems to be ASCII-only. But using
OPENSSL_strcasecmp everywhere is easier to reason about.
Change-Id: Iecf9bc4da1bb3a4ab87b1e8b1d7f6f6c6e44aceb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52305
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at 19e277dd19/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c.
Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions:
- ecp_nistz256_select_w5
- ecp_nistz256_select_w7
An implementation for these functions was added.
Summary of modifications to the imported code:
* Renamed to `p256-armv8-asm.pl`
* Modified the location of `arm-xlate.pl` and `arm_arch.h`
* Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`.
* `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`.
* `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build.
* l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform.
Other modifications:
* `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`.
* The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled.
* Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions.
* Regenerated `delocate.peg.go`.
Notes:
1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1
2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7.
Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release:
Before:
```
Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec)
Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec)
Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec)
```
After:
```
Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec)
Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec)
Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec)
```
Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
There are paperwork reasons why it's useful to use the same hash
function in all cases. Thus unify on SHA-256 because contexts where
SHA-512 is faster, are faster overall and thus less sensitive.
Change-Id: I7a782a3adba4ace3257313a24dc8bc213b9d64ec
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52165
Reviewed-by: David Benjamin <davidben@google.com>
The files no longer need to be patched because fiat-crypto now has its
own copy of our value barrier. It does, however, require syncing our
NO_ASM define with fiat's.
fiat-crypto is now licensed under any of MIT, BSD 1-clause, or Apache 2.
I've stuck with the MIT one as that's what we were previously importing.
No measurable perf difference before/after this CL, with GCC or Clang on
x86_64.
Change-Id: I2939fd517de37aabdea3ead49150135200a1b112
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52045
Reviewed-by: Adam Langley <agl@google.com>
Our FIPS module only claims support for RSA signing/verification, and
|RSA_generate_key_fips| already performs a sign/verify pair-wise
consistency test (PCT). For ECDSA, |EC_KEY_generate_fips| performs a
sign/verify PCT too. But when |EC_KEY_generate_fips| is used for key
agreement a sign/verify PCT may not be correct.
The FIPS IG[1], page 60, says:
> Though not a CAST, a pairwise consistency test (PCT) shall be
> conducted for every generated public and private key pair for the
> applicable approved algorithm (per ISO/IEC 19790:2012 Section
> 7.10.3.3). To further clarify, at minimum, the PCT that is required by
> the underlying algorithm standard (e.g. SP 800- 56Arev3 or SP
> 800-56Brev2) shall be performed.
SP 800-56Ar3, page 36, says:
> For an ECC key pair (d, Q): Use the private key, d, along with the
> generator G and other domain parameters associated with the key pair,
> to compute dG (according to the rules of elliptic-curve arithmetic).
> Compare the result to the public key, Q. If dG is not equal to Q, then
> the pair-wise consistency test fails
But |EC_KEY_generate_fips| has always done that via
|EC_KEY_check_key|. So I believe that |EC_KEY_generate_fips| works for
either case.
This change documents that.
[1] FIPS 140-3 IG dated 2022-03-14 and with SHA-256
2f232f7f5839e3263284d71c35771c9fdf2e505b02813be999377030c56b37e4
Change-Id: I4b4e2ed92ae3d59e2f2404c41694abeb3eb283f4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51988
Reviewed-by: David Benjamin <davidben@google.com>