8 Commits (1106836aa99c08d0b709219889d364a4c855d3c9)
efd09b7e37 | Const-correct bn_gather5. | 3 years ago

Not that we get much type-checking from this, as an assembly function.

Change-Id: I21643444cfc577e2d68f11891e602724ded52e7f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52831
Reviewed-by: Adam Langley <agl@google.com>
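For illustration, a hedged sketch of what const-correcting an assembly-backed gather routine amounts to at the C level, using a hypothetical routine and limb type rather than the real bn_gather5 declaration:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t LIMB;  /* stand-in for BN_ULONG; an assumption for this sketch */

/* Hypothetical assembly-backed gather routine, not the real bn_gather5.
 * Before the change the table parameter would have been a plain pointer:
 *   void gather_rows(LIMB *out, size_t num, LIMB *table, size_t power);
 * Declaring it const documents that the routine only reads the table and
 * lets the compiler reject accidental writes by C callers; the assembly
 * implementation itself is unchecked either way. */
void gather_rows(LIMB *out, size_t num, const LIMB *table, size_t power);
```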
13c9d5c69d | Always end BN_mod_exp_mont_consttime with normal Montgomery reduction. | 3 years ago

This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime would sometimes return m, the modulus, when it should have returned zero. Thanks to Guido Vranken for reporting it. It is only a partial fix because the same bug also exists in the "rsaz" codepath. That will be fixed in the subsequent CL. (See the commented-out test.)

The bug only affects zero outputs (with non-zero inputs), so we believe it has no security impact on our cryptographic functions. BoringSSL calls BN_mod_exp_mont_consttime in the following cases:

- RSA private key operations
- Primality testing, raising the witness to the odd part of p-1
- DSA keygen and key import, pub = g^priv (mod p)
- DSA signing, r = g^k (mod p)
- DH keygen, pub = g^priv (mod p)
- Diffie-Hellman, secret = peer^priv (mod p)

It is not possible in the RSA private key operation, provided p and q are primes. If using CRT, we are working modulo a prime, so zero output with non-zero input is impossible. If not using CRT, we work mod n. While there are nilpotent values mod n, none of them hit zero by exponentiating. (Both p and q would need to divide the input, which means n divides the input.)

In primality testing, this can only be hit when the input was composite. But as the rest of the loop cannot then hit 1, we'll correctly report it as composite anyway. DSA and DH work modulo a prime, where this case cannot happen.

Analysis: This bug is the result of sloppiness with the looser bounds from "almost Montgomery multiplication", described in https://eprint.iacr.org/2011/239. Prior to upstream's ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl implemented standard Montgomery reduction (the left half of figure 3 in the paper). Though it did not document this, ec9cc70f7245 changed it to implement the "almost" variant (the right half of the figure). The difference is that, rather than subtracting if T >= m, it subtracts if T >= R. In code, it is the difference between something like our bn_reduce_once, vs. subtracting based only on T's carry bit. (Interestingly, the .Lmul_enter branch of bn_mul_mont_gather5 seems to still implement normal reduction, but the .Lmul4x_enter branch is an almost reduction.)

That means none of the intermediate values here are bounded by m. They are only bounded by R. Accordingly, Figure 2 in the paper ends with step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this step. The bn_from_montgomery call only implements step 9, AMM(h, 1). (x86_64-mont5.pl's bn_from_montgomery only implements an almost reduction.)

The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the paper discusses this, but is ambiguous about the scope of its 2^(n-1) < m < 2^n precondition. The m+1 bound appears to be unconditional: Montgomery reduction ultimately adds some 0 <= Y < m*R to T, to get a multiple of R, and then divides by R. The output, pre-subtraction, is thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m. A single subtraction of m if T' >= m gives T'' < m. AMM works because T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper, though their formulation is more complicated to capture the word-by-word algorithm. It's ultimately the same adjustment to T. But in AMM(h, 1), T = h*1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced is if it outputs m. Thus, our limited impact.

Indeed, Remark 1 mentions step 10 isn't necessary because m is a prime and the inputs are non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime may be called elsewhere.

Fix: To fix this, we could add the missing step 10, but a full division would not be constant-time. The analysis above says it could be a single subtraction, bn_reduce_once, but then we could integrate it into the subtraction already in plain Montgomery reduction, implemented by uppercase BN_from_montgomery. h*1 = h < R <= m*R, so we are within bounds. Thus, we delete lowercase bn_from_montgomery altogether, and have the mont5 path use the same BN_from_montgomery ending as the non-mont5 path. This only impacts the final step of the whole exponentiation and has no measurable perf impact.

In doing so, add comments describing these looser bounds. This includes one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont (MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because MM is AMM-compatible; when passed AMM's looser inputs, it will still produce a correct looser output.

Ideally we'd drop the "almost" reduction and stick to the more straightforward bounds. As this only impacts the final subtraction in each reduction, I would be surprised if it actually had a real performance impact. But this would involve a deeper change to x86_64-mont5.pl, so I haven't tried this yet.

I believe this is basically the same bug as https://github.com/golang/go/issues/13907 from Go.

Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
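The distinction the analysis turns on can be made concrete with a short, self-contained sketch of a constant-time "reduce once" step (the function name, the carry convention, and the caller-supplied scratch buffer are assumptions for illustration; BoringSSL's actual bn_reduce_once and its assembly differ in detail). Plain Montgomery reduction ends with this kind of subtraction, keeping the result below m:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t BN_ULONG;  /* assumption: 64-bit limbs */

/* If the value (carry, r[num-1..0]) is >= m, replace r with r - m; otherwise
 * leave r unchanged.  The choice is made with masks, not a data-dependent
 * branch.  carry is assumed to be 0 or 1, as at the end of a reduction. */
static void reduce_once_sketch(BN_ULONG *r, BN_ULONG carry, const BN_ULONG *m,
                               BN_ULONG *tmp /* scratch, num words */,
                               size_t num) {
  /* tmp = r - m, tracking the borrow out of the top word. */
  BN_ULONG borrow = 0;
  for (size_t i = 0; i < num; i++) {
    BN_ULONG d = r[i] - m[i];
    BN_ULONG b1 = d > r[i];  /* borrow from r[i] - m[i] */
    BN_ULONG d2 = d - borrow;
    BN_ULONG b2 = d2 > d;    /* borrow from subtracting the previous borrow */
    tmp[i] = d2;
    borrow = b1 | b2;
  }
  /* The subtraction is non-negative exactly when the incoming carry absorbs
   * the borrow (carry >= borrow, both 0 or 1). */
  BN_ULONG r_geq_m = 1 - (borrow & (1 - carry));
  BN_ULONG mask = 0 - r_geq_m;  /* all-ones if r >= m, else all-zeros */
  for (size_t i = 0; i < num; i++) {
    r[i] = (tmp[i] & mask) | (r[i] & ~mask);
  }
}
```

An "almost" reduction instead derives the mask from the carry bit alone (subtracting only when the value overflowed R), which is cheaper but leaves results bounded only by R rather than m. That slack is exactly what allows AMM(h, 1) to come out equal to m in the bug described above.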
227ff6e642 | Remove unions in EC_SCALAR and EC_FELEM. | 3 years ago

When introducing EC_SCALAR and EC_FELEM, I used unions as a convenience for converting to and from the byte representation. However, type-punning with unions is not allowed in C++ and hard to use correctly in C. As I understand the rules, they are:

- The abstract machine knows which member of a union was last written to.
- In C, reading from an inactive member is defined to type-pun. In C++, it is UB, though some compilers promise the C behavior anyway.
- However, if you read or write through a *pointer* to a union member, the strict aliasing rule applies. (A function passed two pointers of different types otherwise needs to pessimally assume they came from the same union.)

That last rule means the type-punning allowance doesn't apply if you take a pointer to an inactive member, and it's common to abstract otherwise direct accesses of members via pointers. https://github.com/openssl/openssl/issues/18225 is an example where similar union tricks have caused problems for OpenSSL. While we don't have that code, EC_SCALAR and EC_FELEM play similar tricks.

We do get a second lifeline because our alternate view is a uint8_t, which we require to be unsigned char. Strict aliasing always allows the pointer type to be a character type, so pointer-indirected accesses of EC_SCALAR.bytes aren't necessarily UB. But if we ever write to EC_SCALAR.bytes directly (and we do), we'll switch the active arm, and then pointers to EC_SCALAR.words become strict aliasing violations!

This is all far too complicated to deal with. Ideally everyone would build with -fno-strict-aliasing because no real C code actually follows these rules. But we don't always control our downstream consumers' CFLAGS, so let's just avoid the union. This also avoids a pitfall if we ever move libcrypto to C++.

For p224-64.c, I just converted the representations directly, which avoids worrying about the top 32 bits in p224_felem_to_generic. Most of the rest was words vs. bytes conversions and boils down to a cast (we're still dealing with a character type, at the end of the day). But I took the opportunity to extract some more "words"-based helper functions out of BIGNUM, so the casts would only be in one place. That too saves us from the top-bits problem in the bytes-to-words direction.

Bug: 301
Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
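To make the pitfall and the union-free alternative concrete, a hedged, self-contained sketch (the type names, 256-bit size, and little-endian limb order are assumptions; the real EC_SCALAR layout and helpers differ):

```c
#include <stdint.h>

/* Simplified stand-in for the old union layout.  Writing .bytes makes it the
 * active member; after that, handing out a uint64_t* into .words and reading
 * through it is, strictly speaking, an aliasing violation.  The character-type
 * escape hatch only covers the byte-typed pointer, not the word-typed one. */
typedef union {
  uint64_t words[4];
  uint8_t bytes[32];
} scalar_union;

/* Union-free alternative in the spirit of the commit: keep only the word
 * array and convert explicitly at the byte boundary. */
typedef struct {
  uint64_t words[4];
} scalar;

static void scalar_from_bytes(scalar *out, const uint8_t in[32]) {
  for (int i = 0; i < 4; i++) {
    uint64_t w = 0;
    for (int j = 0; j < 8; j++) {
      w |= (uint64_t)in[8 * i + j] << (8 * j);  /* little-endian limbs */
    }
    out->words[i] = w;
  }
}

static void scalar_to_bytes(uint8_t out[32], const scalar *in) {
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 8; j++) {
      out[8 * i + j] = (uint8_t)(in->words[i] >> (8 * j));
    }
  }
}
```

The struct version gives up the "free" reinterpretation, but the explicit converters keep all byte/word conversions in one place, which is the shape of the change the commit describes.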
c65543b7a9 | Make RSA_check_key more than 2x as fast. | 4 years ago

The bulk of RSA_check_key is spent in bn_div_consttime, which is a naive but constant-time long-division algorithm for the few places that divide by a secret, even divisor: RSA keygen and RSA import. RSA import is somewhat performance-sensitive, so pick some low-hanging fruit.

The main observation is that, in all but one call site, the bit width of the divisor is public. That means, for an N-bit divisor, we can skip the first N-1 iterations of long division, because an (N-1)-bit remainder cannot exceed the N-bit divisor.

One minor nuisance is that bn_lcm_consttime, used in RSA keygen, has a case that does *not* have a public bit width. Applying the optimization there would leak information. I've implemented this as an optional public lower bound on num_bits(divisor), which all but that call fills in.

Before:
Did 5060 RSA 2048 private key parse operations in 1058526us (4780.2 ops/sec)
Did 1551 RSA 4096 private key parse operations in 1082343us (1433.0 ops/sec)

After:
Did 11532 RSA 2048 private key parse operations in 1084145us (10637.0 ops/sec) [+122.5%]
Did 3542 RSA 4096 private key parse operations in 1036374us (3417.7 ops/sec) [+138.5%]

Bug: b/192484677
Change-Id: I893ebb8886aeb8200a1a365673b56c49774221a2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/49106
Reviewed-by: Adam Langley <agl@google.com>
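The skip-the-early-iterations idea can be sketched on plain machine words (the function name, the 64-bit toy operands, and the restoring-division structure are assumptions for illustration; the real bn_div_consttime works word-by-word on BIGNUMs and keeps every step constant-time):

```c
#include <stdint.h>

/* Shift-and-subtract long division with a public lower bound on the
 * divisor's bit width.  Assumes divisor != 0 and
 * 1 <= divisor_min_bits <= num_bits(divisor) <= 63, so no intermediate
 * value overflows 64 bits in this toy version. */
static void div_sketch(uint64_t *quotient, uint64_t *remainder,
                       uint64_t numerator, uint64_t divisor,
                       unsigned divisor_min_bits /* public lower bound */) {
  unsigned skip = divisor_min_bits - 1;  /* iterations that can never subtract */
  uint64_t q = 0;
  /* Seed the remainder with the top `skip` numerator bits in one step: a
   * remainder narrower than divisor_min_bits bits cannot reach the divisor,
   * so those skipped iterations would never subtract and their quotient
   * bits are zero. */
  uint64_t r = skip != 0 ? numerator >> (64 - skip) : 0;
  for (int i = 63 - (int)skip; i >= 0; i--) {
    r = (r << 1) | ((numerator >> i) & 1);
    /* The real code does this comparison and conditional subtraction in
     * constant time; a plain comparison keeps the sketch short. */
    uint64_t ge = (uint64_t)(r >= divisor);
    q |= ge << i;
    r -= divisor & (0 - ge);
  }
  *quotient = q;
  *remainder = r;
}
```

When the divisor's public width is close to the dividend's, this removes roughly half of the long-division iterations, which is the flavor of saving behind the benchmark numbers above.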
139adff9b2 | Fix mismatch between header and implementation of bn_sqr_comba8. | 4 years ago

Bug: 402
Change-Id: I6de879f44f6e3eca26f2f49c500769d944fa9bc0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46404
Reviewed-by: Adam Langley <agl@google.com>
5cf02188fe | Add FFDH FIPS self-test. | 4 years ago

This involves a |2048|^|225| modexp (a 2048-bit value raised to a 225-bit exponent), which is far from ideal, but is now required in FIPS mode.

Change-Id: Id7384b4ba92aa74e971231bc44fa0f10434d18e2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/45085
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
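A known-answer test of this shape can be sketched with the public BIGNUM API (the helper name and the placeholder test-vector parameters are assumptions; the real FIPS self-test lives inside the module and uses its own fixed vectors):

```c
#include <openssl/bn.h>
#include <stddef.h>
#include <stdint.h>

/* Raise a fixed base to a fixed exponent modulo a fixed prime and compare
 * against a precomputed answer.  Returns 1 on match, 0 otherwise. */
static int ffdh_kat_sketch(const uint8_t *kP, size_t p_len, /* prime modulus */
                           const uint8_t *kG, size_t g_len, /* peer public value */
                           const uint8_t *kX, size_t x_len, /* private exponent */
                           const uint8_t *kExpected, size_t e_len) {
  int ok = 0;
  BN_CTX *ctx = BN_CTX_new();
  BIGNUM *p = BN_bin2bn(kP, p_len, NULL);
  BIGNUM *g = BN_bin2bn(kG, g_len, NULL);
  BIGNUM *x = BN_bin2bn(kX, x_len, NULL);
  BIGNUM *expected = BN_bin2bn(kExpected, e_len, NULL);
  BIGNUM *result = BN_new();
  if (ctx == NULL || p == NULL || g == NULL || x == NULL || expected == NULL ||
      result == NULL) {
    goto out;
  }
  /* With a 2048-bit p and a 225-bit x, this is the modexp the commit
   * message refers to. */
  if (!BN_mod_exp(result, g, x, p, ctx)) {
    goto out;
  }
  ok = BN_cmp(result, expected) == 0;
out:
  BN_free(p);
  BN_free(g);
  BN_free(x);
  BN_free(expected);
  BN_free(result);
  BN_CTX_free(ctx);
  return ok;
}
```

Comparing against a precomputed expected value is what makes it a known-answer test: if the modexp machinery regresses, the check fails before the module is used.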
cd204d8e15 | Include bn.h from bn/internal.h | 4 years ago

If using precompiled headers, this is needed; otherwise bn/internal.h doesn't have a definition for BN_ULONG etc.

Change-Id: I41b331465abae7108f255722a156d2ffb3016ba3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/44604
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
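The fix is the usual include-what-you-use pattern for headers. A hedged sketch (the guard name and the helper declaration are illustrative, not BoringSSL's actual internal header):

```c
/* example_internal.h -- an internal header that uses BN_ULONG. */
#ifndef EXAMPLE_BN_INTERNAL_H
#define EXAMPLE_BN_INTERNAL_H

#include <stddef.h>
#include <openssl/bn.h>  /* defines BN_ULONG, BIGNUM, ... */

/* Because the public header is included here, this declaration compiles in
 * any translation unit (or precompiled header) regardless of what was
 * included before it. */
void example_bn_mul_words(BN_ULONG *rp, const BN_ULONG *ap, size_t num,
                          BN_ULONG w);

#endif  /* EXAMPLE_BN_INTERNAL_H */
```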
fb0c05cac2 | acvp: add CMAC-AES support. | 5 years ago

Change by Dan Janni.

Change-Id: I3f059e7b1a822c6f97128ca92a693499a3f7fa8f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/41984
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>