boringssl

Commit Graph

Author	SHA1	Message	Date
David Benjamin	b7d6320be9	Replace OPENSSL_STATIC_ASSERT with static_assert. The C11 change has survived for three months now. Let's start freely using static_assert. In C files, we need to include <assert.h> because it is a macro. In C++ files, it is a keyword and we can just use it. (In MSVC C, it is actually also a keyword as in C++, but close enough.) I moved one assert from ssl3.h to ssl_lib.cc. We haven't yet required C11 in our public headers, just our internal files. Change-Id: Ic59978be43b699f2c997858179a9691606784ea5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53665 Auto-Submit: David Benjamin <davidben@google.com> Commit-Queue: Bob Beck <bbe@google.com> Reviewed-by: Bob Beck <bbe@google.com>	3 years ago
David Benjamin	0ebd69bd1e	Add BN_GENCB_get_arg. bind uses this function. Change-Id: I97ba86d9f75597bff125ae0b56952effc397e6b8 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53010 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Bob Beck <bbe@google.com> Commit-Queue: Bob Beck <bbe@google.com>	3 years ago
David Benjamin	efd09b7e37	Const-correct bn_gather5. Not that we get much type-checking from this, as an assembly function. Change-Id: I21643444cfc577e2d68f11891e602724ded52e7f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52831 Reviewed-by: Adam Langley <agl@google.com>	3 years ago
David Benjamin	77dc23983f	Make it more obvious that am and tmp's widths are accurate. https://boringssl-review.googlesource.com/c/boringssl/+/52825 lost a tmp.width = top line. Without it, tmp.width was set by bn_one_to_montgomery. Since we always size modular arithmetic by the modulus, tmp.width (and am.width) will actually always be top, and there's actually no need to zero pad it. We don't capture this in the type system or BIGNUM width convention, so better to set the width explicitly. The original code did it at the end, but I think doing it right when we zero pad it is better, as that's when the size gets set. But we can go a step further. The manual zero padding code came from OpenSSL, which still had the bn_correct_top invariant. Our BIGNUMs are resizable, so just call bn_resize_words, immediately after the computation. (bn_resize_words will not reallocate the data because the BIGNUMs have the STATIC_DATA flag set. bn_wexpand will internally allow expanding up to dmax, or top.) Change-Id: I2403afa7381b8a407615c6730fba9edaa41125c6 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52906 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	3 years ago
David Benjamin	b8a651439b	Align rsaz and mont5 table construction. Both implementations need to compute the first 32 powers of a. There's a commented out naive version in rsaz_exp.c that claims to be smaller, but 1% slower. (It doesn't use squares when it otherwise could.) Instead, we can write out the square-based strategy as a loop. (I wasn't able to measure a difference between any of the three versions, but this one's compact enough and does let us square more and gather5 less.) Change-Id: I7015f2a78584cd97f29b54d0007479bdcc3a01ba Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52828 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	3 years ago
David Benjamin	c7de4fe0bd	Simplify mont5 table computation. The unrolled loops appear to have negligible perf impact: Before: Did 18480 RSA 2048 signing operations in 10005085us (1847.1 ops/sec) Did 2720 RSA 4096 signing operations in 10056337us (270.5 ops/sec) After: Did 18480 RSA 2048 signing operations in 10012218us (1845.7 ops/sec) [-0.1%] Did 2700 RSA 4096 signing operations in 10003972us (269.9 ops/sec) [-0.2%] Change-Id: I29073c373a03a9798f6e04016626e6ab910e893a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52826 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	3 years ago
David Benjamin	801a801024	Add an extra reduction step to the end of RSAZ. RSAZ has a very similar bug to mont5 from https://boringssl-review.googlesource.com/c/boringssl/+/52825 and may return the modulus when it should return zero. As in that CL, there is no security impact on our cryptographic primitives. RSAZ is described in the paper "Software Implementation of Modular Exponentiation, Using Advanced Vector Instructions Architectures". The bug comes from RSAZ's use of "NRMM" or "Non Reduced Montgomery Multiplication". This is like normal Montgomery multiplication, but skips the final subtraction altogether (whereas mont5's AMM still subtracts, but replaces MM's tigher bound with just the carry bit). This would normally not be stable, but RSAZ picks a larger R > 4M, and maintains looser bounds for modular arithmetic, a < 2M. Lemma 1 from the paper proves that NRMM(a, b) preserves this 2M bound. It also claims NRMM(a, 1) < M. That is, conversion out of Montgomery form with NRMM is fully reduced. This second claim is wrong. The proof shows that NRMM(a, 1) < 1/2 + M, which only implies NRMM(a, 1) <= M, not NRMM(a, 1) < M. RSAZ relies on this to produce a reduced output (see Figure 7 in the paper). Thus, like mont5 with AMM, RSAZ may return the modulus when it should return zero. Fix this by adding a bn_reduce_once_in_place call at the end of the operation. Change-Id: If28bc49ae8dfbfb43bea02af5ea10c4209a1c6e6 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52827 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	3 years ago
David Benjamin	13c9d5c69d	Always end BN_mod_exp_mont_consttime with normal Montgomery reduction. This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime would sometimes return m, the modulus, when it should have returned zero. Thanks to Guido Vranken for reporting it. It is only a partial fix because the same bug also exists in the "rsaz" codepath. That will be fixed in the subsequent CL. (See the commented out test.) The bug only affects zero outputs (with non-zero inputs), so we believe it has no security impact on our cryptographic functions. BoringSSL calls BN_mod_exp_mont_consttime in the following cases: - RSA private key operations - Primality testing, raising the witness to the odd part of p-1 - DSA keygen and key import, pub = g^priv (mod p) - DSA signing, r = g^k (mod p) - DH keygen, pub = g^priv (mod p) - Diffie-Hellman, secret = peer^priv (mod p) It is not possible in the RSA private key operation, provided p and q are primes. If using CRT, we are working modulo a prime, so zero output with non-zero input is impossible. If not using CRT, we work mod n. While there are nilpotent values mod n, none of them hit zero by exponentiating. (Both p and q would need to divide the input, which means n divides the input.) In primality testing, this can only be hit when the input was composite. But as the rest of the loop cannot then hit 1, we'll correctly report it as composite anyway. DSA and DH work modulo a prime, where this case cannot happen. Analysis: This bug is the result of sloppiness with the looser bounds from "almost Montgomery multiplication", described in https://eprint.iacr.org/2011/239. Prior to upstream's ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl implemented standard Montgomery reduction (the left half of figure 3 in the paper). Though it did not document this, ec9cc70f7245 changed it to implement the "almost" variant (the right half of the figure.) The difference is that, rather than subtracting if T >= m, it subtracts if T >= R. In code, it is the difference between something like our bn_reduce_once, vs. subtracting based only on T's carry bit. (Interestingly, the .Lmul_enter branch of bn_mul_mont_gather5 seems to still implement normal reduction, but the .Lmul4x_enter branch is an almost reduction.) That means none of the intermediate values here are bounded by m. They are only bounded by R. Accordingly, Figure 2 in the paper ends with step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this step. The bn_from_montgomery call only implements step 9, AMM(h, 1). (x86_64-mont5.pl's bn_from_montgomery only implements an almost reduction.) The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the paper discusses this, but is ambiguous about the scope of its 2^(n-1) < m < 2^n precondition. The m+1 bound appears to be unconditional: Montgomery reduction ultimately adds some 0 <= Y < mR to T, to get a multiple of R, and then divides by R. The output, pre-subtraction, is thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m. A single subtraction of m if T' >= m gives T'' < m. AMM works because T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper, though their formulation is more complicated to capture the word-by-word algorithm. It's ultimately the same adjustment to T. But in AMM(h, 1), T = h1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced is if it outputs m. Thus, our limited impact. Indeed, Remark 1 mentions step 10 isn't necessary because m is a prime and the inputs are non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime may be called elsewhere. Fix: To fix this, we could add the missing step 10, but a full division would not be constant-time. The analysis above says it could be a single subtraction, bn_reduce_once, but then we could integrate it into the subtraction already in plain Montgomery reduction, implemented by uppercase BN_from_montgomery. h1 = h < R <= mR, so we are within bounds. Thus, we delete lowercase bn_from_montgomery altogether, and have the mont5 path use the same BN_from_montgomery ending as the non-mont5 path. This only impacts the final step of the whole exponentiation and has no measurable perf impact. In doing so, add comments describing these looser bounds. This includes one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont (MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because MM is AMM-compatible; when passed AMM's looser inputs, it will still produce a correct looser output. Ideally we'd drop the "almost" reduction and stick to the more straightforward bounds. As this only impacts the final subtraction in each reduction, I would be surprised if it actually had a real performance impact. But this would involve deeper change to x86_64-mont5.pl, so I haven't tried this yet. I believe this is basically the same bug as https://github.com/golang/go/issues/13907 from Go. Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	3 years ago
Adam Langley	118a892d2d	Add a service indicator for FIPS 140-3. This is cribbed, with perimssion, from AWS-LC. The FIPS service indicator[1] signals when an approved service has been completed. [1] FIPS 140-3 IG 2.4.C Change-Id: Ib40210d69b3823f4d2a500b23a1606f8d6942f81 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52568 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com>	3 years ago
David Benjamin	227ff6e642	Remove unions in EC_SCALAR and EC_FELEM. When introducing EC_SCALAR and EC_FELEM, I used unions as convenience for converting to and from the byte representation. However, type-punning with unions is not allowed in C++ and hard to use correctly in C. As I understand the rules, they are: - The abstract machine knows what member of union was last written to. - In C, reading from an inactive member is defined to type-pun. In C++, it is UB though some compilers promise the C behavior anyway. - However, if you read or write from a pointer to a union member, the strict aliasing rule applies. (A function passed two pointers of different types otherwise needs to pessimally assume they came from the same union.) That last rule means the type-punning allowance doesn't apply if you take a pointer to an inactive member, and it's common to abstract otherwise direct accesses of members via pointers. https://github.com/openssl/openssl/issues/18225 is an example where similar union tricks have caused problems for OpenSSL. While we don't have that code, EC_SCALAR and EC_FELEM play similar tricks. We do get a second lifeline because our alternate view is a uint8_t, which we require to be unsigned char. Strict aliasing always allows the pointer type to be a character type, so pointer-indirected accesses of EC_SCALAR.bytes aren't necessarily UB. But if we ever write to EC_SCALAR.bytes directly (and we do), we'll switch the active arm and then pointers to EC_SCALAR.words become strict aliasing violations! This is all far too complicated to deal with. Ideally everyone would build with -fno-strict-aliasing because no real C code actually follows these rules. But we don't always control our downstream consumers' CFLAGS, so let's just avoid the union. This also avoids a pitfall if we ever move libcrypto to C++. For p224-64.c, I just converted the representations directly, which avoids worrying about the top 32 bits in p224_felem_to_generic. Most of the rest was words vs. bytes conversions and boils down to a cast (we're still dealing with a character type, at the end of the day). But I took the opportunity to extract some more "words"-based helper functions out of BIGNUM, so the casts would only be in one place. That too saves us from the top bits problem in the bytes-to-words direction. Bug: 301 Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505 Commit-Queue: Bob Beck <bbe@google.com> Reviewed-by: Bob Beck <bbe@google.com>	3 years ago
Adam Langley	c7a3c46574	Don't loop forever in BN_mod_sqrt on invalid inputs. BN_mod_sqrt implements the Tonelli–Shanks algorithm, which requires a prime modulus. It was written such that, given a composite modulus, it would sometimes loop forever. This change fixes the algorithm to always terminate. However, callers must still pass a prime modulus for the function to have a defined output. In OpenSSL, this loop resulted in a DoS vulnerability, CVE-2022-0778. BoringSSL is mostly unaffected by this. In particular, this case is not reachable in BoringSSL from certificate and other ASN.1 elliptic curve parsing code. Any impact in BoringSSL is limited to: - Callers of EC_GROUP_new_curve_GFp that take untrusted curve parameters - Callers of BN_mod_sqrt that take untrusted moduli This CL updates documentation of those functions to clarify that callers should not pass attacker-controlled values. Even with the infinite loop fixed, doing so breaks preconditions and will give undefined output. Change-Id: I64dc1220aaaaafedba02d2ac0e4232a3a0648160 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51925 Reviewed-by: Adam Langley <agl@google.com> Reviewed-by: Martin Kreichgauer <martinkr@google.com> Commit-Queue: Adam Langley <agl@google.com>	3 years ago
David Benjamin	4d955d20d2	Check static CPU capabilities on x86. On Arm, our CRYPTO_is_*_capable functions check the corresponding preprocessor symbol. This allows us to automatically drop dynamic checks and fallback code when some capability is always avilable. This CL does the same on x86, as well as consolidates our OPENSSL_ia32cap_P checks in one place. Since this abstraction is incompatible with some optimizations we do around OPENSSL_ia32cap_get() in the FIPS module, I've marked the symbol __attribute__((const)), which is enough to make GCC and Clang do the optimizations for us. (We already do the same to DEFINE_BSS_GET.) Most x86 platforms support a much wider range of capabilities, so this is usually a no-op. But, notably, all x86_64 Mac hardware has SSSE3 available, so this allows us to statically drop an AES implementation. (On macOS with -Wl,-dead_strip, this seems to trim 35080 bytes from the bssl binary.) Configs like -march=native can also drop a bunch of code. Update-Note: This CL may break build environments that incorrectly mark some instruction as statically available. This is unlikely to happen with vector instructions like AVX, where the compiler could freely emit them anyway. However, instructions like AES-NI might be set incorrectly. Change-Id: I44fd715c9887d3fda7cb4519c03bee4d4f2c7ea6 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51548 Reviewed-by: Adam Langley <agl@google.com>	3 years ago
David Benjamin	31ece98da1	Align rsaz_avx2_preferred with x86_64-mont5.pl. x86_64-mont5.pl checks for both BMI1 and BMI2, because the MULX path also uses the ANDN instruction. Some history here from upstream: a5bb5bca52f57021a4017521c55a6b3590bbba7a, dated 2013-10-03, added the MULX path to x86_64-mont5.pl. At the time, the cpuid check was BMI2+ADX. (MULX comes from BMI2.) 37de2b5c1e370b493932552556940eb89922b027, dated 2013-10-09, made BN_mod_exp_mont_consttime prefer the MULX mont5 code over the AVX2 rsaz code, with a matching BMI2+ADX cpuid check. 8fc8f486f7fa098c9fbb6a6ae399e3c6856e0d87, dated 2016-01-25, tweaked some code to use the ANDN instruction, from BMI1. Correspondingly, it changed the cpuid check to be BMI1+BMI2+ADX. The BN_mod_exp_mont_consttime check was left unchanged. This CL fixes our version of the BN_mod_exp_mont_consttime check to match the assembly, by also checking BMI1. (This should be a no-op. Presumably any processor with BMI2 also has BMI1.) Change-Id: Ib0cacc7e2be840d970460eef4dd9ded7fb24231c Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51547 Reviewed-by: Adam Langley <agl@google.com>	3 years ago
David Benjamin	661266ea06	Move CPU detection symbols to crypto/internal.h. These symbols were not marked OPENSSL_EXPORT, so they weren't really usable externally anyway. They're also very sensitive to various build configuration toggles, which don't always get reflected into projects that include our headers. Move them to crypto/internal.h. Change-Id: I79a1fcf0b24e398d75a9cc6473bae28ec85cb835 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50846 Reviewed-by: Adam Langley <agl@google.com>	3 years ago
David Benjamin	9bcc12d540	Import a few test vectors from OpenSSL. Test vectors from `e9e726506c`. We did not have assembly file in question, but import the test vectors anyway. Change-Id: Ia18698979bc0055bae9105280296891eb7faf9b5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50785 Reviewed-by: Adam Langley <agl@google.com>	3 years ago
David Benjamin	cd0b767492	Add BN_GENCB_new, BN_GENCB_free, and RSA_test_flags. OpenSSL 1.1.0 made this structure opaque. I don't think we particularly need to make it opaque, but external code uses it. Also add RSA_test_flags. Change-Id: I136d38e72ec4664c78f4d1720ec691f5760090c1 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50605 Reviewed-by: Adam Langley <agl@google.com>	3 years ago
David Benjamin	0524538522	Fix BN_CTX usage in BN_mod_sqrt malloc error paths. Bug: 442 Change-Id: I925eb8d4c4e60dd58d8aaf6010df9783e6ba0837 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/49825 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	3 years ago
David Benjamin	c65543b7a9	Make RSA_check_key more than 2x as fast. The bulk of RSA_check_key is spent in bn_div_consttime, which is a naive but constant-time long-division algorithm for the few places that divide by a secret even divisor: RSA keygen and RSA import. RSA import is somewhat performance-sensitive, so pick some low-hanging fruit: The main observation is that, in all but one call site, the bit width of the divisor is public. That means, for an N-bit divisor, we can skip the first N-1 iterations of long division because an N-1-bit remainder cannot exceed the N-bit divisor. One minor nuisance is bn_lcm_consttime, used in RSA keygen has a case that does not have a public bit width. Apply the optimization there would leak information. I've implemented this as an optional public lower bound on num_bits(divisor), which all but that call fills in. Before: Did 5060 RSA 2048 private key parse operations in 1058526us (4780.2 ops/sec) Did 1551 RSA 4096 private key parse operations in 1082343us (1433.0 ops/sec) After: Did 11532 RSA 2048 private key parse operations in 1084145us (10637.0 ops/sec) [+122.5%] Did 3542 RSA 4096 private key parse operations in 1036374us (3417.7 ops/sec) [+138.5%] Bug: b/192484677 Change-Id: I893ebb8886aeb8200a1a365673b56c49774221a2 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/49106 Reviewed-by: Adam Langley <agl@google.com>	4 years ago
David Benjamin	549e4e7995	Align with upstream on 'close STDOUT' lines. When upstreaming `c1d8c5b0e0` as https://github.com/openssl/openssl/pull/10883 and then https://github.com/openssl/openssl/pull/10930, we ended up diverging slightly: in the upstream version, I ended up applying the same change to the xlate files. Upstream also suggested "error closing STDOUT: $!". Apply the same changes here. Change-Id: I8a8cbc3944432e94a8844f9f628a900edfe77b30 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48725 Reviewed-by: Adam Langley <agl@google.com>	4 years ago
David Benjamin	61a21e7ec5	Fix sign bit in BN_div if numerator and quotient alias. See also f8fc0e35e0b1813af15887d42e17b7d5537bb86c from upstream, though our BN_divs have diverged slightly. Change-Id: I49fa4f0a5c730d34e6f41f724f1afe3685470712 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48426 Reviewed-by: Adam Langley <agl@google.com>	4 years ago
David Benjamin	878795cac3	Remove outdated comment in primality testing. This comment dates to SSLeay. It appears to be describing the incremental trial division strategy where they would pick a starting candidate, compute moduli by small primes, and then update by incrementing the candidate and saved moduli instead of dividing from scratch. We use a simpler rejection sampling strategy. Change-Id: If2203d616f2b1f632bcd7033ceb60a83d1b75674 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48047 Reviewed-by: Adam Langley <agl@google.com>	4 years ago
Florin Crișan	7a3e801217	fix #415 : Perl scripts fail when building from a path with spaces Because file names are not enclosed in quotation marks in the open call. https://bugs.chromium.org/p/boringssl/issues/detail?id=415 ``` cmake --build "C:\Projects\ Extern\Visual C++ 2015\x64 Debug\Build\BoringSSL\." [9/439] Generating rdrand-x86_64.asm FAILED: crypto/fipsmodule/rdrand-x86_64.asm cmd.exe /C "cd /D "C:\Projects\ Extern\Visual C++ 2015\x64 Debug\Build\BoringSSL\crypto\fipsmodule" && "C:\Program Files\CMake\bin\cmake.exe" -E make_directory . && C:\Perl64\bin\perl.exe "C:/Projects/ Extern/Source/BoringSSL/crypto/fipsmodule/rand/asm/rdrand-x86_64.pl" nasm rdrand-x86_64.asm" Can't open perl script "C:/Projects/": No such file or directory error closing STDOUT at C:/Projects/ Extern/Source/BoringSSL/crypto/fipsmodule/rand/asm/rdrand-x86_64.pl line 87. ninja: build stopped: subcommand failed. ``` Bug: 415 Change-Id: I83c4a460689b9adeb439425ad390322ae8b2002a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/47884 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com>	4 years ago
David Benjamin	139adff9b2	Fix mismatch between header and implementation of bn_sqr_comba8. Bug: 402 Change-Id: I6de879f44f6e3eca26f2f49c500769d944fa9bc0 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46404 Reviewed-by: Adam Langley <agl@google.com>	4 years ago
Adam Langley	5cf02188fe	Add FFDH FIPS self-test. This invovles a \|2048\|^\|225\| modexp, which is far from ideal, but is now required in FIPS mode. Change-Id: Id7384b4ba92aa74e971231bc44fa0f10434d18e2 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/45085 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	4 years ago
Adam Langley	2d691ca60d	Make BN_clear_free a wrapper around BN_free. We clear all heap memory on free now, thus the difference between these functions is quite small. There are some differences though: Firstly, BN_clear_free will attempt to zero out static limb data. But static data is probably read-only and thus trying to zero it will crash. Secondly it will try to zero out the BIGNUM structure itself. But either it's on the heap, and will be zeroed anyway, or else it's on the stack, and we don't try and clear the stack in general because the compiler is duplicating bits of it at will anyway. Change-Id: I8a07385a102cfd308b555432942225c25eb7c12d Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/45084 Reviewed-by: David Benjamin <davidben@google.com>	4 years ago
Anthony Roberts	afd5dba756	Add ASM optimizations for Windows on Arm Windows on Arm (WoA) builds are currently using the C implementations of the various functions within BoringSSL. This patch enables feature detection for the Neon and hardware crypto optimizations, and updates the perl script to generate AArch64 .S files for WoA. Note these files use GNU assembler syntax (specifically tested with Clang assembler), not armasm. Change-Id: Id8841f4db0498ec16215095a4e6bd60d427cd54b Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/43304 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	4 years ago
Adam Langley	cd204d8e15	Include bn.h from bn/internal.h If using precompiled headers then this is needed otherwise bn/internal.h doesn't have a definition for BN_ULONG etc. Change-Id: I41b331465abae7108f255722a156d2ffb3016ba3 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/44604 Commit-Queue: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	4 years ago
Tamas Petz	a0b49d63fd	aarch64: support BTI and pointer authentication in assembly This change adds optional support for - Armv8.3-A Pointer Authentication (PAuth) and - Armv8.5-A Branch Target Identification (BTI) features to the perl scripts. Both features can be enabled with additional compiler flags. Unless any of these are enabled explicitly there is no code change at all. The extensions are briefly described below. Please read the appropriate chapters of the Arm Architecture Reference Manual for the complete specification. Scope ----- This change only affects generated assembly code. Armv8.3-A Pointer Authentication -------------------------------- Pointer Authentication extension supports the authentication of the contents of registers before they are used for indirect branching or load. PAuth provides a probabilistic method to detect corruption of register values. PAuth signing instructions generate a Pointer Authentication Code (PAC) based on the value of a register, a seed and a key. The generated PAC is inserted into the original value in the register. A PAuth authentication instruction recomputes the PAC, and if it matches the PAC in the register, restores its original value. In case of a mismatch, an architecturally unmapped address is generated instead. With PAuth, mitigation against ROP (Return-oriented Programming) attacks can be implemented. This is achieved by signing the contents of the link-register (LR) before it is pushed to stack. Once LR is popped, it is authenticated. This way a stack corruption which overwrites the LR on the stack is detectable. The PAuth extension adds several new instructions, some of which are not recognized by older hardware. To support a single codebase for both pre Armv8.3-A targets and newer ones, only NOP-space instructions are added by this patch. These instructions are treated as NOPs on hardware which does not support Armv8.3-A. Furthermore, this patch only considers cases where LR is saved to the stack and then restored before branching to its content. There are cases in the code where LR is pushed to stack but it is not used later. We do not address these cases as they are not affected by PAuth. There are two keys available to sign an instruction address: A and B. PACIASP and PACIBSP only differ in the used keys: A and B, respectively. The keys are typically managed by the operating system. To enable generating code for PAuth compile with -mbranch-protection=<mode>: - standard or pac-ret: add PACIASP and AUTIASP, also enables BTI (read below) - pac-ret+b-key: add PACIBSP and AUTIBSP Armv8.5-A Branch Target Identification -------------------------------------- Branch Target Identification features some new instructions which protect the execution of instructions on guarded pages which are not intended branch targets. If Armv8.5-A is supported by the hardware, execution of an instruction changes the value of PSTATE.BTYPE field. If an indirect branch lands on a guarded page the target instruction must be one of the BTI <jc> flavors, or in case of a direct call or jump it can be any other instruction. If the target instruction is not compatible with the value of PSTATE.BTYPE a Branch Target Exception is generated. In short, indirect jumps are compatible with BTI <j> and <jc> while indirect calls are compatible with BTI <c> and <jc>. Please refer to the specification for the details. Armv8.3-A PACIASP and PACIBSP are implicit branch target identification instructions which are equivalent with BTI c or BTI jc depending on system register configuration. BTI is used to mitigate JOP (Jump-oriented Programming) attacks by limiting the set of instructions which can be jumped to. BTI requires active linker support to mark the pages with BTI-enabled code as guarded. For ELF64 files BTI compatibility is recorded in the .note.gnu.property section. For a shared object or static binary it is required that all linked units support BTI. This means that even a single assembly file without the required note section turns-off BTI for the whole binary or shared object. The new BTI instructions are treated as NOPs on hardware which does not support Armv8.5-A or on pages which are not guarded. To insert this new and optional instruction compile with -mbranch-protection=standard (also enables PAuth) or +bti. When targeting a guarded page from a non-guarded page, weaker compatibility restrictions apply to maintain compatibility between legacy and new code. For detailed rules please refer to the Arm ARM. Compiler support ---------------- Compiler support requires understanding '-mbranch-protection=<mode>' and emitting the appropriate feature macros (__ARM_FEATURE_BTI_DEFAULT and __ARM_FEATURE_PAC_DEFAULT). The current state is the following: ------------------------------------------------------- \| Compiler \| -mbranch-protection \| Feature macros \| +----------+---------------------+--------------------+ \| clang \| 9.0.0 \| 11.0.0 \| +----------+---------------------+--------------------+ \| gcc \| 9 \| expected in 10.1+ \| ------------------------------------------------------- Available Platforms ------------------ Arm Fast Model and QEMU support both extensions. https://developer.arm.com/tools-and-software/simulation-models/fast-models https://www.qemu.org/ Implementation Notes -------------------- This change adds BTI landing pads even to assembly functions which are likely to be directly called only. In these cases, landing pads might be superfluous depending on what code the linker generates. Code size and performance impact for these cases would be negligble. Interaction with C code ----------------------- Pointer Authentication is a per-frame protection while Branch Target Identification can be turned on and off only for all code pages of a whole shared object or static binary. Because of these properties if C/C++ code is compiled without any of the above features but assembly files support any of them unconditionally there is no incompatibility between the two. Useful Links ------------ To fully understand the details of both PAuth and BTI it is advised to read the related chapters of the Arm Architecture Reference Manual (Arm ARM): https://developer.arm.com/documentation/ddi0487/latest/ Additional materials: "Providing protection for complex software" https://developer.arm.com/architectures/learn-the-architecture/providing-protection-for-complex-software Arm Compiler Reference Guide Version 6.14: -mbranch-protection https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mbranch-protection?lang=en Arm C Language Extensions (ACLE) https://developer.arm.com/docs/101028/latest Change-Id: I4335f92e2ccc8e209c7d68a0a79f1acdf3aeb791 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42084 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	5 years ago
Adam Langley	fb0c05cac2	acvp: add CMAC-AES support. Change by Dan Janni. Change-Id: I3f059e7b1a822c6f97128ca92a693499a3f7fa8f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/41984 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	5 years ago

29 Commits (1106836aa99c08d0b709219889d364a4c855d3c9)