The C11 change has survived for three months now. Let's start freely
using static_assert. In C files, we need to include <assert.h> because
it is a macro. In C++ files, it is a keyword and we can just use it. (In
MSVC C, it is actually also a keyword as in C++, but close enough.)
I moved one assert from ssl3.h to ssl_lib.cc. We haven't yet required
C11 in our public headers, just our internal files.
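
A minimal sketch of the usage described above, assuming a C11 compiler.
The struct and function names are hypothetical and not taken from the
tree:

```c
#include <assert.h>  // In C11, static_assert is a macro defined here.
#include <stddef.h>
#include <stdint.h>

// Hypothetical wire-format header, used only for illustration.
struct toy_record_header {
  uint8_t type;
  uint8_t version[2];
  uint8_t length[2];
};

// Checked at compile time; a false condition fails the build.
static_assert(sizeof(struct toy_record_header) == 5,
              "toy_record_header must not contain padding");

size_t toy_record_header_len(void) { return sizeof(struct toy_record_header); }
```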
Change-Id: Ic59978be43b699f2c997858179a9691606784ea5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53665
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
bind uses this function.
Change-Id: I97ba86d9f75597bff125ae0b56952effc397e6b8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53010
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
https://boringssl-review.googlesource.com/c/boringssl/+/52825 lost a
tmp.width = top line. Without it, tmp.width was set by
bn_one_to_montgomery. Since we always size modular arithmetic by the
modulus, tmp.width (and am.width) will actually always be top, and
there's actually no need to zero pad it.
We don't capture this in the type system or BIGNUM width convention, so
better to set the width explicitly. The original code did it at the end,
but I think doing it right when we zero pad it is better, as that's when
the size gets set.
But we can go a step further. The manual zero padding code came from
OpenSSL, which still had the bn_correct_top invariant. Our BIGNUMs are
resizable, so just call bn_resize_words, immediately after the
computation.
(bn_resize_words will not reallocate the data because the BIGNUMs have
the STATIC_DATA flag set. bn_wexpand will internally allow expanding up
to dmax, or top.)
Change-Id: I2403afa7381b8a407615c6730fba9edaa41125c6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52906
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Both implementations need to compute the first 32 powers of a. There's a
commented out naive version in rsaz_exp.c that claims to be smaller, but
1% slower. (It doesn't use squares when it otherwise could.)
Instead, we can write out the square-based strategy as a loop. (I wasn't
able to measure a difference between any of the three versions, but this
one's compact enough and does let us square more and gather5 less.)
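
For illustration, the square-based strategy can be sketched on plain
64-bit integers. This is not the rsaz_exp.c code: the names are
hypothetical, and it assumes a modulus below 2^32 so products fit in
64 bits.

```c
#include <stdint.h>

#define TOY_TABLE_SIZE 32

// Assumes 1 <= m < 2^32 and a, b < m, so a * b cannot overflow 64 bits.
static uint64_t toy_mul_mod(uint64_t a, uint64_t b, uint64_t m) {
  return (a * b) % m;
}

// Fills table[i] = a^i mod m for 0 <= i < 32.
void toy_fill_powers(uint64_t table[TOY_TABLE_SIZE], uint64_t a, uint64_t m) {
  table[0] = 1 % m;
  table[1] = a % m;
  for (int i = 2; i < TOY_TABLE_SIZE; i += 2) {
    // Even powers come from a squaring: a^i = (a^(i/2))^2.
    table[i] = toy_mul_mod(table[i / 2], table[i / 2], m);
    // Odd powers need one multiplication by a: a^(i+1) = a^i * a.
    table[i + 1] = toy_mul_mod(table[i], table[1], m);
  }
}

// Convenience wrapper: returns a^i mod m for i < 32 via the table.
uint64_t toy_power(uint64_t a, uint64_t m, unsigned i) {
  uint64_t table[TOY_TABLE_SIZE];
  toy_fill_powers(table, a, m);
  return table[i % TOY_TABLE_SIZE];
}
```

Of the 30 entries computed in the loop, 15 are squarings and 15 are
multiplications by a.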
Change-Id: I7015f2a78584cd97f29b54d0007479bdcc3a01ba
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52828
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
The unrolled loops appear to have negligible perf impact:
Before:
Did 18480 RSA 2048 signing operations in 10005085us (1847.1 ops/sec)
Did 2720 RSA 4096 signing operations in 10056337us (270.5 ops/sec)
After:
Did 18480 RSA 2048 signing operations in 10012218us (1845.7 ops/sec) [-0.1%]
Did 2700 RSA 4096 signing operations in 10003972us (269.9 ops/sec) [-0.2%]
Change-Id: I29073c373a03a9798f6e04016626e6ab910e893a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52826
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
RSAZ has a very similar bug to mont5 from
https://boringssl-review.googlesource.com/c/boringssl/+/52825 and may
return the modulus when it should return zero. As in that CL, there is
no security impact on our cryptographic primitives.
RSAZ is described in the paper "Software Implementation of Modular
Exponentiation, Using Advanced Vector Instructions Architectures".
The bug comes from RSAZ's use of "NRMM" or "Non Reduced Montgomery
Multiplication". This is like normal Montgomery multiplication, but
skips the final subtraction altogether (whereas mont5's AMM still
subtracts, but replaces MM's tighter bound with just the carry bit). This
would normally not be stable, but RSAZ picks a larger R > 4M, and
maintains looser bounds for modular arithmetic, a < 2M.
Lemma 1 from the paper proves that NRMM(a, b) preserves this 2M bound.
It also claims NRMM(a, 1) < M. That is, conversion out of Montgomery
form with NRMM is fully reduced. This second claim is wrong. The proof
shows that NRMM(a, 1) < 1/2 + M, which only implies NRMM(a, 1) <= M, not
NRMM(a, 1) < M. RSAZ relies on this to produce a reduced output (see
Figure 7 in the paper).
Thus, like mont5 with AMM, RSAZ may return the modulus when it should
return zero. Fix this by adding a bn_reduce_once_in_place call at the
end of the operation.
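
As a rough single-word sketch of what a "reduce once" step does (this is
not the real bn_reduce_once_in_place, which works on word arrays; the
name and signature below are hypothetical):

```c
#include <stdint.h>

// Returns x - m if x >= m, and x otherwise. The result is fully reduced
// whenever x < 2m. Selection uses a mask rather than a branch.
uint32_t toy_reduce_once(uint32_t x, uint32_t m) {
  uint32_t diff = x - m;  // Wraps around if x < m.
  // borrow is 1 if x < m: the 64-bit difference then has its top bit set.
  uint32_t borrow = (uint32_t)(((uint64_t)x - (uint64_t)m) >> 63);
  uint32_t mask = 0u - borrow;  // All ones if x < m, else zero.
  return (x & mask) | (diff & ~mask);
}
```

With an output bound of NRMM(a, 1) <= M, the only input this step
changes is M itself, which becomes zero.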
Change-Id: If28bc49ae8dfbfb43bea02af5ea10c4209a1c6e6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52827
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime
would sometimes return m, the modulus, when it should have returned
zero. Thanks to Guido Vranken for reporting it. It is only a partial fix
because the same bug also exists in the "rsaz" codepath. That will be
fixed in the subsequent CL. (See the commented out test.)
The bug only affects zero outputs (with non-zero inputs), so we believe
it has no security impact on our cryptographic functions. BoringSSL
calls BN_mod_exp_mont_consttime in the following cases:
- RSA private key operations
- Primality testing, raising the witness to the odd part of p-1
- DSA keygen and key import, pub = g^priv (mod p)
- DSA signing, r = g^k (mod p)
- DH keygen, pub = g^priv (mod p)
- Diffie-Hellman, secret = peer^priv (mod p)
It is not possible in the RSA private key operation, provided p and q
are primes. If using CRT, we are working modulo a prime, so zero output
with non-zero input is impossible. If not using CRT, we work mod n.
While there are nilpotent values mod n, none of them hit zero by
exponentiating. (Both p and q would need to divide the input, which
means n divides the input.)
In primality testing, this can only be hit when the input was composite.
But as the rest of the loop cannot then hit 1, we'll correctly report it
as composite anyway.
DSA and DH work modulo a prime, where this case cannot happen.
Analysis:
This bug is the result of sloppiness with the looser bounds from "almost
Montgomery multiplication", described in
https://eprint.iacr.org/2011/239. Prior to upstream's
ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl
implemented standard Montgomery reduction (the left half of figure 3 in
the paper).
Though it did not document this, ec9cc70f7245 changed it to implement
the "almost" variant (the right half of the figure). The difference is
that, rather than subtracting if T >= m, it subtracts if T >= R. In
code, it is the difference between something like our bn_reduce_once,
vs. subtracting based only on T's carry bit. (Interestingly, the
.Lmul_enter branch of bn_mul_mont_gather5 seems to still implement
normal reduction, but the .Lmul4x_enter branch is an almost reduction.)
That means none of the intermediate values here are bounded by m. They
are only bounded by R. Accordingly, Figure 2 in the paper ends with
step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this
step. The bn_from_montgomery call only implements step 9, AMM(h, 1).
(x86_64-mont5.pl's bn_from_montgomery only implements an almost
reduction.)
The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the
paper discusses this, but is ambiguous about the scope of its 2^(n-1) <
m < 2^n precondition. The m+1 bound appears to be unconditional:
Montgomery reduction ultimately adds some 0 <= Y < m*R to T, to get a
multiple of R, and then divides by R. The output, pre-subtraction, is
thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m.
A single subtraction of m if T' >= m gives T'' < m. AMM works because
T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R
gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper,
though their formulation is more complicated to capture the word-by-word
algorithm. It's ultimately the same adjustment to T.
But in AMM(h, 1), T = h*1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That
is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced
is if it outputs m. Thus, our limited impact. Indeed, Remark 1 mentions
step 10 isn't necessary because m is a prime and the inputs are
non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime
may be called elsewhere.
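
The m + 1 bound can be reproduced with a toy single-word model. This is
only a sketch with R = 2^8 and hypothetical names, not the assembly's
word-by-word algorithm:

```c
#include <stdint.h>

// Toy parameters: R = 2^8 and an odd modulus m < R.
#define TOY_R 256u

// Returns n' with m * n' == -1 (mod R), found by brute force. m must be odd.
static uint32_t toy_neg_inverse(uint32_t m) {
  for (uint32_t x = 1; x < TOY_R; x += 2) {
    if (((m * x) & (TOY_R - 1)) == TOY_R - 1) {
      return x;
    }
  }
  return 0;  // Not reached for odd m.
}

// Montgomery reduction before the final subtraction. Adds a multiple of m
// that makes T divisible by R, then divides: the result is < m + T/R.
static uint32_t toy_redc_raw(uint32_t T, uint32_t m) {
  uint32_t u = ((T & (TOY_R - 1)) * toy_neg_inverse(m)) & (TOY_R - 1);
  return (T + u * m) / TOY_R;
}

// Standard reduction (MM): subtract m if the result is >= m. Needs T < m*R.
uint32_t toy_mm_reduce(uint32_t T, uint32_t m) {
  uint32_t t = toy_redc_raw(T, m);
  return t >= m ? t - m : t;
}

// "Almost" reduction (AMM): subtract m only if the result is >= R.
// Needs T < R^2.
uint32_t toy_amm_reduce(uint32_t T, uint32_t m) {
  uint32_t t = toy_redc_raw(T, m);
  return t >= TOY_R ? t - m : t;
}
```

With m = 239 and h = 239 (which is within AMM's h < R bound), the
almost reduction returns 239, the modulus, while the standard reduction
returns 0.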
Fix:
To fix this, we could add the missing step 10, but a full division would
not be constant-time. The analysis above says it could be a single
subtraction, bn_reduce_once, but then we could integrate it into
the subtraction already in plain Montgomery reduction, implemented by
uppercase BN_from_montgomery. h*1 = h < R <= m*R, so we are within
bounds.
Thus, we delete lowercase bn_from_montgomery altogether, and have the
mont5 path use the same BN_from_montgomery ending as the non-mont5 path.
This only impacts the final step of the whole exponentiation and has no
measurable perf impact.
In doing so, add comments describing these looser bounds. This includes
one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont
(MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because
MM is AMM-compatible; when passed AMM's looser inputs, it will still
produce a correct looser output.
Ideally we'd drop the "almost" reduction and stick to the more
straightforward bounds. As this only impacts the final subtraction in
each reduction, I would be surprised if it actually had a real
performance impact. But this would involve deeper change to
x86_64-mont5.pl, so I haven't tried this yet.
I believe this is basically the same bug as
https://github.com/golang/go/issues/13907 from Go.
Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This is cribbed, with permission, from AWS-LC. The FIPS service
indicator[1] signals when an approved service has been completed.
[1] FIPS 140-3 IG 2.4.C
Change-Id: Ib40210d69b3823f4d2a500b23a1606f8d6942f81
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52568
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
When introducing EC_SCALAR and EC_FELEM, I used unions as convenience
for converting to and from the byte representation. However,
type-punning with unions is not allowed in C++ and hard to use correctly
in C. As I understand the rules, they are:
- The abstract machine knows which member of a union was last written to.
- In C, reading from an inactive member is defined to type-pun. In C++,
it is UB though some compilers promise the C behavior anyway.
- However, if you read or write from a *pointer* to a union member, the
strict aliasing rule applies. (A function passed two pointers of
different types otherwise needs to pessimally assume they came from
the same union.)
That last rule means the type-punning allowance doesn't apply if you
take a pointer to an inactive member, and it's common to abstract
otherwise direct accesses of members via pointers.
https://github.com/openssl/openssl/issues/18225 is an example where
similar union tricks have caused problems for OpenSSL. While we don't
have that code, EC_SCALAR and EC_FELEM play similar tricks.
We do get a second lifeline because our alternate view is a uint8_t,
which we require to be unsigned char. Strict aliasing always allows the
pointer type to be a character type, so pointer-indirected accesses of
EC_SCALAR.bytes aren't necessarily UB. But if we ever write to
EC_SCALAR.bytes directly (and we do), we'll switch the active arm and
then pointers to EC_SCALAR.words become strict aliasing violations!
This is all far too complicated to deal with. Ideally everyone would
build with -fno-strict-aliasing because no real C code actually follows
these rules. But we don't always control our downstream consumers'
CFLAGS, so let's just avoid the union. This also avoids a pitfall if we
ever move libcrypto to C++.
For p224-64.c, I just converted the representations directly, which
avoids worrying about the top 32 bits in p224_felem_to_generic. Most of
the rest was words vs. bytes conversions and boils down to a cast (we're
still dealing with a character type, at the end of the day). But I took
the opportunity to extract some more "words"-based helper functions out
of BIGNUM, so the casts would only be in one place. That too saves us
from the top bits problem in the bytes-to-words direction.
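
A sketch of the general shape, with hypothetical names rather than the
real EC_SCALAR helpers: the type holds only words, and the byte view
goes through memcpy, which accesses the object as characters.

```c
#include <stdint.h>
#include <string.h>

#define TOY_WORDS 4
#define TOY_BYTES (TOY_WORDS * sizeof(uint64_t))

// A scalar stored only as words; there is no union with a byte array.
typedef struct {
  uint64_t words[TOY_WORDS];
} toy_scalar;

// Copies the in-memory (native-endian) representation out as bytes.
void toy_scalar_to_bytes(uint8_t out[TOY_BYTES], const toy_scalar *in) {
  memcpy(out, in->words, TOY_BYTES);
}

// Copies bytes back into the word representation.
void toy_scalar_from_bytes(toy_scalar *out, const uint8_t in[TOY_BYTES]) {
  memcpy(out->words, in, TOY_BYTES);
}

// Returns 1 if a scalar built from seed survives a round trip via bytes.
int toy_round_trip_ok(uint64_t seed) {
  toy_scalar a, b;
  uint8_t bytes[TOY_BYTES];
  for (int i = 0; i < TOY_WORDS; i++) {
    a.words[i] = seed + (uint64_t)i;
  }
  toy_scalar_to_bytes(bytes, &a);
  toy_scalar_from_bytes(&b, bytes);
  return memcmp(a.words, b.words, TOY_BYTES) == 0;
}
```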
Bug: 301
Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
BN_mod_sqrt implements the Tonelli–Shanks algorithm, which requires a
prime modulus. It was written such that, given a composite modulus, it
would sometimes loop forever. This change fixes the algorithm to always
terminate. However, callers must still pass a prime modulus for the
function to have a defined output.
In OpenSSL, this loop resulted in a DoS vulnerability, CVE-2022-0778.
BoringSSL is mostly unaffected by this. In particular, this case is not
reachable in BoringSSL from certificate and other ASN.1 elliptic curve
parsing code. Any impact in BoringSSL is limited to:
- Callers of EC_GROUP_new_curve_GFp that take untrusted curve parameters
- Callers of BN_mod_sqrt that take untrusted moduli
This CL updates documentation of those functions to clarify that callers
should not pass attacker-controlled values. Even with the infinite loop
fixed, doing so breaks preconditions and will give undefined output.
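
To illustrate how the loop can be bounded, here is a toy Tonelli-Shanks
on 64-bit integers. It is a sketch under assumptions (hypothetical
names, modulus below 2^31) and not the BN_mod_sqrt code: the search for
the order of b gives up once it reaches e, which cannot happen for a
prime modulus and a square input.

```c
#include <stdint.h>

// Assumes m < 2^32 and a, b < m, so a * b cannot overflow 64 bits.
static uint64_t toy_mul_mod(uint64_t a, uint64_t b, uint64_t m) {
  return (a * b) % m;
}

static uint64_t toy_pow_mod(uint64_t b, uint64_t e, uint64_t m) {
  uint64_t r = 1 % m;
  b %= m;
  while (e > 0) {
    if (e & 1) r = toy_mul_mod(r, b, m);
    b = toy_mul_mod(b, b, m);
    e >>= 1;
  }
  return r;
}

// Returns a square root of a mod p, or -1 on failure. Always terminates,
// but the output is only meaningful when p is prime. Assumes p < 2^31.
int64_t toy_mod_sqrt(uint64_t a, uint64_t p) {
  if (p < 2) return -1;
  a %= p;
  if (a == 0 || p == 2) return (int64_t)a;

  // Write p - 1 = q * 2^e with q odd.
  uint64_t q = p - 1;
  unsigned e = 0;
  while ((q & 1) == 0) {
    q >>= 1;
    e++;
  }

  // Find z with z^((p-1)/2) == -1. The search is bounded by p.
  uint64_t z = 2;
  while (z < p && toy_pow_mod(z, (p - 1) / 2, p) != p - 1) z++;
  if (z == p) return -1;

  uint64_t c = toy_pow_mod(z, q, p);
  uint64_t x = toy_pow_mod(a, (q + 1) / 2, p);
  uint64_t b = toy_pow_mod(a, q, p);
  while (b != 1) {
    // Find the least i with b^(2^i) == 1, giving up once i reaches e.
    unsigned i = 0;
    uint64_t t = b;
    while (t != 1) {
      i++;
      if (i >= e) return -1;  // Not a square, or p is not prime.
      t = toy_mul_mod(t, t, p);
    }
    // g = c^(2^(e-i-1)); e strictly decreases, so the outer loop ends.
    uint64_t g = c;
    for (unsigned j = 0; j + i + 1 < e; j++) g = toy_mul_mod(g, g, p);
    x = toy_mul_mod(x, g, p);
    c = toy_mul_mod(g, g, p);
    b = toy_mul_mod(b, c, p);
    e = i;
  }
  return (int64_t)x;
}
```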
Change-Id: I64dc1220aaaaafedba02d2ac0e4232a3a0648160
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51925
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: Martin Kreichgauer <martinkr@google.com>
Commit-Queue: Adam Langley <agl@google.com>
On Arm, our CRYPTO_is_*_capable functions check the corresponding
preprocessor symbol. This allows us to automatically drop dynamic checks
and fallback code when some capability is always available.
This CL does the same on x86, as well as consolidates our
OPENSSL_ia32cap_P checks in one place. Since this abstraction is
incompatible with some optimizations we do around OPENSSL_ia32cap_get()
in the FIPS module, I've marked the symbol __attribute__((const)), which
is enough to make GCC and Clang do the optimizations for us. (We already
do the same to DEFINE_BSS_GET.)
Most x86 platforms support a much wider range of capabilities, so this
is usually a no-op. But, notably, all x86_64 Mac hardware has SSSE3
available, so this allows us to statically drop an AES implementation.
(On macOS with -Wl,-dead_strip, this seems to trim 35080 bytes from the
bssl binary.) Configs like -march=native can also drop a bunch of code.
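
The pattern looks roughly like the following sketch. The names are
hypothetical stand-ins, and the array layout (index 1 holding CPUID
leaf 1 ECX) is an assumption made for the example:

```c
#include <stdint.h>

// Hypothetical stand-in for the runtime capability words.
static uint32_t toy_ia32cap[4];

// Marked const on GCC and Clang so repeated calls can be merged.
#if defined(__GNUC__) || defined(__clang__)
__attribute__((const))
#endif
const uint32_t *toy_ia32cap_get(void) {
  return toy_ia32cap;
}

int toy_is_SSSE3_capable(void) {
#if defined(__SSSE3__)
  // Statically available: the dynamic check, and any fallback code that
  // is only reachable when this returns 0, can be dropped.
  return 1;
#else
  // CPUID leaf 1, ECX bit 9 is SSSE3.
  return (toy_ia32cap_get()[1] & (1u << 9)) != 0;
#endif
}
```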
Update-Note: This CL may break build environments that incorrectly mark
some instruction as statically available. This is unlikely to happen
with vector instructions like AVX, where the compiler could freely emit
them anyway. However, instructions like AES-NI might be set incorrectly.
Change-Id: I44fd715c9887d3fda7cb4519c03bee4d4f2c7ea6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51548
Reviewed-by: Adam Langley <agl@google.com>
x86_64-mont5.pl checks for both BMI1 and BMI2, because the MULX path
also uses the ANDN instruction. Some history here from upstream:
a5bb5bca52f57021a4017521c55a6b3590bbba7a, dated 2013-10-03, added the
MULX path to x86_64-mont5.pl. At the time, the cpuid check was
BMI2+ADX. (MULX comes from BMI2.)
37de2b5c1e370b493932552556940eb89922b027, dated 2013-10-09, made
BN_mod_exp_mont_consttime prefer the MULX mont5 code over the AVX2 rsaz
code, with a matching BMI2+ADX cpuid check.
8fc8f486f7fa098c9fbb6a6ae399e3c6856e0d87, dated 2016-01-25, tweaked some
code to use the ANDN instruction, from BMI1. Correspondingly, it changed
the cpuid check to be BMI1+BMI2+ADX. The BN_mod_exp_mont_consttime check
was left unchanged.
This CL fixes our version of the BN_mod_exp_mont_consttime check to
match the assembly, by also checking BMI1. (This should be a no-op.
Presumably any processor with BMI2 also has BMI1.)
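
As a sketch, the combined check amounts to testing three bits of CPUID
leaf 7 EBX. The function and macro names here are hypothetical:

```c
#include <stdint.h>

// CPUID leaf 7 (subleaf 0) EBX feature bits.
#define TOY_BMI1 (1u << 3)
#define TOY_BMI2 (1u << 8)
#define TOY_ADX (1u << 19)

// Returns 1 only if all three features the MULX path relies on are set.
int toy_can_use_mulx_path(uint32_t leaf7_ebx) {
  const uint32_t needed = TOY_BMI1 | TOY_BMI2 | TOY_ADX;
  return (leaf7_ebx & needed) == needed;
}
```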
Change-Id: Ib0cacc7e2be840d970460eef4dd9ded7fb24231c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51547
Reviewed-by: Adam Langley <agl@google.com>
These symbols were not marked OPENSSL_EXPORT, so they weren't really
usable externally anyway. They're also very sensitive to various build
configuration toggles, which don't always get reflected into projects
that include our headers. Move them to crypto/internal.h.
Change-Id: I79a1fcf0b24e398d75a9cc6473bae28ec85cb835
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50846
Reviewed-by: Adam Langley <agl@google.com>
OpenSSL 1.1.0 made this structure opaque. I don't think we particularly
need to make it opaque, but external code uses it. Also add
RSA_test_flags.
Change-Id: I136d38e72ec4664c78f4d1720ec691f5760090c1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50605
Reviewed-by: Adam Langley <agl@google.com>
The bulk of RSA_check_key is spent in bn_div_consttime, which is a naive
but constant-time long-division algorithm for the few places that divide
by a secret even divisor: RSA keygen and RSA import. RSA import is
somewhat performance-sensitive, so pick some low-hanging fruit:
The main observation is that, in all but one call site, the bit width of
the divisor is public. That means, for an N-bit divisor, we can skip the
first N-1 iterations of long division because an N-1-bit remainder
cannot exceed the N-bit divisor.
One minor nuisance is that bn_lcm_consttime, used in RSA keygen, has a
case that does *not* have a public bit width. Applying the optimization
there would leak information. I've implemented this as an optional public
lower bound on num_bits(divisor), which all but that call fills in.
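
A 32-bit toy version of the idea, with hypothetical names. It is only a
sketch: the comparison below is not guaranteed to compile to
constant-time code, unlike the real bn_div_consttime.

```c
#include <stddef.h>
#include <stdint.h>

// Computes n / d by binary long division and writes n % d to *rem_out if
// it is non-NULL. divisor_min_bits is a public lower bound on the bit
// length of d (0 if unknown). d must be non-zero.
uint32_t toy_div(uint32_t *rem_out, uint32_t n, uint32_t d,
                 unsigned divisor_min_bits) {
  // If d has at least N bits, the top N-1 bits of n are below 2^(N-1) <= d,
  // so those iterations never subtract. Load them into the remainder.
  unsigned skip = divisor_min_bits > 0 ? divisor_min_bits - 1 : 0;
  if (skip > 31) skip = 31;
  uint64_t rem = skip > 0 ? (uint64_t)(n >> (32 - skip)) : 0;
  uint32_t quot = 0;
  for (int i = 31 - (int)skip; i >= 0; i--) {
    // Shift in the next bit of n, then subtract d if the result is >= d.
    uint64_t t = (rem << 1) | ((n >> i) & 1);
    uint64_t mask = (uint64_t)0 - (uint64_t)(t >= d);
    rem = t - ((uint64_t)d & mask);
    quot |= (uint32_t)(mask & 1) << i;
  }
  if (rem_out != NULL) *rem_out = (uint32_t)rem;
  return quot;
}

// Convenience wrapper returning only the remainder.
uint32_t toy_mod(uint32_t n, uint32_t d, unsigned divisor_min_bits) {
  uint32_t rem;
  toy_div(&rem, n, d, divisor_min_bits);
  return rem;
}
```

Passing 0 as the bound runs all 32 iterations and gives the same result,
which is what the one call site without a public width would do.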
Before:
Did 5060 RSA 2048 private key parse operations in 1058526us (4780.2 ops/sec)
Did 1551 RSA 4096 private key parse operations in 1082343us (1433.0 ops/sec)
After:
Did 11532 RSA 2048 private key parse operations in 1084145us (10637.0 ops/sec) [+122.5%]
Did 3542 RSA 4096 private key parse operations in 1036374us (3417.7 ops/sec) [+138.5%]
Bug: b/192484677
Change-Id: I893ebb8886aeb8200a1a365673b56c49774221a2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/49106
Reviewed-by: Adam Langley <agl@google.com>
See also f8fc0e35e0b1813af15887d42e17b7d5537bb86c from upstream, though
our BN_divs have diverged slightly.
Change-Id: I49fa4f0a5c730d34e6f41f724f1afe3685470712
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48426
Reviewed-by: Adam Langley <agl@google.com>
This comment dates to SSLeay. It appears to be describing the
incremental trial division strategy where they would pick a starting
candidate, compute moduli by small primes, and then update by
incrementing the candidate and saved moduli instead of dividing from
scratch. We use a simpler rejection sampling strategy.
Change-Id: If2203d616f2b1f632bcd7033ceb60a83d1b75674
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48047
Reviewed-by: Adam Langley <agl@google.com>
Because file names are not enclosed in quotation marks in the open call.
https://bugs.chromium.org/p/boringssl/issues/detail?id=415
```
cmake --build "C:\Projects\ Extern\Visual C++ 2015\x64 Debug\Build\BoringSSL\."
[9/439] Generating rdrand-x86_64.asm
FAILED: crypto/fipsmodule/rdrand-x86_64.asm
cmd.exe /C "cd /D "C:\Projects\ Extern\Visual C++ 2015\x64 Debug\Build\BoringSSL\crypto\fipsmodule" && "C:\Program Files\CMake\bin\cmake.exe" -E make_directory . && C:\Perl64\bin\perl.exe "C:/Projects/ Extern/Source/BoringSSL/crypto/fipsmodule/rand/asm/rdrand-x86_64.pl" nasm rdrand-x86_64.asm"
Can't open perl script "C:/Projects/": No such file or directory
error closing STDOUT at C:/Projects/ Extern/Source/BoringSSL/crypto/fipsmodule/rand/asm/rdrand-x86_64.pl line 87.
ninja: build stopped: subcommand failed.
```
Bug: 415
Change-Id: I83c4a460689b9adeb439425ad390322ae8b2002a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/47884
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This involves a |2048|^|225| modexp, which is far from ideal, but is now
required in FIPS mode.
Change-Id: Id7384b4ba92aa74e971231bc44fa0f10434d18e2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/45085
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
We clear all heap memory on free now, thus the difference between these
functions is quite small. There are some differences though:
Firstly, BN_clear_free will attempt to zero out static limb data. But
static data is probably read-only and thus trying to zero it will crash.
Secondly it will try to zero out the BIGNUM structure itself. But either
it's on the heap, and will be zeroed anyway, or else it's on the stack,
and we don't try and clear the stack in general because the compiler is
duplicating bits of it at will anyway.
Change-Id: I8a07385a102cfd308b555432942225c25eb7c12d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/45084
Reviewed-by: David Benjamin <davidben@google.com>
Windows on Arm (WoA) builds are currently using the C implementations
of the various functions within BoringSSL. This patch enables feature
detection for the Neon and hardware crypto optimizations, and updates
the perl script to generate AArch64 .S files for WoA.
Note these files use GNU assembler syntax (specifically tested with
Clang assembler), not armasm.
Change-Id: Id8841f4db0498ec16215095a4e6bd60d427cd54b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/43304
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
If using precompiled headers then this is needed otherwise bn/internal.h
doesn't have a definition for BN_ULONG etc.
Change-Id: I41b331465abae7108f255722a156d2ffb3016ba3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/44604
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
This change adds optional support for
- Armv8.3-A Pointer Authentication (PAuth) and
- Armv8.5-A Branch Target Identification (BTI)
features to the perl scripts.
Both features can be enabled with additional compiler flags.
Unless any of these are enabled explicitly there is no code change at
all.
The extensions are briefly described below. Please read the appropriate
chapters of the Arm Architecture Reference Manual for the complete
specification.
Scope
-----
This change only affects generated assembly code.
Armv8.3-A Pointer Authentication
--------------------------------
Pointer Authentication extension supports the authentication of the
contents of registers before they are used for indirect branching
or load.
PAuth provides a probabilistic method to detect corruption of register
values. PAuth signing instructions generate a Pointer Authentication
Code (PAC) based on the value of a register, a seed and a key.
The generated PAC is inserted into the original value in the register.
A PAuth authentication instruction recomputes the PAC, and if it matches
the PAC in the register, restores its original value. In case of a
mismatch, an architecturally unmapped address is generated instead.
With PAuth, mitigation against ROP (Return-oriented Programming) attacks
can be implemented. This is achieved by signing the contents of the
link-register (LR) before it is pushed to stack. Once LR is popped,
it is authenticated. This way a stack corruption which overwrites the
LR on the stack is detectable.
The PAuth extension adds several new instructions, some of which are not
recognized by older hardware. To support a single codebase for both pre
Armv8.3-A targets and newer ones, only NOP-space instructions are added
by this patch. These instructions are treated as NOPs on hardware
which does not support Armv8.3-A. Furthermore, this patch only considers
cases where LR is saved to the stack and then restored before branching
to its content. There are cases in the code where LR is pushed to stack
but it is not used later. We do not address these cases as they are not
affected by PAuth.
There are two keys available to sign an instruction address: A and B.
PACIASP and PACIBSP only differ in the used keys: A and B, respectively.
The keys are typically managed by the operating system.
To enable generating code for PAuth compile with
-mbranch-protection=<mode>:
- standard or pac-ret: add PACIASP and AUTIASP, also enables BTI
(read below)
- pac-ret+b-key: add PACIBSP and AUTIBSP
Armv8.5-A Branch Target Identification
--------------------------------------
Branch Target Identification features some new instructions which
protect the execution of instructions on guarded pages which are not
intended branch targets.
If Armv8.5-A is supported by the hardware, execution of an instruction
changes the value of PSTATE.BTYPE field. If an indirect branch
lands on a guarded page the target instruction must be one of the
BTI <jc> flavors, or in case of a direct call or jump it can be any
other instruction. If the target instruction is not compatible with the
value of PSTATE.BTYPE a Branch Target Exception is generated.
In short, indirect jumps are compatible with BTI <j> and <jc> while
indirect calls are compatible with BTI <c> and <jc>. Please refer to the
specification for the details.
Armv8.3-A PACIASP and PACIBSP are implicit branch target
identification instructions which are equivalent with BTI c or BTI jc
depending on system register configuration.
BTI is used to mitigate JOP (Jump-oriented Programming) attacks by
limiting the set of instructions which can be jumped to.
BTI requires active linker support to mark the pages with BTI-enabled
code as guarded. For ELF64 files BTI compatibility is recorded in the
.note.gnu.property section. For a shared object or static binary it is
required that all linked units support BTI. This means that even a
single assembly file without the required note section turns off BTI
for the whole binary or shared object.
The new BTI instructions are treated as NOPs on hardware which does
not support Armv8.5-A or on pages which are not guarded.
To insert this new and optional instruction compile with
-mbranch-protection=standard (also enables PAuth) or +bti.
When targeting a guarded page from a non-guarded page, weaker
compatibility restrictions apply to maintain compatibility between
legacy and new code. For detailed rules please refer to the Arm ARM.
Compiler support
----------------
Compiler support requires understanding '-mbranch-protection=<mode>'
and emitting the appropriate feature macros (__ARM_FEATURE_BTI_DEFAULT
and __ARM_FEATURE_PAC_DEFAULT). The current state is the following:
-------------------------------------------------------
| Compiler | -mbranch-protection | Feature macros |
+----------+---------------------+--------------------+
| clang | 9.0.0 | 11.0.0 |
+----------+---------------------+--------------------+
| gcc | 9 | expected in 10.1+ |
-------------------------------------------------------
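
A sketch of how code might select instructions from these feature
macros. The TOY_* names are hypothetical; per ACLE,
__ARM_FEATURE_PAC_DEFAULT is a bitmask where bit 0 selects the A key and
bit 1 the B key.

```c
// Pick the link-register signing and authentication instructions based on
// the defaults the compiler was asked for with -mbranch-protection.
#if defined(__ARM_FEATURE_PAC_DEFAULT) && (__ARM_FEATURE_PAC_DEFAULT & 1)
#define TOY_SIGN_LR "paciasp"  // A key; also an implicit BTI landing pad.
#define TOY_AUTH_LR "autiasp"
#elif defined(__ARM_FEATURE_PAC_DEFAULT) && (__ARM_FEATURE_PAC_DEFAULT & 2)
#define TOY_SIGN_LR "pacibsp"  // B key; also an implicit BTI landing pad.
#define TOY_AUTH_LR "autibsp"
#elif defined(__ARM_FEATURE_BTI_DEFAULT)
#define TOY_SIGN_LR "bti c"  // No PAuth, but entries still need a pad.
#define TOY_AUTH_LR ""
#else
#define TOY_SIGN_LR ""  // Neither feature requested: no code change.
#define TOY_AUTH_LR ""
#endif

// Instruction to place at function entry, or "" if none is needed.
const char *toy_function_prologue(void) { return TOY_SIGN_LR; }

// Instruction to place before returning, or "" if none is needed.
const char *toy_function_epilogue(void) { return TOY_AUTH_LR; }
```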
Available Platforms
-------------------
Arm Fast Model and QEMU support both extensions.
https://developer.arm.com/tools-and-software/simulation-models/fast-models
https://www.qemu.org/
Implementation Notes
--------------------
This change adds BTI landing pads even to assembly functions which are
likely to be directly called only. In these cases, landing pads might
be superfluous depending on what code the linker generates.
Code size and performance impact for these cases would be negligible.
Interaction with C code
-----------------------
Pointer Authentication is a per-frame protection while Branch Target
Identification can be turned on and off only for all code pages of a
whole shared object or static binary. Because of these properties if
C/C++ code is compiled without any of the above features but assembly
files support any of them unconditionally there is no incompatibility
between the two.
Useful Links
------------
To fully understand the details of both PAuth and BTI it is advised to
read the related chapters of the Arm Architecture Reference Manual
(Arm ARM):
https://developer.arm.com/documentation/ddi0487/latest/
Additional materials:
"Providing protection for complex software"
https://developer.arm.com/architectures/learn-the-architecture/providing-protection-for-complex-software
Arm Compiler Reference Guide Version 6.14: -mbranch-protection
https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mbranch-protection?lang=en
Arm C Language Extensions (ACLE)
https://developer.arm.com/docs/101028/latest
Change-Id: I4335f92e2ccc8e209c7d68a0a79f1acdf3aeb791
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42084
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>