I don't think these are all UB by C's rules, but it's easier not to
think about the pointers. Still more to go, but these were some easy
ones.
Bug: 301
Change-Id: Icdcb7fb40f85983cbf566786c5f7dbfd7bb06571
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52905
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
OpenSSL 1.1.1 added HKDF support, but by sticking it into
EVP_PKEY_derive, the API meant for Diffie-Hellman-like primitives.
Implement it for OpenSSL compatibility.
This does unfortunately mean anything using EVP now pulls in HKDF. HKDF
isn't much code, but we should make EVP more static-linker-friendly.
(Filed https://crbug.com/boringssl/497)
Change-Id: I90b9b0d918129829eb36ba9d50ff4d8580346ff0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52829
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
https://boringssl-review.googlesource.com/c/boringssl/+/52825 lost a
tmp.width = top line. Without it, tmp.width was set by
bn_one_to_montgomery. Since we always size modular arithmetic by the
modulus, tmp.width (and am.width) will actually always be top, and
there's actually no need to zero pad it.
We don't capture this in the type system or BIGNUM width convention, so
better to set the width explicitly. The original code did it at the end,
but I think doing it right when we zero pad it is better, as that's when
the size gets set.
But we can go a step further. The manual zero padding code came from
OpenSSL, which still had the bn_correct_top invariant. Our BIGNUMs are
resizable, so just call bn_resize_words, immediately after the
computation.
(bn_resize_words will not reallocate the data because the BIGNUMs have
the STATIC_DATA flag set. bn_wexpand will internally allow expanding up
to dmax, or top.)
Change-Id: I2403afa7381b8a407615c6730fba9edaa41125c6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52906
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
(It's a pain because we don't have it setup in CMake, but perhaps we
should have a builder for the configuration that doesn't have bcm.c.)
Change-Id: Ic408f0a86c9d42346244d6a7b7e9e664b58fc70c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52845
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Both implementations need to compute the first 32 powers of a. There's a
commented out naive version in rsaz_exp.c that claims to be smaller, but
1% slower. (It doesn't use squares when it otherwise could.)
Instead, we can write out the square-based strategy as a loop. (I wasn't
able to measure a difference between any of the three versions, but this
one's compact enough and does let us square more and gather5 less.)
Change-Id: I7015f2a78584cd97f29b54d0007479bdcc3a01ba
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52828
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
The unrolled loops appear to have negligible perf impact:
Before:
Did 18480 RSA 2048 signing operations in 10005085us (1847.1 ops/sec)
Did 2720 RSA 4096 signing operations in 10056337us (270.5 ops/sec)
After:
Did 18480 RSA 2048 signing operations in 10012218us (1845.7 ops/sec) [-0.1%]
Did 2700 RSA 4096 signing operations in 10003972us (269.9 ops/sec) [-0.2%]
Change-Id: I29073c373a03a9798f6e04016626e6ab910e893a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52826
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
RSAZ has a very similar bug to mont5 from
https://boringssl-review.googlesource.com/c/boringssl/+/52825 and may
return the modulus when it should return zero. As in that CL, there is
no security impact on our cryptographic primitives.
RSAZ is described in the paper "Software Implementation of Modular
Exponentiation, Using Advanced Vector Instructions Architectures".
The bug comes from RSAZ's use of "NRMM" or "Non Reduced Montgomery
Multiplication". This is like normal Montgomery multiplication, but
skips the final subtraction altogether (whereas mont5's AMM still
subtracts, but replaces MM's tigher bound with just the carry bit). This
would normally not be stable, but RSAZ picks a larger R > 4M, and
maintains looser bounds for modular arithmetic, a < 2M.
Lemma 1 from the paper proves that NRMM(a, b) preserves this 2M bound.
It also claims NRMM(a, 1) < M. That is, conversion out of Montgomery
form with NRMM is fully reduced. This second claim is wrong. The proof
shows that NRMM(a, 1) < 1/2 + M, which only implies NRMM(a, 1) <= M, not
NRMM(a, 1) < M. RSAZ relies on this to produce a reduced output (see
Figure 7 in the paper).
Thus, like mont5 with AMM, RSAZ may return the modulus when it should
return zero. Fix this by adding a bn_reduce_once_in_place call at the
end of the operation.
Change-Id: If28bc49ae8dfbfb43bea02af5ea10c4209a1c6e6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52827
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime
would sometimes return m, the modulus, when it should have returned
zero. Thanks to Guido Vranken for reporting it. It is only a partial fix
because the same bug also exists in the "rsaz" codepath. That will be
fixed in the subsequent CL. (See the commented out test.)
The bug only affects zero outputs (with non-zero inputs), so we believe
it has no security impact on our cryptographic functions. BoringSSL
calls BN_mod_exp_mont_consttime in the following cases:
- RSA private key operations
- Primality testing, raising the witness to the odd part of p-1
- DSA keygen and key import, pub = g^priv (mod p)
- DSA signing, r = g^k (mod p)
- DH keygen, pub = g^priv (mod p)
- Diffie-Hellman, secret = peer^priv (mod p)
It is not possible in the RSA private key operation, provided p and q
are primes. If using CRT, we are working modulo a prime, so zero output
with non-zero input is impossible. If not using CRT, we work mod n.
While there are nilpotent values mod n, none of them hit zero by
exponentiating. (Both p and q would need to divide the input, which
means n divides the input.)
In primality testing, this can only be hit when the input was composite.
But as the rest of the loop cannot then hit 1, we'll correctly report it
as composite anyway.
DSA and DH work modulo a prime, where this case cannot happen.
Analysis:
This bug is the result of sloppiness with the looser bounds from "almost
Montgomery multiplication", described in
https://eprint.iacr.org/2011/239. Prior to upstream's
ec9cc70f72454b8d4a84247c86159613cee83b81, I believe x86_64-mont5.pl
implemented standard Montgomery reduction (the left half of figure 3 in
the paper).
Though it did not document this, ec9cc70f7245 changed it to implement
the "almost" variant (the right half of the figure.) The difference is
that, rather than subtracting if T >= m, it subtracts if T >= R. In
code, it is the difference between something like our bn_reduce_once,
vs. subtracting based only on T's carry bit. (Interestingly, the
.Lmul_enter branch of bn_mul_mont_gather5 seems to still implement
normal reduction, but the .Lmul4x_enter branch is an almost reduction.)
That means none of the intermediate values here are bounded by m. They
are only bounded by R. Accordingly, Figure 2 in the paper ends with
step 10: REDUCE h modulo m. BN_mod_exp_mont_consttime is missing this
step. The bn_from_montgomery call only implements step 9, AMM(h, 1).
(x86_64-mont5.pl's bn_from_montgomery only implements an almost
reduction.)
The impact depends on how unreduced AMM(h, 1) can be. Remark 1 of the
paper discusses this, but is ambiguous about the scope of its 2^(n-1) <
m < 2^n precondition. The m+1 bound appears to be unconditional:
Montgomery reduction ultimately adds some 0 <= Y < m*R to T, to get a
multiple of R, and then divides by R. The output, pre-subtraction, is
thus less than m + T/R. MM works because T < mR => T' < m + mR/R = 2m.
A single subtraction of m if T' >= m gives T'' < m. AMM works because
T < R^2 => T' < m + R^2/R = m + R. A single subtraction of m if T' >= R
gives T'' < R. See also Lemma 1, Section 3 and Section 4 of the paper,
though their formulation is more complicated to capture the word-by-word
algorithm. It's ultimately the same adjustment to T.
But in AMM(h, 1), T = h*1 = h < R, so AMM(h, 1) < m + R/R = m + 1. That
is, AMM(h, 1) <= m. So the only case when AMM(h, 1) isn't fully reduced
is if it outputs m. Thus, our limited impact. Indeed, Remark 1 mentions
step 10 isn't necessary because m is a prime and the inputs are
non-zero. But that doesn't apply here because BN_mod_exp_mont_consttime
may be called elsewhere.
Fix:
To fix this, we could add the missing step 10, but a full division would
not be constant-time. The analysis above says it could be a single
subtraction, bn_reduce_once, but then we could integrate it into
the subtraction already in plain Montgomery reduction, implemented by
uppercase BN_from_montgomery. h*1 = h < R <= m*R, so we are within
bounds.
Thus, we delete lowercase bn_from_montgomery altogether, and have the
mont5 path use the same BN_from_montgomery ending as the non-mont5 path.
This only impacts the final step of the whole exponentiation and has no
measurable perf impact.
In doing so, add comments describing these looser bounds. This includes
one subtlety that BN_mod_exp_mont_consttime actually mixes bn_mul_mont
(MM) with bn_mul_mont_gather5/bn_power5 (AMM). But this is fine because
MM is AMM-compatible; when passed AMM's looser inputs, it will still
produce a correct looser output.
Ideally we'd drop the "almost" reduction and stick to the more
straightforward bounds. As this only impacts the final subtraction in
each reduction, I would be surprised if it actually had a real
performance impact. But this would involve deeper change to
x86_64-mont5.pl, so I haven't tried this yet.
I believe this is basically the same bug as
https://github.com/golang/go/issues/13907 from Go.
Change-Id: I06f879777bb2ef181e9da7632ec858582e2afa38
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52825
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
For now, it contains a call to set the service indicator so it should
live in the module. In term it would be nice to move it back out and
have the service indicator set in RSA and ECDSA functions themselves
once the ECDSA functions can take an indicator of the hash function
used.
Change-Id: I2a3c262f66b1881a96ae3e49784a0dc9fc8c4589
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52705
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
clang-format gets very confused by the comments in these tables. (The
comments seem to have already gotten a little messed up from upstream's
reformatted.) Reformat them ahead of time. I removed the tag2str number
comments as they aren't really doing much good at this point.
Also remove the last entry in tag2bits because it's not actually used.
ASN1_tag2bit only reads the first 31 entries.
Change-Id: If50770fd79b9d6ccab5558d24b0ee3a27c81a452
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52731
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
The original implementation uses a table generated by a Perl script,
and then relies on some subset of ASN1_STRFLGS_* constants overlapping
with CHARTYPE_* constants, while masking off the ones that don't align.
Allocating ASN1_STRFLGS_* constants is already complex with the
XN_FLAG_* interaction. Avoid the additional CHARTYPE_* interaction by
just writing out what it's recognizing in code. If you ignore
CHARTYPE_PRINTABLESTRING (which is unused), that table is just
recognizing 9 characters anyway.
Also this gets charmap.h out of the way so I can clang-format every file
in here without having to constantly exclude it.
Change-Id: I73f31324e4b8a815887afba459e50ed091a9f999
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52729
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Adding the ending macros to StatementMacros tells clang-format the macro
terminates a statement. Adding trailing commas in the middle keeps it
from trying to bundle the curly brace with the next statement.
Also add a few other trailing commas that clang-format otherwise indents
awkwardly.
Change-Id: I0b2ba9cf07bc775649fa1e92de3e5bb2e2b0b20b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52728
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
If we're to have any hope of fixing EVP_CIPHER_CTX's calling convention, we
need to be able to change the shape of its method table.
Looking back, it looks like we exported this in
https://boringssl-review.googlesource.com/4330, for OpenSSH. I don't
remember exactly what OpenSSH was doing, but I see in this commit, they
removed a bunch of custom EVP_CIPHERs which would definitely have
required an exported EVP_CIPHER struct:
cdccebdf85
That's been gone for a while now, so hopefully we can hide it again. (If
a project needs a cipher not implemented by OpenSSL, it's not strictly
necessarily to make a custom EVP_CIPHER. It might be convenient to reuse
the abstraction, but you can always just call your own APIs directly.)
Update-Note: EVP_CIPHER is now opaque. Use accessors instead.
Bug: 494
Change-Id: I9344690c3cfe7d19d6ca12fb66484ced57dbe869
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52725
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
While C allows function pointer casts, it is UB to call a function with
a different type than its actual type signature. That is, even though
`void f(int *)` and `void g(void *)` have the same ABI, it is UB to
cast `f` to a `void(*)(void *)` and then call it through that pointer.
Clang CFI will try to enforce this rule.
The recent CL to call X509_print in tests revealed that all the i2? and
?2i callbacks in X509V3_EXT_METHODs were implemented with functions of
the wrong type, out of some combination of missing consts and void*
turned into T*.
This CL fixes this. Where the function wasn't exported, or had no
callers, I just fixed the function itself. Where it had extension
callers, I added a wrapper function with a void* type.
I'm not positive whether the wrappers are the right call. On the one
hand, keeping the exported functions as-is is more type-safe and more
OpenSSL-compatible. However, most (but not all) uses of these are in
other code defining X509V3_EXT_METHODs themselves, so the void*
signature is more correct for them too. And the functions have a type
signature meant for X509V3_EXT_METHOD, complete with method pointer.
I've gone with leaving the exported ones as-is for now. Probably the
right answer anyway is to migrate the external callers, of either type
signature.
Change-Id: Ib8f2995cbd890221eaa9ac864a7e553cb6711901
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52686
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
This has no callers, and seems to be practically unusable. The only way
to set an X509_CRL_METHOD is X509_CRL_set_default_method, which is not
thread-safe and globally affects the CRL implementation across the
application.
The comment says it's to handle large CRLs, so lots of processes don't
have to store the same CRL in memory. As far as I can tell,
X509_CRL_METHOD cannot be used to help with this. It doesn't swap out
storage of the CRL, just signature verification and lookup into it. But
by the time we call into X509_CRL_METHOD, the CRL has already been
downloaded and the data stored on the X509_CRL structure. (Perhaps this
made more sense before the structure was made opaque?)
Update-Note: APIs relating to X509_CRL_METHOD are removed.
Change-Id: Ia5befa2a0e4f4416c2fb2febecad99fa31c1c6ac
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52687
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
We have no EVP_MDs with type NID_ecdsa_with_SHA1 (that's a remnant of
the old signature algorithm EVP_MDs). Also there's no sense in calling
EVP_MD_type or performing the cast five times.
Change-Id: I7ea60d80059420b01341accbadf9854b4c3fd1b8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52685
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This is cribbed, with perimssion, from AWS-LC. The FIPS service
indicator[1] signals when an approved service has been completed.
[1] FIPS 140-3 IG 2.4.C
Change-Id: Ib40210d69b3823f4d2a500b23a1606f8d6942f81
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52568
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
https://boringssl-review.googlesource.com/c/boringssl/+/51635 switched
the serial number check to ASN1_INTEGER_get_uint64, but as that function
actually treats overflow as an error, it adds to the error queue and we
need to clear it.
See also b/231880827, though whether that is a red herring or the cause,
I'm not sure.
Change-Id: Ibd7e9369c3455898fa3411b7a079ce21b37c586c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52648
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
These definitions are to get access to getaddrinfo() and gmtime_r()
when using glibc. This in turn conflicts with other places (which
would have these things in their libc anyway) where using these
feature flags turns off C11 functionality we would like to use.
Bug:490
Change-Id: I66fdb7292cda788df19508d99e7303ed0d4f4bdd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52545
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
This aligns X509_REQ's and X509_CRL's parsers to the changes already
made with X509; we reject invalid versions and check that extensions are
only with the corresponding version. For now, we still allow X509v1 CRLs
with an explicit version, matching certificates. (The DEFAULT question
is moot for X509_REQ because CSRs always encode their version, see RFC
2986.)
In addition to rejecting garbage, this allows for a more efficient
representation once we stop using the table-based parser: X509 and
X509_CRL can just store a small enum. X509_REQ doesn't need to store
anything because the single version is information-less.
Update-Note: Invalid CRL and CSR versions will no longer be accepted.
X509_set_version, etc., no longer allow invalid versions.
Fixed: 467
Change-Id: I33f3aec747d8060ab80e0cbb8ddf97672e07642c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52605
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
The manual construction of the version integer is odd. The default is
already zero, and as of
https://boringssl-review.googlesource.com/c/boringssl/+/51632, we've
settled on the empty string as the ASN1_INTEGER representation of zero.
But there don't seem to be any uses of this function, so just remove it.
Update-Note: Removed seemingly unused public API.
Change-Id: I75f8bcdadb8ffefb0b2da0fcb0a87a8cb6398f70
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52585
Reviewed-by: Adam Langley <agl@google.com>
When introducing EC_SCALAR and EC_FELEM, I used unions as convenience
for converting to and from the byte representation. However,
type-punning with unions is not allowed in C++ and hard to use correctly
in C. As I understand the rules, they are:
- The abstract machine knows what member of union was last written to.
- In C, reading from an inactive member is defined to type-pun. In C++,
it is UB though some compilers promise the C behavior anyway.
- However, if you read or write from a *pointer* to a union member, the
strict aliasing rule applies. (A function passed two pointers of
different types otherwise needs to pessimally assume they came from
the same union.)
That last rule means the type-punning allowance doesn't apply if you
take a pointer to an inactive member, and it's common to abstract
otherwise direct accesses of members via pointers.
https://github.com/openssl/openssl/issues/18225 is an example where
similar union tricks have caused problems for OpenSSL. While we don't
have that code, EC_SCALAR and EC_FELEM play similar tricks.
We do get a second lifeline because our alternate view is a uint8_t,
which we require to be unsigned char. Strict aliasing always allows the
pointer type to be a character type, so pointer-indirected accesses of
EC_SCALAR.bytes aren't necessarily UB. But if we ever write to
EC_SCALAR.bytes directly (and we do), we'll switch the active arm and
then pointers to EC_SCALAR.words become strict aliasing violations!
This is all far too complicated to deal with. Ideally everyone would
build with -fno-strict-aliasing because no real C code actually follows
these rules. But we don't always control our downstream consumers'
CFLAGS, so let's just avoid the union. This also avoids a pitfall if we
ever move libcrypto to C++.
For p224-64.c, I just converted the representations directly, which
avoids worrying about the top 32 bits in p224_felem_to_generic. Most of
the rest was words vs. bytes conversions and boils down to a cast (we're
still dealing with a character type, at the end of the day). But I took
the opportunity to extract some more "words"-based helper functions out
of BIGNUM, so the casts would only be in one place. That too saves us
from the top bits problem in the bytes-to-words direction.
Bug: 301
Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
For now, the tests assert the existing behavior of X509_NAME_print, but
there are several bugs in it.
Change-Id: I9bc211a880ea48f7f756650dbe1f982bc1ec689d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52366
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Between the lookup table, the multiple layers of reuse of the "type"
variable, it is a little hard to follow what's going on with
ASN1_STRING_print_ex. Replace the lookup table with a switch-case
(implicitly handles the bounds check, and we can let the compiler figure
out the best spelling). Then, rather than returning a "character width",
which doen't represent UTF-8, just use the already-defined MBSTRING_*
constants.
(These changes should be covered by the existing ASN1Test.StringPrintEx
test.)
Change-Id: Ie3b2557bfae0f65db969e90cd0c76bc8ade963d4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52365
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Inspired by Joel Sing's work in libre.
Change-Id: I17267af926b7d42472f7dae3205fda9aabdfa73d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52385
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CMake versions newer than ~3.1x automatically determine the subdirectory under CMAKE_INSTALL_PREFIX using the type of the installed target. Older versions need this to be manually computed using the GNUInstallDirs library.
Since we override the CMAKE_INSTALL_PREFIX default, this just controls
the internal layout of the install/ directory generated underneath the
boringssl checkout.
Bug: 488
Change-Id: I97b02006301e463bb0cfd54acb2b27656484cc85
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52345
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
We no longer use a weak hash for certificate comparisons. There
is no need to do extra work when certificates are the same.
Change-Id: I3b4b295122b289ae389bce2245b8348562700855
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52346
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
(Tweaking pending CLs now upsets the tooling so doing this as a
follow-up CL instead.)
Change-Id: I78efc5cb72de7b41d57769bb3958c20167de69b3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52325
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This work continues on top of the CL opened by Vlad Krasnov
(https://boringssl-review.googlesource.com/c/boringssl/+/44364). The
CL was thoroughly reviewed by David Benjamin but not merged due to
some outstanding comments which this work addresses:
- The flag check when doing the final reduction in poly1305 was
changed from `eq` to `cs`
- The CFI prologues and epilogues of open/seal were modified as
recommended by David.
- Added Pointer Authentication instruction to the functions that are
exported from the assembly code as pointed out by David.
Testing:
- The current tests against ChaCha20-Poly1305 continue to pass.
- More test vectors were produced using a Python script to try and
prove that having `eq` instead of `cs` was a bug. They passed as
well, but didn't result in the most significant word being
non-zero after the reduction, which would have highlighted the
bug. An argument about why it's unlikely to find the vector is
detailed below.
- `objdump -W|Wf|WF` was used to confirm the value of the CFA and the
locations of the registers relative to the CFA were as expected. See
https://www.imperialviolet.org/2017/01/18/cfi.html.
Performance:
| Size | Before (MB/s) | After (MB/s) | Improvement |
| 16 bytes | 30.5 | 43.3 | 1.42x |
| 256 bytes | 220.7 | 361.5 | 1.64x |
| 1350 bytes | 285.9 | 639.4 | 2.24x |
| 8192 bytes | 329.6 | 798.3 | 2.42x |
| 16384 bytes | 331.9 | 814.9 | 2.46x |
Explanation of the unlikelihood of finding a test vector:
* the modulus is in t2:t1:t0 = 3 : FF..FF : FF..FB, each being a 64 bit
word; i.e. t2 = 3, t1 = all 1s.
* acc2 <= 4 after the previous reduction.
* It is highly likely to have borrow = 1 from acc1 - t1 since t1 is
all FFs.
* So for almost all test vectors we have acc2 <= 4 and borrow = 1,
thus (t2 = acc2 - t2 - borrow) will be 0 whenever acc >
modulus. **It would be highly unlikely to find such a test vector
with t2 > 0 after that final reduction:** Trying to craft that
vector requires having acc and r of high values before their
multiplication, yet ensuring that after the reduction (see Note) of
their product, the resulting value of the accumulator has t2 = 4,
all 1s in t1 and most of t0 so that no borrow occurs from acc1:acc0
- t1:t0.
* Note: the reduction is basically carried by folding over the top
64+62 bits once, then folding them again shifted left by 2,
resulting in adding 5 times those bits.
Change-Id: If7d86b7a9b74ec3615ac2d7a97f80100dbfaee7f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51885
Reviewed-by: Adam Langley <alangley@gmail.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
strcasecmp is locale-sensitive, which can cause some mishaps.
This CL should be a no-op, because this call is only used on Android,
and bionic's strcasecmp seems to be ASCII-only. But using
OPENSSL_strcasecmp everywhere is easier to reason about.
Change-Id: Iecf9bc4da1bb3a4ab87b1e8b1d7f6f6c6e44aceb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52305
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
directly. This removes some temporary variables that would only be
used to hold ctx->verify_cb.
Change-Id: Id8f61c30bc74bb49757085c68baa4e8eec06b1d0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52285
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
MSVC is a little behind, but otherwise we should be able to assume C11
support in all our compilers. The only C99 builds should just be stale
build files. Such consumers are leaving performance on the table, by
using the worse refcounting implementation.
For now, don't require it in public headers. Android's build is still
defaulting to C99, which means requiring C11 will be disruptive. We can
try the public headers after that's fixed.
Update-Note: If the build fails with an error about C11, remove -std=c99
or -std=gnu99 from your build. Refcounting will get faster.
Change-Id: I2ec6f7d7acc026a451851d0c38f60c14bae6b00f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52247
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Now that we've dropped MSVC 2015, I believe we can rely on C++14 (which
is now seven years old). This switches the build to require C++14. I've
gone ahead and switched code in both public headers and within the
library, but if the public headers are a problem, we can revert those
separately.
C++14 doesn't get us quite as much as C++17, but see if we can get to
C++14 first. Still, std::enable_if_t and the less restricted constexpr
are nice wins.
Update-Note: C++14 is now required to build BoringSSL. If the build
breaks, make sure your compiler is C++14-capable and is not passing
-std=c++11. If this is causing problems for your project, let us know.
Change-Id: If03a88e3f8a11980180781f95b806e7f3c3cb6c3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52246
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
[UNIVERSAL 0] is reserved by X.680 for the encoding to use. BER uses
this to encode indefinite-length EOCs, but it is possible to encode it
in a definite-length element or in a non-EOC form (non-zero length, or
constructed).
Whether we accept such encodings is normally moot: parsers will reject
the tag as unsuitable for the type. However, the ANY type matches all
tags. Previously, we would allow this, but crypto/asn1 has some ad-hoc
checks for unexpected EOCs, in some contexts, but not others.
Generalize this check to simply rejecting [UNIVERSAL 0] in all forms.
This avoids a weird hole in the abstraction where tags are sometimes
representable in BER and sometimes not. It also means we'll preserve
this check when migrating parsers from crypto/asn1.
Update-Note: There are two kinds of impacts I might expect from this
change. The first is BER parsers might be relying on the CBS DER/BER
element parser to pick up EOCs, as our ber.c does. This should be caught
by the most basic unit test and can be fixed by detecting EOCs
externally.
The second is code might be trying to parse "actual" elements with tag
[UNIVERSAL 0]. No actual types use this tag, so any non-ANY field is
already rejecting such inputs. However, it is possible some input has
this tag in a field with type ANY. This CL will cause us to reject that
input. Note, however, that crypto/asn1 already rejects unexpected EOCs
inside sequences, so many cases were already rejected anyway. Such
inputs are also invalid as the ANY should match some actual, unknown
ASN.1 type, and that type cannot use the reserved tag.
Fixed: 455
Change-Id: If42cacc01840439059baa0e67179d0f198234fc4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52245
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This creates an install directory under the top level source directory.
The install contains a CMake config file that produces variables and
targets compatible with FindOpenSSL, or the directory can be scanned by
FindOpenSSL via -DOPEN_SSL_ROOT. This allows using BoringSSL with
third-party dependencies that find an SSL implementation via CMake.
Change-Id: Iffeac64b9cced027d549486c98a6cd9721415454
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52205
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
The ARMv8 assembly code in this commit is mostly taken from OpenSSL's `ecp_nistz256-armv8.pl` at 19e277dd19/crypto/ec/asm/ecp_nistz256-armv8.pl (see Note 1), adapting it to the implementation in p256-x86_64.c.
Most of the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly implementation with the exception of the functions:
- ecp_nistz256_select_w5
- ecp_nistz256_select_w7
An implementation for these functions was added.
Summary of modifications to the imported code:
* Renamed to `p256-armv8-asm.pl`
* Modified the location of `arm-xlate.pl` and `arm_arch.h`
* Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`.
* `ecp_nistz256_add` is removed because it was conflicting during the static build with the function of the same name in p256-nistz.c. The latter calls another assembly function, `ecp_nistz256_point_add`.
* `__ecp_nistz256_add` renamed to `__ecp_nistz256_add_to` to avoid the conflict with the function `ecp_nistz256_add` during the static build.
* l. 924 `add sp,sp,#256` the calculation of the constant, 32*(12-4), is not left for the assembler to perform.
Other modifications:
* `beeu_mod_inverse_vartime()` was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`.
* The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled.
* Updated `delocate.go` and `delocate.peg` to handle the offset calculation in the assembly instructions.
* Regenerated `delocate.peg.go`.
Notes:
1- The last commit in the history of the file is in master only, the previous commits are in OpenSSL 3.0.1
2- This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7.
Testing the performance on Armv8 platform using -DCMAKE_BUILD_TYPE=Release:
Before:
```
Did 2596 ECDH P-256 operations in 1093956us (2373.0 ops/sec)
Did 6996 ECDSA P-256 signing operations in 1044630us (6697.1 ops/sec)
Did 2970 ECDSA P-256 verify operations in 1084848us (2737.7 ops/sec)
```
After:
```
Did 6699 ECDH P-256 operations in 1091684us (6136.4 ops/sec)
Did 20000 ECDSA P-256 signing operations in 1012944us (19744.4 ops/sec)
Did 7051 ECDSA P-256 verify operations in 1060000us (6651.9 ops/sec)
```
Change-Id: I9fdef12db365967a9264b5b32c07967b55ea48bd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51805
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Polynomials have more elements than strictly needed in order to better
align them for when vector instructions are available. The
|poly_mul_vec| path is sensitive to this and so zeros out the extra
elements before starting work. That means that it'll write to a const
pointer, even if it'll always write the same value. However, while this
code is intended for ephemeral uses, we have a case where this is a data
race that upsets tools.
Therefore this change always normalises the polynomials, even if it's
running in a configuration that doesn't care.
Change-Id: I83eb04c5ce7c4e7ca959f2dd7fbd3efbe306d989
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52186
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
There are paperwork reasons why it's useful to use the same hash
function in all cases. Thus unify on SHA-256 because contexts where
SHA-512 is faster, are faster overall and thus less sensitive.
Change-Id: I7a782a3adba4ace3257313a24dc8bc213b9d64ec
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52165
Reviewed-by: David Benjamin <davidben@google.com>
This is causing -Werror failures with recent Clang builds.
Change-Id: I0f82d99ae51d02037f5f614a8fadfb3efc0da2da
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52145
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>