(Tweaking pending CLs now upsets the tooling so doing this as a
follow-up CL instead.)
Change-Id: I78efc5cb72de7b41d57769bb3958c20167de69b3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52325
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This work continues on top of the CL opened by Vlad Krasnov
(https://boringssl-review.googlesource.com/c/boringssl/+/44364). The
CL was thoroughly reviewed by David Benjamin but not merged due to
some outstanding comments which this work addresses:
- The flag check when doing the final reduction in poly1305 was
changed from `eq` to `cs`
- The CFI prologues and epilogues of open/seal were modified as
recommended by David.
- Added Pointer Authentication instruction to the functions that are
exported from the assembly code as pointed out by David.
Testing:
- The current tests against ChaCha20-Poly1305 continue to pass.
- More test vectors were produced using a Python script to try and
prove that having `eq` instead of `cs` was a bug. They passed as
well, but didn't result in the most significant word being
non-zero after the reduction, which would have highlighted the
bug. An argument about why it's unlikely to find the vector is
detailed below.
- `objdump -W|Wf|WF` was used to confirm the value of the CFA and the
locations of the registers relative to the CFA were as expected. See
https://www.imperialviolet.org/2017/01/18/cfi.html.
Performance:
| Size | Before (MB/s) | After (MB/s) | Improvement |
| 16 bytes | 30.5 | 43.3 | 1.42x |
| 256 bytes | 220.7 | 361.5 | 1.64x |
| 1350 bytes | 285.9 | 639.4 | 2.24x |
| 8192 bytes | 329.6 | 798.3 | 2.42x |
| 16384 bytes | 331.9 | 814.9 | 2.46x |
Explanation of the unlikelihood of finding a test vector:
* the modulus is in t2:t1:t0 = 3 : FF..FF : FF..FB, each being a 64 bit
word; i.e. t2 = 3, t1 = all 1s.
* acc2 <= 4 after the previous reduction.
* It is highly likely to have borrow = 1 from acc1 - t1 since t1 is
all FFs.
* So for almost all test vectors we have acc2 <= 4 and borrow = 1,
thus (t2 = acc2 - t2 - borrow) will be 0 whenever acc >
modulus. **It would be highly unlikely to find such a test vector
with t2 > 0 after that final reduction:** Trying to craft that
vector requires having acc and r of high values before their
multiplication, yet ensuring that after the reduction (see Note) of
their product, the resulting value of the accumulator has t2 = 4,
all 1s in t1 and most of t0 so that no borrow occurs from acc1:acc0
- t1:t0.
* Note: the reduction is basically carried by folding over the top
64+62 bits once, then folding them again shifted left by 2,
resulting in adding 5 times those bits.
Change-Id: If7d86b7a9b74ec3615ac2d7a97f80100dbfaee7f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51885
Reviewed-by: Adam Langley <alangley@gmail.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
On Arm, our CRYPTO_is_*_capable functions check the corresponding
preprocessor symbol. This allows us to automatically drop dynamic checks
and fallback code when some capability is always avilable.
This CL does the same on x86, as well as consolidates our
OPENSSL_ia32cap_P checks in one place. Since this abstraction is
incompatible with some optimizations we do around OPENSSL_ia32cap_get()
in the FIPS module, I've marked the symbol __attribute__((const)), which
is enough to make GCC and Clang do the optimizations for us. (We already
do the same to DEFINE_BSS_GET.)
Most x86 platforms support a much wider range of capabilities, so this
is usually a no-op. But, notably, all x86_64 Mac hardware has SSSE3
available, so this allows us to statically drop an AES implementation.
(On macOS with -Wl,-dead_strip, this seems to trim 35080 bytes from the
bssl binary.) Configs like -march=native can also drop a bunch of code.
Update-Note: This CL may break build environments that incorrectly mark
some instruction as statically available. This is unlikely to happen
with vector instructions like AVX, where the compiler could freely emit
them anyway. However, instructions like AES-NI might be set incorrectly.
Change-Id: I44fd715c9887d3fda7cb4519c03bee4d4f2c7ea6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51548
Reviewed-by: Adam Langley <agl@google.com>
This matches our other free functions.
Fixed: 473
Change-Id: Ie147995c2f5b429f78e95cfc9a08ed54181af94e
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51005
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
These symbols were not marked OPENSSL_EXPORT, so they weren't really
usable externally anyway. They're also very sensitive to various build
configuration toggles, which don't always get reflected into projects
that include our headers. Move them to crypto/internal.h.
Change-Id: I79a1fcf0b24e398d75a9cc6473bae28ec85cb835
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50846
Reviewed-by: Adam Langley <agl@google.com>
Node seems uncommonly sensitive to this, so let's write these functions
in a way that stays in sync and test this. See also
https://boringssl-review.googlesource.com/c/boringssl/+/49585
This does incur a cost across all BoringSSL consumers that use these
functions: as a result of Node indiscriminately exposing every cipher,
we end up pulling more and more ciphers into these getters. But that
ship sailed long ago, so, instead, document that EVP_get_cipherby*
should not be used by size-conscious callers.
EVP_get_digestby* probably should have the same warning, but I've left
it alone for now because we don't quite have the same proliferation of
digests as ciphers. (Though there are things in there, like MD4, that
ought to be better disconnected.)
Change-Id: I61ca406c146279bd05a52bed6c57200d1619c5da
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/49625
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
This adds a check to EVP_get_cipherbyname which ensures that name
is not null when passed to OPENSSL_strcasecmp, which cannot handle
null values.
OpenSSL already ensures this in their implementation of
EVP_get_cipherbyname by using OBJ_NAME_get, so this improves parity.
Change-Id: Icea45a5da2a7a461d2a65fbfbc84653c4f124dab
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/47844
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Rather than computing kVarianceBlocks, which is hard to reason about,
use a sha1_final_with_secret_suffix abstraction. This lets us separate
reasoning in bytes about the minimum and maximum values of |data_size|
and the interaction with HMAC, separately from the core constant-time
SHA-1 update.
It's also faster. I'm guessing it's the more accurate block counts.
Before:
Did 866000 AES-128-CBC-SHA1 (16 bytes) open operations in 2000697us (6.9 MB/sec)
Did 616000 AES-128-CBC-SHA1 (256 bytes) open operations in 2001403us (78.8 MB/sec)
Did 432000 AES-128-CBC-SHA1 (1350 bytes) open operations in 2003898us (291.0 MB/sec)
Did 148000 AES-128-CBC-SHA1 (8192 bytes) open operations in 2006042us (604.4 MB/sec)
Did 83000 AES-128-CBC-SHA1 (16384 bytes) open operations in 2010885us (676.3 MB/sec)
After:
Did 2089000 AES-128-CBC-SHA1 (16 bytes) open operations in 2000049us (16.7 MB/sec) [+141.3%]
Did 851000 AES-128-CBC-SHA1 (256 bytes) open operations in 2000034us (108.9 MB/sec) [+38.2%]
Did 553000 AES-128-CBC-SHA1 (1350 bytes) open operations in 2002169us (372.9 MB/sec) [+28.1%]
Did 178000 AES-128-CBC-SHA1 (8192 bytes) open operations in 2008596us (726.0 MB/sec) [+20.1%]
Did 98000 AES-128-CBC-SHA1 (16384 bytes) open operations in 2001509us (802.2 MB/sec) [+18.6%]
Confirmed with valgrind tooling that this is still constant-time. In
doing so, I ran into a new nuisance with GCC. In loops where we run
constant_time_lt with a counter value, GCC sometimes offsets the loop
counter by the secret. It cancels it out before dereferencing memory,
etc., but valgrind does not know that x + uninit - uninit = x and gets
upset. I've worked around this with a barrier for now.
Change-Id: Ieff8d2cad1b56c07999002e67ce4e6d6aa59e0d3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46686
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This removes the now unnecessary virtual calls. Benchmark differences are
mostly positive but probably noise.
Before:
Did 839000 AES-128-CBC-SHA1 (16 bytes) open operations in 2000497us (6.7 MB/sec)
Did 623000 AES-128-CBC-SHA1 (256 bytes) open operations in 2000409us (79.7 MB/sec)
Did 434000 AES-128-CBC-SHA1 (1350 bytes) open operations in 2002909us (292.5 MB/sec)
Did 146000 AES-128-CBC-SHA1 (8192 bytes) open operations in 2000785us (597.8 MB/sec)
Did 82000 AES-128-CBC-SHA1 (16384 bytes) open operations in 2014268us (667.0 MB/sec)
After:
Did 866000 AES-128-CBC-SHA1 (16 bytes) open operations in 2000697us (6.9 MB/sec) [+3.2%]
Did 616000 AES-128-CBC-SHA1 (256 bytes) open operations in 2001403us (78.8 MB/sec) [-1.2%]
Did 432000 AES-128-CBC-SHA1 (1350 bytes) open operations in 2003898us (291.0 MB/sec) [-0.5%]
Did 148000 AES-128-CBC-SHA1 (8192 bytes) open operations in 2006042us (604.4 MB/sec) [+1.1%]
Did 83000 AES-128-CBC-SHA1 (16384 bytes) open operations in 2010885us (676.3 MB/sec) [+1.4%]
Change-Id: I735e99296ca9a1771518c622b8e7e6979a0d30bc
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46685
Reviewed-by: Adam Langley <agl@google.com>
See also 8129ac6ac4c0ca3a488c225cde580ede7dabe874 and
81198bf323ea9deda907714170d329ca7d2ff01f from upstream.
In trying to figure out why ASan (which normally catches overlapping
memcpys) didn't flag this, I noticed that we actually don't have tests
for empty inputs. I've added them to cipher_tests.txt where missing and
fixed a bad assert in ofb.c.
ASan still doesn't flag this because LLVM even requires memcpy handle
dst == src. Still, fixing it is less effort than getting a clear answer
from GCC and MSVC. Though this puts us in the frustrating position of
trying to follow a C rule that our main toolchain and sanitizer disavow.
https://bugs.llvm.org/show_bug.cgi?id=11763https://reviews.llvm.org/D86993
Change-Id: I53c64a84834ddf5cddca0b3d53a29998f666ea2f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46285
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
All our EVP_CIPHERs are deterministic, so there's usually no point in
testing only one direction. Some of the ECB tests were missing free
decryption tests. CTR is the same in both directions, but we ought to
test the API agrees. OFB vectors are doubled up, so we can merge them
together. Plus there are typos in the OFB-AES192.Decrypt tests, also
present upstream, so we weren't actually testing everything we should.
(I haven't removed the direction-specific logic altogether since the
tests imported from nist_cavp rely on it. Though there may be something
to be said for running them both ways since they don't actually double
them up...)
Change-Id: I36a77d342afa436e89ad244a87567e1a4c6ee9dc
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46284
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Current:
Did 2916000 ChaCha20-Poly1305 (16 bytes) seal operations in 1015000us (2872906.4 ops/sec): 46.0 MB/s
Did 1604750 ChaCha20-Poly1305 (256 bytes) seal operations in 1016000us (1579478.3 ops/sec): 404.3 MB/s
Did 516750 ChaCha20-Poly1305 (1350 bytes) seal operations in 1015000us (509113.3 ops/sec): 687.3 MB/s
Did 99750 ChaCha20-Poly1305 (8192 bytes) seal operations in 1016000us (98179.1 ops/sec): 804.3 MB/s
Did 50500 ChaCha20-Poly1305 (16384 bytes) seal operations in 1016000us (49704.7 ops/sec): 814.4 MB/s
With fix:
Did 6366750 ChaCha20-Poly1305 (16 bytes) seal operations in 1016000us (6266486.2 ops/sec): 100.3 MB/s
Did 3938000 ChaCha20-Poly1305 (256 bytes) seal operations in 1016000us (3875984.3 ops/sec): 992.3 MB/s
Did 1207750 ChaCha20-Poly1305 (1350 bytes) seal operations in 1015000us (1189901.5 ops/sec): 1606.4 MB/s
Did 258500 ChaCha20-Poly1305 (8192 bytes) seal operations in 1016000us (254429.1 ops/sec): 2084.3 MB/s
Did 131500 ChaCha20-Poly1305 (16384 bytes) seal operations in 1016000us (129429.1 ops/sec): 2120.6 MB/s
Change-Id: Iec6417b9855b9d3d1d5154c93a370f80f219c65f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/44347
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
For FIPS reasons, one might wish to ensure that a random AES-GCM nonce
was generated entirely within the FIPS module. If so, then these are the
AEADs for you.
Change-Id: Ic2b7864b089f446401f700d7d55bfa6336c61e23
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/43686
Commit-Queue: Adam Langley <alangley@gmail.com>
Reviewed-by: David Benjamin <davidben@google.com>
These booleans are a little hard to understand in context and adding
any more makes things even more complicated. Thus make them flags so
that the meaning is articulated locally.
Change-Id: I8cdb7fd5657bb12f28a73d7c6818d400c987ad3b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/43684
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>