BoringSSL can currently be built in C11 or pre-C11 mode in MSVC. They're
broadly the same, but do use completely different implementations of
alignas and alignof. Now that every build configuration I'm aware of has
been moved to the C11 mode, we don't even test the pre-C11 mode anymore.
Start requiring the C11 mode.
Update-Note: If building with MSVC, BoringSSL now requires building with
/std:c11 or later. (On non-MSVC compilers, we have required C11 for a
while now.)
Fixed: 624
Change-Id: Ie9f66eee0bebac8143c23a7229c6854afaefea6e
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/63065
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
https://boringssl-review.googlesource.com/c/boringssl/+/62585 made the
compiler emit multiple CRYPTO_library_init calls in functions which
dispatch between a tower of alternatives. Ideally, the compiler would
know that at most one call suffices.
There doesn't seem to be such an attribute, but we can get the same
effect with pure or const attributes. We tie init with returning the
capability vector. On Intel, because the vector is so large, we have to
go with a weaker version. Somewhat annoyingly, the getter must be
out-of-line, because otherwise the compiler inlines first and loses the
attribute.
I went with pure because we allow our unit tests to mutate
OPENSSL_armcap_P, which means the Arm one is, strictly speaking, pure,
not const. This slightly reduces optimization potential, but should
still allow deduping in most places. Confirmed that aes_init_key
now only calls a helper function once.
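Roughly, the Arm getter ends up shaped like this (a sketch with a
hypothetical helper name; the attribute is the important part):
    #include <stdint.h>
    extern uint32_t OPENSSL_armcap_P;
    void CRYPTO_library_init(void);
    /* Out-of-line on purpose: if the compiler inlines the body first, the
     * attribute is lost. Pure lets repeated calls within one function be
     * folded into one call, so a tower of CRYPTO_is_*_capable checks runs
     * init only once. */
    #if defined(__GNUC__) || defined(__clang__)
    __attribute__((pure))
    #endif
    uint32_t crypto_get_armcap(void);
    uint32_t crypto_get_armcap(void) {
      CRYPTO_library_init();
      return OPENSSL_armcap_P;
    }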
See discussion in
https://boringssl-review.googlesource.com/c/boringssl/+/62585/comment/26083b88_b3db2b75/
Bug: 35
Change-Id: I9bc464f0e5a0ed9601017a5037028f906693a137
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/62985
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
All the C accesses have been sufficiently abstracted that this is pretty
easy to handle automatically.
We still have accesses from assembly, so we're not quite
initializationless yet. But this does get us most of the way there. I'm
thinking what's next is:
- Make a list of asm symbols that touch armcap or ia32cap
- For each, figure out the place(s) in the calling code where we need to
init manually and/or pull the dispatch up into C
One interesting subtlety with how this CL does it: although this CL
means you can freely call, say, CRYPTO_is_SSSE3_capable without
CRYPTO_library_init, you cannot *quite* assume that CRYPTO_library_init
has been called after you call CRYPTO_is_SSSE3_capable. It is possible
that the build defined __SSSE3__, in which case CRYPTO_is_SSSE3_capable
does nothing. This does complicate resolving the asm cases above.
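For example, the SSSE3 check is roughly of this shape (a sketch; the bit
is CPUID.1:ECX bit 9):
    #include <stdint.h>
    const uint32_t *OPENSSL_ia32cap_get(void);
    static inline int CRYPTO_is_SSSE3_capable(void) {
    #if defined(__SSSE3__)
      /* Statically known from the target flags; OPENSSL_ia32cap_P is never
       * read, so nothing here forces CRYPTO_library_init to have run. */
      return 1;
    #else
      /* The getter performs initialization before returning the vector. */
      return (OPENSSL_ia32cap_get()[1] & (1u << 9)) != 0;
    #endif
    }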
Bug: 35
Change-Id: Ie52c74e4a59a7019c3af0526dbb35950604ada66
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/62585
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
I think this dates to when CRYPTO_is_*_capable were inline functions in
public headers, so they couldn't access OPENSSL_armcap_P directly. Now
they can.
Change-Id: Ic06fffa7f5056401118b62d690dfe6b21bc30f86
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/62345
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
cpu_arm_openbsd.c is the same as OPENSSL_STATIC_ARMCAP.
cpu_aarch64_freebsd.c is the same as cpu_aarch64_sysreg.c. (The FreeBSD
one was using the macros in their headers, but those macros expand to
the same inline assembly.)
Also send ANDROID_BAREMETAL + 32-bit Arm to OPENSSL_STATIC_ARMCAP. This
way we can remove OPENSSL_STATIC_ARMCAP from the Android baremetal build
without first having to constrain it to aarch64. See
b/291106677#comment6
Update-Note: This is a slight change to the OpenBSD build. Previously,
we assumed OpenBSD on 32-bit Arm implies NEON. Now, we pick it up from
the __ARM_NEON define from ACLE, i.e. whether the compiler has been told
that NEON is available. (This comes from -march or other options.) Doing
that is desirable anyway: if NEON is in your baseline, you should tell
your compiler so it can vectorize loops.
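Concretely, the static-capability path now keys NEON off the ACLE define,
along the lines of (sketch):
    /* Sketch: with OPENSSL_STATIC_ARMCAP, NEON comes from the compiler's
     * ACLE define rather than being assumed per-OS. */
    #if defined(OPENSSL_STATIC_ARMCAP) && defined(__ARM_NEON)
    #define OPENSSL_STATIC_ARMCAP_NEON
    #endif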
Change-Id: Icd43a2b56bb6e3f04f0fed996ae750fba65e3312
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/62066
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
nanolibc is an embedded platform with no threads. To start unforking
that build, generalize some of the OPENSSL_TRUSTY defines. OpenSSL has
OPENSSL_NO_SOCK if you don't have sockets and OPENSSL_NO_POSIX_IO if you
don't have file descriptors. Those names are fine enough, so I've
borrowed them here too.
There's more to be done here, but this will clear out some of it.
Change-Id: Iaba1fafdebb46ebb8f68b7956535dd0ccaaa832f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60890
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
We no longer need to define CRYPTO_MUTEX in public headers. This
simplifies a pile of things. First, we can now use pthread_rwlock_t
without any fuss, rather than trying to guess the size on glibc.
As a result, CRYPTO_MUTEX and CRYPTO_STATIC_MUTEX can be merged into one
type. We can almost do this to CRYPTO_refcount_t too. BIO is the one
straggler remaining.
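With the type out of public headers, the definition can be roughly this
(a sketch; the no-threads arm is illustrative):
    #if defined(OPENSSL_PTHREADS)
    #include <pthread.h>
    typedef pthread_rwlock_t CRYPTO_MUTEX;
    #elif defined(OPENSSL_WINDOWS_THREADS)
    #include <windows.h>
    typedef SRWLOCK CRYPTO_MUTEX;
    #else
    typedef struct crypto_mutex_st {
      char padding;  /* no-threads build: nothing to lock */
    } CRYPTO_MUTEX;
    #endif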
Fixed: 325
Change-Id: Ie93c9f553c0f02ce594b959c041b00fc15ba51d2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60611
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Did 59000 Ed25519 key generation operations in 1004188us (58753.9 ops/sec) [+8.3%]
Did 57000 Ed25519 signing operations in 1005649us (56679.8 ops/sec) [+7.9%]
Did 19000 Ed25519 verify operations in 1054380us (18020.1 ops/sec) [-2.0%]
Did 61000 Curve25519 base-point multiplication operations in 1007401us (60551.9 ops/sec) [+8.3%]
Did 22000 Curve25519 arbitrary point multiplication operations in 1022882us (21507.9 ops/sec) [+0.5%]
Change-Id: I14668f658b1ae99850cb0f8938f90f988d0edd0b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60107
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Every time we free a type with ex_data (RSA, EC_KEY, DSA, SSL_CTX, SSL,
SSL_SESSION, X509, X509_STORE), we allocate and take a read lock. The
allocation means, if we believe in malloc failures, it is possible to
leak memory on malloc failure. The read lock causes an unnecessary bit
of contention writing to the cache line.
Instead, since we never remove ex_data entries, just thread them in a
singly-linked list. This way we only need to synchronize when to stop
iterating. Add a counter to synchronize that. (Or we could make each
'next' pointer atomic, but this seemed more straightforward.)
(I suspect this doesn't matter much, but it was shorter and we were
already allocating the funcs structures anyway.)
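A sketch of the idea (illustrative names, not the exact internals):
    #include <stdatomic.h>
    #include <stddef.h>
    typedef struct ex_data_funcs_st {
      struct ex_data_funcs_st *next;  /* entries are never removed */
      int index;
    } EX_DATA_FUNCS;
    typedef struct {
      EX_DATA_FUNCS *head;       /* appended to, under a write lock */
      _Atomic size_t num_funcs;  /* incremented after the entry is linked */
    } EX_DATA_CLASS;
    /* Readers walk the list without a lock; the counter bounds how far. */
    static void for_each_funcs(EX_DATA_CLASS *cls,
                               void (*cb)(EX_DATA_FUNCS *, void *),
                               void *arg) {
      size_t n = atomic_load(&cls->num_funcs);
      for (EX_DATA_FUNCS *f = cls->head; n > 0; f = f->next, n--) {
        cb(f, arg);
      }
    }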
Bug: 570
Change-Id: Ie7ba5cc44f2b71ebd79c8971e784912d53af7f5c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60025
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
OPENSSL_C11_ATOMIC is both computed in crypto/internal.h and also
defined externally. This is a remnant of when C11 atomics were an opt-in
feature.
If defined externally, this means OPENSSL_C11_ATOMIC might be defined
when built as C++. That, in turn, causes <stdatomic.h> to be included in
C++ mode. At least one of our users' toolchains has a <stdatomic.h> that
is incompatible with C++. We don't get anything out of including it, so
just gate the include on !defined(__cplusplus) for now.
Things to look into as follow-up:
- Fix build files to stop defining OPENSSL_C11_ATOMIC. Prior to
https://boringssl-review.googlesource.com/c/boringssl/+/59847, it was
still serving a purpose: in server builds, if autodetection fails, we
would rather fail to build than accidentally fall back to locks.
There is no lock fallback anymore.
- Fix that toolchain so their <stdatomic.h> is C++-compatible. It's
certainly not C++23-conformant. I suspect it's also not
C++11-conformant, but I'm not positive.
Change-Id: I13bcd8380efeb87b9f9cc439fe24a743e48aec60
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59985
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
We don't take write locks in the PRNG, steady state, but we do take some
read locks: computing fork generation, reading the fork-unsafe buffering
flag, and a FIPS-only artifact of some global state clearing mess. That
last one is completely useless, but it's a consequence of FIPS's
understanding of process exit being comically inconsistent with reality.
Taking read locks is, in principle, parallel, but the cacheline write
causes some contention, even in newer glibcs with faster read locks. Fix
these:
- Use atomic reads to check the fork generation. We only need to lock
when we observe a fork.
- Replace the fork-unsafe buffering flag with an atomic altogether.
- Split state_clear_all_lock into a per-rand_thread_state lock. We still
need a read lock, but a completely uncontended one until process exit.
With many threads, this gives a significant perf boost.
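For example, the fork-unsafe buffering flag (second bullet above) becomes
just an atomic (sketch):
    #include <stdatomic.h>
    static _Atomic int g_fork_unsafe_buffering;
    void RAND_enable_fork_unsafe_buffering(int fd) {
      (void)fd;  /* historical parameter */
      atomic_store_explicit(&g_fork_unsafe_buffering, 1,
                            memory_order_relaxed);
    }
    static int rand_fork_unsafe_buffering_enabled(void) {
      /* A relaxed load: no lock, and no write to the cacheline. */
      return atomic_load_explicit(&g_fork_unsafe_buffering,
                                  memory_order_relaxed);
    }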
x86_64, non-FIPS, Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 30 threads:
Before:
Did 45131875 RNG (16 bytes) operations in 300039649us (150419.7 ops/sec): 2.4 MB/s
Did 44089000 RNG (32 bytes) operations in 300053237us (146937.3 ops/sec): 4.7 MB/s
Did 43328000 RNG (256 bytes) operations in 300058423us (144398.5 ops/sec): 37.0 MB/s
Did 45857000 RNG (1350 bytes) operations in 300095943us (152807.8 ops/sec): 206.3 MB/s
Did 43249000 RNG (8192 bytes) operations in 300102698us (144114.0 ops/sec): 1180.6 MB/s
After:
Did 296204000 RNG (16 bytes) operations in 300009524us (987315.3 ops/sec): 15.8 MB/s
Did 311347000 RNG (32 bytes) operations in 300014396us (1037773.5 ops/sec): 33.2 MB/s
Did 295104000 RNG (256 bytes) operations in 300012657us (983638.5 ops/sec): 251.8 MB/s
Did 255721000 RNG (1350 bytes) operations in 300016481us (852356.5 ops/sec): 1150.7 MB/s
Did 103339000 RNG (8192 bytes) operations in 300040059us (344417.3 ops/sec): 2821.5 MB/s
(Smaller PRNG draws are more impacted because they spend less time in the
DRBG. But they're also more likely because you rarely need to pull 8K of
data out at once.)
x86_64, FIPS, Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 30 threads:
Before:
Did 29060000 RNG (16 bytes) operations in 300081190us (96840.5 ops/sec): 1.5 MB/s
Did 31882000 RNG (32 bytes) operations in 300118031us (106231.5 ops/sec): 3.4 MB/s
Did 30925000 RNG (256 bytes) operations in 300113646us (103044.3 ops/sec): 26.4 MB/s
Did 31969000 RNG (1350 bytes) operations in 300096688us (106529.0 ops/sec): 143.8 MB/s
Did 33434000 RNG (8192 bytes) operations in 300093240us (111412.0 ops/sec): 912.7 MB/s
After:
Did 299013000 RNG (16 bytes) operations in 300012167us (996669.6 ops/sec): 15.9 MB/s
Did 289788000 RNG (32 bytes) operations in 300014611us (965913.0 ops/sec): 30.9 MB/s
Did 298699000 RNG (256 bytes) operations in 300013443us (995618.7 ops/sec): 254.9 MB/s
Did 247061000 RNG (1350 bytes) operations in 300018215us (823486.7 ops/sec): 1111.7 MB/s
Did 100479000 RNG (8192 bytes) operations in 300037708us (334887.9 ops/sec): 2743.4 MB/s
On an M1 Pro, it's mostly a wash by default (fewer threads because this chip has fewer cores).
aarch64, M1 Pro, 8 threads:
Before:
Did 23218000 RNG (16 bytes) operations in 80009131us (290191.9 ops/sec): 4.6 MB/s
Did 23021000 RNG (256 bytes) operations in 80007544us (287735.4 ops/sec): 73.7 MB/s
Did 22853000 RNG (1350 bytes) operations in 80013184us (285615.4 ops/sec): 385.6 MB/s
Did 25407000 RNG (8192 bytes) operations in 80008371us (317554.3 ops/sec): 2601.4 MB/s
Did 22128000 RNG (16384 bytes) operations in 80013269us (276554.1 ops/sec): 4531.1 MB/s
After:
Did 23303000 RNG (16 bytes) operations in 80011433us (291245.9 ops/sec): 4.7 MB/s
Did 23072000 RNG (256 bytes) operations in 80008755us (288368.4 ops/sec): 73.8 MB/s
Did 22807000 RNG (1350 bytes) operations in 80013355us (285039.9 ops/sec): 384.8 MB/s
Did 23759000 RNG (8192 bytes) operations in 80010212us (296949.6 ops/sec): 2432.6 MB/s
Did 23193000 RNG (16384 bytes) operations in 80011537us (289870.7 ops/sec): 4749.2 MB/s
This is likely because, without RDRAND or MADV_WIPEONFORK, we draw from
the OS on every call. We're likely bottlenecked by getentropy, whether
it's some internal synchronization or syscall overhead. With
fork-unsafe buffering enabled, this change shows even more significant
wins on the M1 Pro.
aarch64, fork-unsafe buffering, M1 Pro, 8 threads:
Before:
Did 25727000 RNG (16 bytes) operations in 80010579us (321545.0 ops/sec): 5.1 MB/s
Did 25776000 RNG (32 bytes) operations in 80008587us (322165.4 ops/sec): 10.3 MB/s
Did 25780000 RNG (256 bytes) operations in 80006127us (322225.3 ops/sec): 82.5 MB/s
Did 33171250 RNG (1350 bytes) operations in 80002532us (414627.5 ops/sec): 559.7 MB/s
Did 54784000 RNG (8192 bytes) operations in 80005706us (684751.2 ops/sec): 5609.5 MB/s
After:
Did 573826000 RNG (16 bytes) operations in 80000668us (7172765.1 ops/sec): 114.8 MB/s
Did 571329000 RNG (32 bytes) operations in 80000423us (7141574.7 ops/sec): 228.5 MB/s
Did 435043750 RNG (256 bytes) operations in 80000214us (5438032.3 ops/sec): 1392.1 MB/s
Did 229536000 RNG (1350 bytes) operations in 80001888us (2869132.3 ops/sec): 3873.3 MB/s
Did 57253000 RNG (8192 bytes) operations in 80004974us (715618.0 ops/sec): 5862.3 MB/s
Note that, on hardware with RDRAND, the read lock in
rand_fork_unsafe_buffering_enabled() doesn't do much. But without
RDRAND, we hit that on every RAND_bytes call. More importantly, the
subsequent CL will fix a bug that will require us to hit it more
frequently.
I've removed the volatile on g_fork_detect_addr because I think we
didn't need it and this avoids thinking about the interaction between
volatile and atomics. The pointer is passed into madvise, so the
compiler knows the pointer escapes. For it to be invalid, the compiler
would need to go out of its way to model madvise as not remembering the
pointer, which would be incorrect for MADV_WIPEONFORK.
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: Ie6977acd1b8e7639aaa419cf6f4f5f0645bde9d1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59849
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
refcount.c is now a single, generic file that calls into C11-atomic-like
APIs. Behind the scenes, this selects one of C11 atomics, Windows
interlocked APIs, or unsynchronized reads/writes (in the no-threads
build).
This frees us up to use atomics elsewhere in the library. For now, this
only binds sequentially consistent atomics, but we can add other memory
orders if needed. In particular, I believe up_ref only needs relaxed
atomics. Some of the later changes, I think, only need acquire and release,
but I'm not positive.
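A sketch of the generic implementation over C11-style atomics (the real
code routes through the compatibility layer so Windows can sit on
Interlocked* underneath):
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdlib.h>
    typedef _Atomic uint32_t CRYPTO_refcount_t;
    void CRYPTO_refcount_inc(CRYPTO_refcount_t *count) {
      uint32_t expected = atomic_load(count);
      while (expected != UINT32_MAX) {  /* saturate rather than overflow */
        if (atomic_compare_exchange_weak(count, &expected, expected + 1)) {
          return;
        }
      }
    }
    int CRYPTO_refcount_dec_and_test_zero(CRYPTO_refcount_t *count) {
      uint32_t expected = atomic_load(count);
      for (;;) {
        if (expected == 0) {
          abort();  /* underflow indicates a bug */
        }
        if (expected == UINT32_MAX) {
          return 0;  /* saturated objects are never freed */
        }
        if (atomic_compare_exchange_weak(count, &expected, expected - 1)) {
          return expected == 1;
        }
      }
    }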
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: Ifcd7357611bb7a8cd14b82c23ad080d1a2df1386
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59848
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
On Windows, we can rely on Interlocked APIs. On non-Windows builds, we
currently require C11 but permit C11 atomics to be missing, via
__STDC_NO_ATOMICS__. This CL tightens this so C11 atomics are required
on non-MSVC builds.
My hope is that, now that we require C11 on non-Windows, this is a
fairly safe requirement. We already require pthreads on any platform
where this might apply, and it's hard to imagine someone has C11,
pthreads, but not C11 atomics.
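The requirement amounts to a check of roughly this shape (sketch):
    #if !defined(__cplusplus) && !defined(_MSC_VER)
    #if defined(__STDC_NO_ATOMICS__)
    #error "BoringSSL requires C11 atomics on non-MSVC builds"
    #endif
    #include <stdatomic.h>
    #endif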
This change means that, in later work, we can refactor the refcount
logic to instead be a <stdatomic.h> compatibility layer, and then an
atomics-targeting CRYPTO_refcount_t implementation. With a
<stdatomic.h> compatibility layer, we can use atomics in more places,
notably where our uses of read locks are causing cacheline contention.
The platform restriction isn't *strictly* necessary. We could, like with
refcounts, emulate <stdatomic.h> with a single, global lock. Indeed any
platforms in this situation have already been living with that lock for
refcounts without noticing. But then later work to add "atomics" to read
locks would regress contention for those platforms. So I'm starting by
rejecting this, so if any such platform exists, we can understand their
performance needs before doing that.
Update-Note: On non-Windows platforms, we now require C11 atomics
support. Note we already require C11 itself. If this affects your build,
get in touch with BoringSSL maintainers.
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: I868fa4ba87ed73dfc9d52e80d46853ef56715a5f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59847
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Right now, MSVC has to fall back to refcount_lock.c, which uses a single,
global lock for all refcount operations. Instead, use the Interlocked*
APIs to implement them.
The motivation is two-fold. First, this removes a performance cliff when
building for Windows on a non-Clang compiler. (Although I've not been
able to measure it in an end-to-end EVP benchmark, only a synthetic
refcount-only benchmark.)
More importantly, it gets us closer to assuming atomics support on all
non-NO_THREADS configurations. (The next CL will clear through that.)
That, in turn, will make it easier to add atomics-like abstractions
to some of our hotter synchronization points. (Even in newer glibc, with
its better rwlock, read locks fundamentally need to write to memory, so
we have some cacheline contention on shared locks.)
Annoyingly, the Windows atomic_load replacement is not quite right. I've
used a "no-op" InterlockedCompareExchange(p, 0, 0) which, empirically,
still results in a write. But a write to the refcount cacheline is
surely better than taking a global exclusive lock. See comments in file
for details. OpenSSL uses InterlockedOr(p, 0), but that actually results
in even worse code. (InterlockedOr needs a retry loop when the
underlying cmpxchg fails, whereas InterlockedCompareExchange is a single
cmpxchg.)
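The load replacement is roughly (sketch):
    #include <windows.h>
    #include <stdint.h>
    /* A "no-op" compare-exchange: returns the current value; it only
     * stores when *count == 0, and even then stores the same value back.
     * The lock cmpxchg still dirties the cacheline, but that beats a
     * global exclusive lock. */
    static uint32_t refcount_load(volatile LONG *count) {
      return (uint32_t)InterlockedCompareExchange(count, 0, 0);
    }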
Hopefully, in the future (perhaps when we require VS 2022's successor,
based on [1]), this can be removed in favor of C11 atomics everywhere.
[1] https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: I125da139e2fd3ae51e54309309fda16ba97ccf20
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59846
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Remove all the other ERR_R_MALLOC_FAILUREs from the codebase.
Also change CBB to push to the error stack, so CBB failures are
reported correctly instead of all being reported as malloc failures.
Previously, every CBB failure was turned into a malloc failure.
Bug: 564
Change-Id: Ic13208bf9d9aaa470e83b2f15782fc94946bbc7b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57046
Auto-Submit: Bob Beck <bbe@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
This includes an internal version with a flag to select between the
system malloc and OPENSSL_malloc. That, in turn, lets us use this
function in the ERR family of functions without ERR calling
OPENSSL_malloc and creating a circular dependency.
Bug: 564
Change-Id: Ifd02d062fda9695cddbb0dbef2e1c1db0802a486
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57005
Auto-Submit: Bob Beck <bbe@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Malloc failure testing is quadratic in the number of allocations. To
test a failure at allocation N, we must first run the previous N-1
allocations. Now that we have combined GTest binaries, this does not
work very well.
Use the test listener to reset the counter across independent tests. We
assume failures in a previous test won't interfere with the next one and
run each test's counter in parallel.
The assumption isn't *quite* true because we have a lot of internal
init-once machinery that is reused across otherwise "independent" tests,
but it's close enough that I was able to find some bugs, fixed in the
next commit. That said, the tests still take too long to run to
completion.
Bug: 127
Change-Id: I6836793448fbdc740a8cc424361e6b3dd66fb8a6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56926
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
We check OAEP padding in constant time, but once the padding is
determined to be valid (or not), this fact and, if valid, the output
length are public.
Change-Id: I2aa6a707ca9a91761776746264416736c820977c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56845
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Newer versions of clang figure out that copy_from_prebuf (used in builds
that aren't x86_64 with assembly optimizations) has a bunch of no-op
iterations and insert a branch. Add a value barrier to stop it. This was
caught by our valgrind-based constant-time validation.
As part of this, I noticed that OPENSSL_NO_ASM builds turn off value
barriers. This is because the value barriers use an empty inline asm
block. While this is technically correct, it's probably unnecessary.
The clang|gcc check means we know GCC-style inline assembly is
supported. Disabling inline asm is used by sanitizers to shut off
uninstrumentable code, but there's no uninstrumentable code in the empty
string. It's also used by consumers who haven't figured out how to
integrate an assembler into their build system, but that also doesn't
apply. So just remove the condition on the value barriers so
OPENSSL_NO_ASM builds also get mitigations.
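For reference, the barrier itself is just an empty asm block (sketch):
    #include <stdint.h>
    /* The compiler must assume the asm may modify |v|, so it cannot prove
     * the value is constant and fold it into a branch. The empty string
     * contains nothing to instrument or assemble, so this is fine even in
     * OPENSSL_NO_ASM and sanitizer builds. */
    static inline uint64_t value_barrier_u64(uint64_t v) {
    #if defined(__GNUC__) || defined(__clang__)
      __asm__("" : "+r"(v));
    #endif
      return v;
    }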
Update-Note: It is possible the above is wrong and some OPENSSL_NO_ASM
build relied on value barriers being disabled. If so, this will break that
build and we'll need to reconsider.
Change-Id: I6e3ea3ee705bef3afcf42d3532b17aaabbbcc60b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56827
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This silences a few false positives in the valgrind-based constant-time
validation.
First, there are a few precondition checks that are publicly true, but
valgrind doesn't know that. I've added a constant_time_declassify_int
function and stuck those in there, since the existing macro is mostly
suited for macros. It also adds a value barrier in production code (see
comment for why). If we more thoroughly decoupled RSA from BIGNUM, we
could probably avoid this, since a lot of it comes from going through
public BIGNUM APIs.
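The helper is roughly of this shape (a sketch; the macro and the
value-barrier helper are assumed names):
    /* In the valgrind-based validation build, mark the value as no longer
     * secret; in all builds, a value barrier keeps the compiler from
     * exploiting the publicly-true precondition. */
    static inline int constant_time_declassify_int(int v) {
    #if defined(BORINGSSL_CONSTANT_TIME_VALIDATION)
      CONSTTIME_DECLASSIFY(&v, sizeof(v));
    #endif
      return (int)value_barrier_u32((uint32_t)v);
    }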
Next, our BIGNUM strategy is such that bounds on bignums are sometimes
computed pessimally, and then clamped down later. Modular arithmetic is
trivially bounded and avoids that, but RSA CRT involves some non-modular
computations. As a result, we actually compute a few more words than
necessary in the RSA result, and then bn_resize_words down.
bn_resize_words also has a precondition check, which checks that all
discarded words are zero. They are, but valgrind does not know that.
Similarly, the BN_bn2bin_padded call at the end checks for discarded
non-zero bytes, but valgrind does not know the discarded bytes are zero
(they are, because the output is bounded by n).
I've added a bn_assert_fits_in_bytes to clear this. It's an assert in
debug mode and a declassification in constant-time validation.
I suspect a different secret integer design would avoid needing this. I
think this comes from a combination of non-modular arithmetic, not
having callers pass explicit width, and tracking public widths at the
word granularity, rather than byte or bit. (Bit would actually be most
ideal.) Maybe worth a ponder sometime.
Change-Id: I1bc9443d571d2881e2d857c70be913074deac156
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56825
Commit-Queue: Bob Beck <bbe@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
We no longer have a need to support ppc64le, nor do we have any testing
story for the assembly we previously had. Remove all ppc64le-specific
assembly.
This CL stops short of removing it from base.h. That'll be done in a
follow-up CL, just to separate which removals are for the assembly and
which removals remove all support.
Update-Note: After this change, ppc64le builds drop assembly
optimizations and will fall back to a generic C-based AES implementation.
Change-Id: Ic8075638085761d66cebc276eb16c4770ce03920
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56388
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Starting with version 2019 16.8 (released November 2020), MSVC actually
implements parts of C11, though disabled by default. You have to pass
/std:c11, and then alignas, alignof, and noreturn all work.
When built that way, it's better to use the real ones, so check for
__STDC_VERSION__ first. It would be nice to mandate that so we can
remove the polyfill, but for now just opportunistically use it when we
can.
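The check ends up looking something like this (sketch):
    /* Prefer the real C11 keywords when the compiler advertises them;
     * only fall back to the MSVC-specific spellings otherwise. */
    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    #include <stdalign.h>
    #elif defined(_MSC_VER) && !defined(__cplusplus)
    #define alignas(x) __declspec(align(x))
    #define alignof __alignof
    #endif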
Sadly, even in this mode, they still don't implement C11 atomics, so the
refcounting implementation will still be slow.
Change-Id: I28dab4a339c368f7d8f8da5aa7aee1cb344803d3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53006
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
I don't think these are all UB by C's rules, but it's easier not to
think about the pointers. Still more to go, but these were some easy
ones.
Bug: 301
Change-Id: Icdcb7fb40f85983cbf566786c5f7dbfd7bb06571
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52905
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
This is cribbed, with permission, from AWS-LC. The FIPS service
indicator[1] signals when an approved service has been completed.
[1] FIPS 140-3 IG 2.4.C
Change-Id: Ib40210d69b3823f4d2a500b23a1606f8d6942f81
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52568
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
When introducing EC_SCALAR and EC_FELEM, I used unions as a convenience
for converting to and from the byte representation. However,
type-punning with unions is not allowed in C++ and hard to use correctly
in C. As I understand the rules, they are:
- The abstract machine knows what member of union was last written to.
- In C, reading from an inactive member is defined to type-pun. In C++,
it is UB though some compilers promise the C behavior anyway.
- However, if you read or write from a *pointer* to a union member, the
strict aliasing rule applies. (A function passed two pointers of
different types otherwise needs to pessimally assume they came from
the same union.)
That last rule means the type-punning allowance doesn't apply if you
take a pointer to an inactive member, and it's common to abstract
otherwise direct accesses of members via pointers.
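An illustration of the trap (hypothetical types, not the actual EC_SCALAR
definition):
    #include <stdint.h>
    typedef union {
      uint64_t words[4];
      uint8_t bytes[32];
    } SCALAR;
    void fill_bytes(SCALAR *s, const uint8_t in[32]) {
      for (int i = 0; i < 32; i++) {
        s->bytes[i] = in[i];  /* direct write: |bytes| is now the active member */
      }
    }
    uint64_t first_word(const SCALAR *s) {
      /* A pointer-indirected read of the inactive |words| member. Under a
       * strict reading of the rules, this is no longer covered by the
       * union type-punning allowance and falls under strict aliasing. */
      const uint64_t *w = s->words;
      return w[0];
    }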
https://github.com/openssl/openssl/issues/18225 is an example where
similar union tricks have caused problems for OpenSSL. While we don't
have that code, EC_SCALAR and EC_FELEM play similar tricks.
We do get a second lifeline because our alternate view is a uint8_t,
which we require to be unsigned char. Strict aliasing always allows the
pointer type to be a character type, so pointer-indirected accesses of
EC_SCALAR.bytes aren't necessarily UB. But if we ever write to
EC_SCALAR.bytes directly (and we do), we'll switch the active arm and
then pointers to EC_SCALAR.words become strict aliasing violations!
This is all far too complicated to deal with. Ideally everyone would
build with -fno-strict-aliasing because no real C code actually follows
these rules. But we don't always control our downstream consumers'
CFLAGS, so let's just avoid the union. This also avoids a pitfall if we
ever move libcrypto to C++.
For p224-64.c, I just converted the representations directly, which
avoids worrying about the top 32 bits in p224_felem_to_generic. Most of
the rest was words vs. bytes conversions and boils down to a cast (we're
still dealing with a character type, at the end of the day). But I took
the opportunity to extract some more "words"-based helper functions out
of BIGNUM, so the casts would only be in one place. That too saves us
from the top bits problem in the bytes-to-words direction.
Bug: 301
Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
MSVC is a little behind, but otherwise we should be able to assume C11
support in all our compilers. The only C99 builds should just be stale
build files. Such consumers are leaving performance on the table, by
using the worse refcounting implementation.
For now, don't require it in public headers. Android's build is still
defaulting to C99, which means requiring C11 will be disruptive. We can
try the public headers after that's fixed.
Update-Note: If the build fails with an error about C11, remove -std=c99
or -std=gnu99 from your build. Refcounting will get faster.
Change-Id: I2ec6f7d7acc026a451851d0c38f60c14bae6b00f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52247
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
On Arm, our CRYPTO_is_*_capable functions check the corresponding
preprocessor symbol. This allows us to automatically drop dynamic checks
and fallback code when some capability is always available.
This CL does the same on x86, as well as consolidates our
OPENSSL_ia32cap_P checks in one place. Since this abstraction is
incompatible with some optimizations we do around OPENSSL_ia32cap_get()
in the FIPS module, I've marked the symbol __attribute__((const)), which
is enough to make GCC and Clang do the optimizations for us. (We already
do the same to DEFINE_BSS_GET.)
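The effect is roughly this (a sketch; the AES-NI bit is CPUID.1:ECX bit 25):
    #include <stdint.h>
    /* const tells the compiler repeated calls return the same pointer, so
     * it can hoist and dedupe them, much like DEFINE_BSS_GET. */
    #if defined(__GNUC__) || defined(__clang__)
    __attribute__((const))
    #endif
    const uint32_t *OPENSSL_ia32cap_get(void);
    static inline int CRYPTO_is_AESNI_capable(void) {
    #if defined(__AES__)
      return 1;  /* statically known from -maes / -march */
    #else
      return (OPENSSL_ia32cap_get()[1] & (1u << 25)) != 0;
    #endif
    }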
Most x86 platforms support a much wider range of capabilities, so this
is usually a no-op. But, notably, all x86_64 Mac hardware has SSSE3
available, so this allows us to statically drop an AES implementation.
(On macOS with -Wl,-dead_strip, this seems to trim 35080 bytes from the
bssl binary.) Configs like -march=native can also drop a bunch of code.
Update-Note: This CL may break build environments that incorrectly mark
some instruction as statically available. This is unlikely to happen
with vector instructions like AVX, where the compiler could freely emit
them anyway. However, instructions like AES-NI might be set incorrectly.
Change-Id: I44fd715c9887d3fda7cb4519c03bee4d4f2c7ea6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51548
Reviewed-by: Adam Langley <agl@google.com>
fips_break_test.h is a bad name because generate_build_files.py thinks
that it's a test file, which it is, but one that's needed in the main
build. Thanks to Svilen Kanev for noting this.
That header doesn't particularly carry its weight. The idea was that
rebuilding the break test wouldn't need to rebuild everything if that
logic was isolated in its own header. But we only have to rebuild once
now, so whatever. There's already a block of crypto/internal.h with very
similar stuff; it can go there.
Change-Id: Ifb479eafd4df9a7aac4804cae06ba87257c77fc3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51485
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
AS10.20 requires that the self-test for the integrity algorithm pass
before the integrity check itself. IG 10.3.A requires an HMAC self-test
now. Therefore run these tests before the integrity check.
Since we also need the ability to run all self-tests, both SHA
self-tests and the HMAC test are run again when running self-tests.
I'm assuming that they're so fast that it doesn't matter.
Change-Id: I6b23b6fd3cb6107edd7420bc8680780719bd41d2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51328
Reviewed-by: David Benjamin <davidben@google.com>
The provision of FIPS that allowed the tests to be skipped based on a
flag-file has been removed in 140-3. Therefore we expect to run the fast
KATs on start-up, but to defer the slower ones until the functionality in
question is first used. So this change splits off the fast KATs and
removes support for skipping KATs based on a flag-file.
Change-Id: Ib24cb1739cfef93e4a1349d786a0257ee1083cfb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51326
Reviewed-by: David Benjamin <davidben@google.com>
The latest version of ACLE splits __ARM_FEATURE_CRYPTO into two defines
to reflect that, starting with ARMv8.2, the cryptography extension can
include {AES,PMULL} and {SHA1,SHA256} separately.
Also standardize on __ARM_NEON, which is the recommended symbol from
ACLE, and the only one defined on non-Apple aarch64 targets. Digging
through GCC history, __ARM_NEON__ is a bit older. __ARM_NEON was added
in GCC's 9e94a7fc5ab770928b9e6a2b74e292d35b4c94da from 2012, part of GCC
4.8.0.
I suspect we can stop paying attention to __ARM_NEON__ at this point,
but I've left both working for now. __ARM_FEATURE_{AES,SHA2} is definitely
too new to fully replace __ARM_FEATURE_CRYPTO.
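So the static-capability defines now accept either spelling, roughly
(sketch):
    /* Accept the old combined define or the new split ACLE defines when
     * deciding the crypto extensions are statically known. */
    #if defined(__ARM_FEATURE_CRYPTO) || defined(__ARM_FEATURE_AES)
    #define OPENSSL_STATIC_ARMCAP_AES
    #define OPENSSL_STATIC_ARMCAP_PMULL
    #endif
    #if defined(__ARM_FEATURE_CRYPTO) || defined(__ARM_FEATURE_SHA2)
    #define OPENSSL_STATIC_ARMCAP_SHA1
    #define OPENSSL_STATIC_ARMCAP_SHA256
    #endif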
Tested on Linux that -march=armv8-a+aes now also drops the fallback AES
code. Previously, we would pick up -march=armv8-a+crypto, but not
-march=armv8-a+aes. Also tested that, on an OPENSSL_STATIC_ARMCAP build,
-march=armv8-a+sha2 sets the SHA-1 and SHA-256 features.
Change-Id: I749bdbc501ba2da23177ddb823547efcd77e5c98
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50847
Reviewed-by: Adam Langley <agl@google.com>
These symbols were not marked OPENSSL_EXPORT, so they weren't really
usable externally anyway. They're also very sensitive to various build
configuration toggles, which don't always get reflected into projects
that include our headers. Move them to crypto/internal.h.
Change-Id: I79a1fcf0b24e398d75a9cc6473bae28ec85cb835
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50846
Reviewed-by: Adam Langley <agl@google.com>
We have a ton of per-file rotation functions, often with generic names
that do not tell you whether they are uint32_t vs uint64_t, or rotl vs
rotr.
Additionally, (x >> r) | (x << (32 - r)) is UB at r = 0.
(x >> r) | (x << ((-r) & 31)) works for 0 <= r < 32, which is what
cast.c does. GCC and Clang recognize this pattern as a rotate, but MSVC
doesn't. MSVC does, however, provide functions for this.
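The shared helper is roughly (sketch):
    #include <stdint.h>
    #if defined(_MSC_VER)
    #include <stdlib.h>  /* _rotr */
    #endif
    static inline uint32_t CRYPTO_rotr_u32(uint32_t value, int shift) {
    #if defined(_MSC_VER)
      return _rotr(value, shift);
    #else
      /* Avoids the r == 0 UB; GCC and Clang recognize this as a rotate. */
      return (value >> shift) | (value << ((-shift) & 31));
    #endif
    }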
We usually rotate by a non-zero constant, which makes this moot, but
rotation comes up often enough that it's worth extracting out. Some
particular changes to call out:
- I've switched sha256.c from rotl to rotr. There was a comment
explaining why it differed from the specification. Now that we have
both functions, it's simpler to just match the specification.
- I've dropped all the inline assembly from sha512.c. Compilers should
be able to recognize rotations in 2021.
Change-Id: Ia1030e8bfe94dad92514ed1c28777447c48b82f9
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/49765
Reviewed-by: Adam Langley <agl@google.com>
In configurations without threads, we're not thread-safe anyway. Instead
use the refcount_lock.c implementation which, in turn, calls into
thread_none.c, so this turns into a plain refcount.
This avoids a build issue on platforms which define NO_THREADS, use C11,
lack C11 atomics, and are missing a __STDC_NO_ATOMICS__ definition. Those
platforms ought to define __STDC_NO_ATOMICS__ or implement them, but
atomics are also unnecessary overhead in NO_THREADS configurations
anyway.
Change-Id: I927e1825dd6474d95226b93dad704594f120450a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48565
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Also use a slightly more conservative pattern. Instead of aligning the
pointer as a uintptr_t and casting back, compute the offset and advance
in pointer space. C guarantees that casting from pointer to uintptr_t
and back gives the same pointer, but general integer-to-pointer
conversions are implementation-defined. GCC does define them in the
useful way, but this approach relies on fewer guarantees.
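Concretely, the helper looks something like this (sketch):
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    static inline void *align_pointer(void *ptr, size_t alignment) {
      assert((alignment & (alignment - 1)) == 0);  /* power of two */
      /* Compute the offset from the integer value, but advance the
       * original pointer rather than casting the integer back. */
      uintptr_t offset = (0u - (uintptr_t)ptr) & (alignment - 1);
      ptr = (char *)ptr + offset;
      assert(((uintptr_t)ptr & (alignment - 1)) == 0);
      return ptr;
    }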
Change-Id: I70c7af735e892fe7a8333b78b39d7b1f3f1cdbef
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48405
Reviewed-by: Adam Langley <alangley@gmail.com>
We have loads of variations of these. Align them in one set. This avoids
the HOST_* macros defined by md32_common.h, so it'll be a little easier
to make it a more conventional header.
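One representative helper (a sketch, assuming a little-endian host; a
big-endian build would byte-swap the result):
    #include <stdint.h>
    #include <string.h>
    /* An unaligned little-endian load via memcpy; compilers lower this to
     * a plain load where the target permits it. */
    static inline uint32_t CRYPTO_load_u32_le(const void *in) {
      uint32_t v;
      memcpy(&v, in, sizeof(v));
      return v;
    }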
Change-Id: Id47fe7b51a8f961bd87839f8146d8a5aa8027aa6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46425
Reviewed-by: Adam Langley <agl@google.com>
It's now 2021. Hopefully we can at least assume anyone building with
-std=c11 also has a corresponding set of headers. Plus, even if you
don't, Clang seems to provide a header. (So C11 atomics work in
clang-cl.) Also apparently atomics are optional, so this checks
__STDC_NO_ATOMICS__.
This does *not* set C11 as the minimum version. If you build with
-std=c99, we'll silently use the non-atomics implementation. That's a
little magical, so I've kept OPENSSL_C11_ATOMIC as a way to assert that
you really want C11 atomics. Mostly it turns into a -std=c11 && !MSVC
self-assert.
Update-Note: If something fails to compile, we'll revert this and adjust
the check, or add an opt-out, or give up. Also, if building with
-std=c99, consider -std=c11.
Change-Id: I1a8074c367a765c5a0f087db8c250e050df2dde8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46344
Reviewed-by: Adam Langley <agl@google.com>
In order to provide evidence to auditors that high-level functions end
up calling into the FIPS module, provide counters that allow for such
monitoring.
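A sketch of the mechanism (illustrative names; only one counter shown):
    #include <stdatomic.h>
    #include <stddef.h>
    enum fips_counter_t {
      fips_counter_evp_aes_128_gcm,
      fips_counter_max,
    };
    static _Atomic size_t g_counters[fips_counter_max + 1];
    /* Called from inside the module at the points being monitored. */
    void boringssl_fips_inc_counter(enum fips_counter_t counter) {
      atomic_fetch_add(&g_counters[counter], 1);
    }
    /* Callers read the counter before and after a high-level operation to
     * confirm the call reached the module. */
    size_t FIPS_read_counter(enum fips_counter_t counter) {
      return atomic_load(&g_counters[counter]);
    }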
Change-Id: I55d45299f3050bf58077715ffa280210db156116
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/46124
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>