Did 59000 Ed25519 key generation operations in 1004188us (58753.9 ops/sec) [+8.3%]
Did 57000 Ed25519 signing operations in 1005649us (56679.8 ops/sec) [+7.9%]
Did 19000 Ed25519 verify operations in 1054380us (18020.1 ops/sec) [-2.0%]
Did 61000 Curve25519 base-point multiplication operations in 1007401us (60551.9 ops/sec) [+8.3%]
Did 22000 Curve25519 arbitrary point multiplication operations in 1022882us (21507.9 ops/sec) [+0.5%]
Change-Id: I14668f658b1ae99850cb0f8938f90f988d0edd0b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60107
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
BN_nnmod() can deal with the situation that the first and the second
arguments are the same, but it cannot deal with the first and the
second argument being equal. In that situation, if BN_mod(x, y, x, ctx)
results in a negative x, then the result of BN_nnmod() is zero. This
breaks the strange BN_mod_inverse(m, a, m, ctx).
Reported by Guido Vranken in
https://github.com/openssl/openssl/issues/21110
Change-Id: I8584720660f214f172b3b33716a5e3b29e8f2fd8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60365
Reviewed-by: Bob Beck <bbe@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Andres Erbsen noticed we didn't actually have tests to catch when the
format macros were wrong.
In doing so, remove BN_DEC_FMT2. It was unused and only makes sense in
the context of the bignum-to-decimal conversion, where we no longer use
it anyway. None of these macros are exported in OpenSSL at all, so it
should be safe to remove it. (We possibly can remove the others too. I
see one use outside the library, and that use would probably be better
written differently anyway.)
Change-Id: I4ffc7f9f7dfb399ac060af3caff0778000010303
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60325
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
NGINX requires this constant if SSL_get_negotiated_group is defined.
OpenSSL returns this to indicate a named group constant it doesn't
understand, which doesn't make sense out of SSL_get_negotiated_group
because the library wouldn't negotiate a group it doesn't know about.
Nonetheless, define it for compatibility.
Fixed: 615
Change-Id: I05a6d607912bb8507be9ac6ff5fe1699b4715f6b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60326
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
All inputs are marked as secret. This is not to support a use case for
calling X25519 with a secret *point* as the input, but rather to ensure
that the choice of the point cannot influence whether the scalar is
leaked or not. Same for the initial contents of the output buffer.
This is a conservative choice and may be revised in the future.
Change-Id: I595d454a8e1fdc409912aee751bb0b3cf46f5430
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60186
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Many of our point addition functions internally check for the doubling
case and branch because the addition formulas are incomplete. This
branch is fine because the multiplication formulas are arranged to not
hit this case. However, we don't want to leak the couple of intermedate
values that determine whether to branch. Previously, we ran into this
with https://boringssl-review.googlesource.com/c/boringssl/+/36465.
This wasn't sufficient. The compiler understands if (a & b) enough to
compile into two branches. Thanks to Moritz Schneider, Nicolas Dutly,
Daniele Lain, Ivan Puddu, and Srdjan Capkun for reporting this!
Fix the leak by adding a value barrier on the final value. As we're also
intentionally leaking the result of otherwise secret data flow, I've
used the constant_time_declassify functions, which feed into our
valgrind-based constant-time validation and double as barriers.
Accordingly, I've also added some CONSTTIME_SECRET markers around the
ECDSA nonce value, so we can check with valgrind the fix worked. The
marker really should be at a lower level, at ec_random_nonzero_scalar or
further (maybe RAND_bytes?), but for now I've just marked the nonce.
To then clear valgrind, add constant_time_declassify in a few other
places, like trying to convert infinity to affine coordinates. (ECDH
deals with secret points, but it is public that the point isn't
infinity.)
Valgrind now says this code is constant-time, at least up to compilation
differences introduced by the annotations. I've also inspected the
compiler output. This seems to be fine, though neither test is quite
satisfying. Ideally we could add annotations in ways that don't
influence compiler output.
Change-Id: Idfc413a75d92514717520404a0f5424903cb4453
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60225
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Right now we use NIDs to configure the group list, but group IDs (the
TLS codepoints) to return the negotiated group. The NIDs come from
OpenSSL, while the group ID was original our API. OpenSSL has since
added SSL_get_negotiated_group, but we don't implement it.
To add Kyber to QUIC, we'll need to add an API for configuring groups to
QUICHE. Carrying over our inconsistency into QUICHE's public API would
be unfortunate, so let's use this as the time to align things.
We could either align with OpenSSL and say NIDs are now the group
representation at the public API, or we could add a parallel group ID
API. (Or we could make a whole new SSL_NAMED_GROUP object to pattern
after SSL_CIPHER, which isn't wrong, but is even more new APIs.)
Aligning with OpenSSL would be fewer APIs, but NIDs aren't a great
representation. The numbers are ad-hoc and even diverge a bit between
OpenSSL and BoringSSL. The TLS codepoints are better to export out to
callers. Also QUICHE has exported the negotiated group using the
codepoints, so the natural solution would be to use codepoints on input
too.
Thus, this CL adds SSL_CTX_set1_group_ids and SSL_set1_group_ids. It
also rearranges the API docs slightly to put the group ID ones first,
and leaves a little note about the NID representation before introducing
those.
While I'm here, I've added SSL_get_negotiated_group. NGINX seems to use
it when available, so we may as well fill in that unnecessary
compatibility hole.
Bug: chromium:1442377
Change-Id: I47ca8ae52c274133f28da9893aed7fc70f942bf8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60208
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This adds "group" versions of our codepoint-based APIs. This is mostly
because it looks weird to switch terminology randomly in the
implementation. See I7a356793d36419fc668364c912ca7b4f5c6c23a2 for
additional context.
I've not bothered renaming the bssl_shim flags. That seems a waste of
time.
Change-Id: I566ad132f5a33d99efd8cb2bb8b76d9a6038c825
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60207
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
We're this awkward mix of "group" and "curve" right now. On the spec
side, this is because they used to be "curves", but then RFC 7919
renamed to "group" in an attempt to generalize FFDH and ECDH. The
negotiated FFDH stuff never really went anywhere (the way it used cipher
suite values in TLS 1.2 made it unusable), but the name change stuck.
In our implementation and API, we originally called it "curve". In
preparation for TLS 1.3, we renamed the internals to "group" to match
the spec in
https://boringssl-review.googlesource.com/c/boringssl/+/7955, but the
public API was still "curve".
Then we exported a few more things in
https://boringssl-review.googlesource.com/c/boringssl/+/8565, but I left
it at "curve" to keep the public API self-consistent.
Then we added OpenSSL's new "group" APIs in
https://boringssl-review.googlesource.com/c/boringssl/+/54306, but
didn't go as far to deprecate the old ones yet.
Now I'd like to add new APIs to clear up the weird mix of TLS codepoints
and NIDs that appear in our APIs. But our naming is a mess, so either
choice of "group" or "curve" for the new API looks weird.
In hindsight, we probably should have left it at "curve". Both terms are
equally useless for the future post-quantum KEMs, but at least "curve"
is more unique of a name than "group". But at this point, I think we're
too far through the "group" rename to really backtrack:
- Chromium says "group" in its internals
- QUICHE says "group" in its internals and public API
- Our internals say "group"
- OpenSSL has switched to "group" and deprecated "curve", so new APIs
will be based on "group"
So align with all this and say "group". This CL handles set1_curves and
set1_curves_list APIs, which already have "group" replacements from
OpenSSL. A follow-up CL will handle our APIs. This is a soft deprecation
because I don't think updating things is particularly worth the churn,
but get the old names out of the way, so new code can have a simpler API
to target.
Also rewrite the documentation for that section accordingly. I don't
think we need to talk about how it's always enabled now. That's a
reference to some very, very old OpenSSL behavior where ECDH negotiation
needed to be separately enabled.
Change-Id: I7a356793d36419fc668364c912ca7b4f5c6c23a2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60206
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Update-Note: SSL_CIPHER_get_value was our original name for the
function. OpenSSL later called it SSL_CIPHER_get_protocol_id. I believe
all external callers have since been updated to use the new function.
(If I missed a few stragglers, replace with SSL_CIPHER_get_protocol_id
to fix.)
Change-Id: I956fb49bf2d13a898eed73177493d2c8d50778ad
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60205
Reviewed-by: Bob Beck <bbe@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Change-Id: I7458b1d7aa1736d586dc80660d59c07fa2ac1c8a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59805
Reviewed-by: Bob Beck <bbe@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Also add some tests for X25519_public_from_private, as we apparently
weren't directly testing it with test vectors.
Change-Id: I1b73a9655323d507a8e022c62530ddd4610db4b9
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60109
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CONF supports a variable expansion feature, which allows a config file
to easily grow exponentially.
2d05568a7b (upstream's
6a6213556a80ab0a9eb926a1d6023b8bf44f2afd) capped the expansion to 65536
bytes, but this still allows a small input to produce a fairly large
output. (Once we have one large value, it only takes five characters
make a new key that references it, which is an expansion factor of
around 13,000.) This, combined with the string-based extensions
machinery's many quadratic behaviors (see
b92fcfdc17), means fuzzers hit timeouts.
We have very few uses of these APIs left, and none of them use this
feature. Just remove it. While none of these super-linear behaviors are
DoS bugs per se (they should never be passed untrusted input), there's
no sense in carrying an unused feature that only frustrates the fuzzers.
Update-Note: NCONF_load and NCONF_load_bio no longer support the $foo
variable expansion syntax. If you are using these functions in your
project, remove them.
Bug: oss-fuzz:59049
Change-Id: I85832ae1d05373ee420f4fb68197786736ca8272
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60105
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
DQUOTE and FCOMMENT are remnants of a second parser in OpenSSL. OpenSSL
has CONF_type_default, which is the one we imported, and CONF_type_win32
which is a different syntax which looks to be modeled after Windows ini
files. DQUOTE and FCOMMENT only exist in this one.
Change-Id: Iffa01fcb012b0f3e7f8dbf87a01ba867bdc7bcc7
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60087
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This file needs more work, but apply some small, local simplifications.
Change-Id: Ia2b93f847e67ae7738afc791acb9ce3cc6cc0342
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60086
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Change-Id: I8792f118d281bc7a407dfbabe1c8b8e63f9eed9f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60085
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
We now ensure STACK_OF(T) sizes and indices fit in INT_MAX, so it's safe
to cast to int.
Bug: 516
Change-Id: I33dd1de6d60a852d510b9b5c3ac70e2eacbc8905
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60066
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Although we've switched STACK_OF(T) to use size_t, OpenSSL used int
pervasively. Much of crypto/x509 and third-party callers use int
indices. As much of that is in the public API now, ensure that
STACK_OF(T) can never exceed INT_MAX elements.
Bug: 516
Change-Id: I26b8fe590655f8c3e449b749b5d0222e28c413f8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60065
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
The Close() method of the middle often wasn't getting called because
`os.Exit(0)` was used in some places. Once that's fixed, it's clear that
the queue of pending reads needed to be closed before waiting for the
reader goroutine to finish. Lastly, don't bother trying to record the
error that the reader saw: just panic the process if the modulewrapper
dies during processing.
Change-Id: Icf077cefd0ace2ef721a493f99fede6269531257
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60045
Commit-Queue: David Benjamin <davidben@google.com>
Auto-Submit: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Every time we free a type with ex_data (RSA, EC_KEY, DSA, SSL_CTX, SSL,
SSL_SESSION, X509, X509_STORE), we allocate and take a read lock. The
allocation means, if we believe in malloc failures, it is possible to
leak memory on malloc failure. The read lock causes an unnecessary bit
of contention writing to the cache line.
Instead, since we never remove ex_data entries, just thread them in a
singly-linked list. This way we only need to synchronize when to stop
iterating. Add a counter to synchronize that. (Or we could make each
'next' pointers atomic, but this seemed more straightforward.)
(I suspect this doesn't matter much, but it was shorter and we were
already allocating the funcs structures anyway.)
Bug: 570
Change-Id: Ie7ba5cc44f2b71ebd79c8971e784912d53af7f5c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60025
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
3DES has long been obsolete. It uses a small block size, making it
vulnerable to attacks at sufficiently high volumes (see
https://sweet32.info/, CVE-2016-6329). On top of this, it is slow even
without constant-time protections, making it a DoS risk for server
operators.
Since the alias "3DES" has existed in OpenSSL for a long time, keep that
one working, to reduce the risk of breaking someone who specifically
wanted 3DES enabled.
Update-Note: This CL disables TLS_RSA_WITH_3DES_EDE_CBC_SHA by default.
Specifically, it will not be included unless explicitly listed in the
cipher config, as "TLS_RSA_WITH_3DES_EDE_CBC_SHA", its legacy OpenSSL
name "DES-CBC3-SHA", or the alias "3DES". To restore it, add one of the
above to your cipher config.
Bug: 599
Change-Id: Ib94a2f149b3bfa240ef1008b9f3729a9c10368fb
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59425
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
OPENSSL_C11_ATOMIC is both computed in crypto/internal.h and also
defined externally. This is a remnant of C11 atomics were an opt-in
feature.
If defined externally, this means OPENSSL_C11_ATOMIC might be defined
when built as C++. That, in turn, causes <stdatomic.h> to be included in
C++ mode. At least one of our users toolchains has a <stdatomic.h> that
is incompatible with C++. We don't get anything out of including it, so
just gate the include on !defined(__cplusplus) for now.
Things to look into as follow-up:
- Fix build files to stop defining OPENSSL_C11_ATOMIC. Prior to
https://boringssl-review.googlesource.com/c/boringssl/+/59847, it was
still serving a purpose: in server builds, if autodetection fails, we
would rather fail to build than accidentally fallback to locks.
There is no lock fallback anymore.
- Fix that toolchain so their <stdatomic.h> is C++-compatible. It's
certainly not C++23-conformant. I suspect it's also not
C++11-conformant, but I'm not positive.
Change-Id: I13bcd8380efeb87b9f9cc439fe24a743e48aec60
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59985
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
If a process calls fork(), then the child process never forks again, the
child may wish to call RAND_enable_fork_unsafe_buffering(). However,
doing so exposes a bug: we assume that, if the flag is set, we don't
need to worry about fork-safety. But it is possible that the PRNG state
was cloned from another process which does not work.
Concretely, consider a zygote process, e.g. Chromium's. A zygote process
would retain fork-safety, but pass along its PRNG state to each of its
children. If the children never fork, they might disable fork-safety,
hitting this bug. (Chromium does not call this API. This is just a
hypothetical scenario.)
Fix this by reseeding whenever the fork-safety bit changes. This fix
does not strictly depend on the atomics work, but it causes us to
unconditionally sample rand_fork_unsafe_buffering_enabled(). This no
longer causes contention because it's just another atomic load.
This only affects systems without MADV_WIPEONFORK and without fast
RDRAND. If RDRAND is fast, we're always fork-safe and MADV_WIPEONFORK
allows us to efficiently detect forks.
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: I6d0c471c62c951254faf85420a7dc3f4a9d65ee0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59850
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This is intended to be run under TSan. ex_data currently works by taking
a global read lock every time an RSA, SSL, SSL_CTX, etc., is freed. We
should be able to fix that but, first, make sure we have test coverage
for the threading requirements.
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: I0e12907e116481d88e45191a1f15f3a51833bf6d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59865
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
We don't take write locks in the PRNG, steady state, but we do take some
read locks: computing fork generation, reading the fork-unsafe buffering
flag, and a FIPS-only artifact of some global state clearing mess. That
last one is completely useless, but it's a consequence of FIPS's
understanding of process exit being comically inconsistent with reality.
Taking read locks is, in principle, parallel, but the cacheline write
causes some contention, even in newer glibcs with faster read locks. Fix
these:
- Use atomic reads to check the fork generation. We only need to lock
when we observe a fork.
- Replace the fork-unsafe buffering flag with an atomic altogether.
- Split state_clear_all_lock into a per-rand_thread_state lock. We still
need a read lock, but a completely uncontended one until process exit.
With many threads, this gives a significant perf boost.
x86_64, non-FIPS, Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 30 threads:
Before:
Did 45131875 RNG (16 bytes) operations in 300039649us (150419.7 ops/sec): 2.4 MB/s
Did 44089000 RNG (32 bytes) operations in 300053237us (146937.3 ops/sec): 4.7 MB/s
Did 43328000 RNG (256 bytes) operations in 300058423us (144398.5 ops/sec): 37.0 MB/s
Did 45857000 RNG (1350 bytes) operations in 300095943us (152807.8 ops/sec): 206.3 MB/s
Did 43249000 RNG (8192 bytes) operations in 300102698us (144114.0 ops/sec): 1180.6 MB/s
After:
Did 296204000 RNG (16 bytes) operations in 300009524us (987315.3 ops/sec): 15.8 MB/s
Did 311347000 RNG (32 bytes) operations in 300014396us (1037773.5 ops/sec): 33.2 MB/s
Did 295104000 RNG (256 bytes) operations in 300012657us (983638.5 ops/sec): 251.8 MB/s
Did 255721000 RNG (1350 bytes) operations in 300016481us (852356.5 ops/sec): 1150.7 MB/s
Did 103339000 RNG (8192 bytes) operations in 300040059us (344417.3 ops/sec): 2821.5 MB/s
(Smaller PRNG draws are more impacted because they spend less time in the
DRBG. But they're also more likely because you rarely need to pull 8K of
data out at once.)
x86_64, FIPS, Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 30 threads:
Before:
Did 29060000 RNG (16 bytes) operations in 300081190us (96840.5 ops/sec): 1.5 MB/s
Did 31882000 RNG (32 bytes) operations in 300118031us (106231.5 ops/sec): 3.4 MB/s
Did 30925000 RNG (256 bytes) operations in 300113646us (103044.3 ops/sec): 26.4 MB/s
Did 31969000 RNG (1350 bytes) operations in 300096688us (106529.0 ops/sec): 143.8 MB/s
Did 33434000 RNG (8192 bytes) operations in 300093240us (111412.0 ops/sec): 912.7 MB/s
After:
Did 299013000 RNG (16 bytes) operations in 300012167us (996669.6 ops/sec): 15.9 MB/s
Did 289788000 RNG (32 bytes) operations in 300014611us (965913.0 ops/sec): 30.9 MB/s
Did 298699000 RNG (256 bytes) operations in 300013443us (995618.7 ops/sec): 254.9 MB/s
Did 247061000 RNG (1350 bytes) operations in 300018215us (823486.7 ops/sec): 1111.7 MB/s
Did 100479000 RNG (8192 bytes) operations in 300037708us (334887.9 ops/sec): 2743.4 MB/s
On an M1 Pro, it's mostly a wash by default (fewer threads because this chip has fewer cores)
aarch64, M1 Pro, 8 threads:
Before:
Did 23218000 RNG (16 bytes) operations in 80009131us (290191.9 ops/sec): 4.6 MB/s
Did 23021000 RNG (256 bytes) operations in 80007544us (287735.4 ops/sec): 73.7 MB/s
Did 22853000 RNG (1350 bytes) operations in 80013184us (285615.4 ops/sec): 385.6 MB/s
Did 25407000 RNG (8192 bytes) operations in 80008371us (317554.3 ops/sec): 2601.4 MB/s
Did 22128000 RNG (16384 bytes) operations in 80013269us (276554.1 ops/sec): 4531.1 MB/s
After:
Did 23303000 RNG (16 bytes) operations in 80011433us (291245.9 ops/sec): 4.7 MB/s
Did 23072000 RNG (256 bytes) operations in 80008755us (288368.4 ops/sec): 73.8 MB/s
Did 22807000 RNG (1350 bytes) operations in 80013355us (285039.9 ops/sec): 384.8 MB/s
Did 23759000 RNG (8192 bytes) operations in 80010212us (296949.6 ops/sec): 2432.6 MB/s
Did 23193000 RNG (16384 bytes) operations in 80011537us (289870.7 ops/sec): 4749.2 MB/s
This is likely because, without RDRAND or MADV_WIPEONFORK, we draw from
the OS on every call. We're likely bottlenecked by getentropy, whether
it's some internal synchronization or syscall overherad. With
fork-unsafe buffering enabled, this change shows even more significant
wins on the M1 Pro.
aarch64, fork-unsafe buffering, M1 Pro, 8 threads:
Before:
Did 25727000 RNG (16 bytes) operations in 80010579us (321545.0 ops/sec): 5.1 MB/s
Did 25776000 RNG (32 bytes) operations in 80008587us (322165.4 ops/sec): 10.3 MB/s
Did 25780000 RNG (256 bytes) operations in 80006127us (322225.3 ops/sec): 82.5 MB/s
Did 33171250 RNG (1350 bytes) operations in 80002532us (414627.5 ops/sec): 559.7 MB/s
Did 54784000 RNG (8192 bytes) operations in 80005706us (684751.2 ops/sec): 5609.5 MB/s
After:
Did 573826000 RNG (16 bytes) operations in 80000668us (7172765.1 ops/sec): 114.8 MB/s
Did 571329000 RNG (32 bytes) operations in 80000423us (7141574.7 ops/sec): 228.5 MB/s
Did 435043750 RNG (256 bytes) operations in 80000214us (5438032.3 ops/sec): 1392.1 MB/s
Did 229536000 RNG (1350 bytes) operations in 80001888us (2869132.3 ops/sec): 3873.3 MB/s
Did 57253000 RNG (8192 bytes) operations in 80004974us (715618.0 ops/sec): 5862.3 MB/s
Note that, on hardware with RDRAND, the read lock in
rand_fork_unsafe_buffering_enabled() doesn't do much. But without
RDRAND, we hit that on every RAND_bytes call. More importantly, the
subsequent CL will fix a bug that will require us to hit it more
frequently.
I've removed the volatile on g_fork_detect_addr because I think we
didn't need it and this avoids thinking about the interaction between
volatile and atomics. The pointer is passed into madvise, so the
compiler knows the pointer escapes. For it to be invalid, the compiler
would need to go out of its way to model madvise as not remembering the
pointer, which would be incorrect of it for MADV_WIPEONFORK.
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: Ie6977acd1b8e7639aaa419cf6f4f5f0645bde9d1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59849
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
refcount.c is now a single, generic file that calls into C11-atomic-like
APIs. Behind the scenes, this selects one of C11 atomics, Windows
interlocked APIs, or unsynchronized reads/writes (in the no-threads
build).
This frees us up to use atomics elsewhere in the library. For now, this
only binds sequentially consistent atomics, but we can add other memory
orders if needed. In particular, I believe up_ref only needs relaxed
atomics. Some of the later change I think only need acquire and release,
but I'm not positive.
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: Ifcd7357611bb7a8cd14b82c23ad080d1a2df1386
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59848
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
On Windows, we can rely on Interlocked APIs. On non-Windows builds, we
currently require C11 but permit C11 atomics to be missing, via
__STDC_NO_ATOMICS__. This CL tightens this so C11 atomics are required
on non-MSVC builds.
My hope is that, now that we require C11 on non-Windows, this is a
fairly safe requirement. We already require pthreads on any platform
where this might apply, and it's hard to imagine someone has C11,
pthreads, but not C11 atomics.
This change means that, in later work, we can refactor the refcount
logic to instead be a <stdatomic.h> compatibility layer, and then an
atomics-targetting CRYPTO_refcount_t implementation. With a
<stdatomic.h> compatibility layer, we can use atomics in more places,
notably where our uses of read locks are causing cacheline contention.
The platform restriction isn't *strictly* necessary. We could, like with
refcounts, emulate <stdatomic.h> with a single, global lock. Indeed any
platforms in this situation have already been living with that lock for
refcounts without noticing. But then later work to add "atomics" to read
locks would regress contention for those platforms. So I'm starting by
rejecting this, so if any such platform exists, we can understand their
performance needs before doing that.
Update-Note: On non-Windows platforms, we now require C11 atomics
support. Note we already require C11 itself. If this affects your build,
get in touch with BoringSSL maintainers.
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: I868fa4ba87ed73dfc9d52e80d46853ef56715a5f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59847
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Right now, MSVC has to fallback to refcount_lock.c, which uses a single,
global lock for all refcount operations. Instead, use the Interlocked*
APIs to implement them.
The motivation is two-fold. First, this removes a performance cliff when
building for Windows on a non-Clang compiler. (Although I've not been
able to measure it in an end-to-end EVP benchmark, only a synthetic
refcount-only benchmark.)
More importantly, it gets us closer to assuming atomics support on all
non-NO_THREADS configurations. (The next CL will clear through that.)
That, in turn, will make it easier to add an atomics-like abstractions
to some of our hotter synchronization points. (Even in newer glibc, with
its better rwlock, read locks fundamentally need to write to memory, so
we have some cacheline contention on shared locks.)
Annoyingly, the Windows atomic_load replacement is not quite right. I've
used a "no-op" InterlockedCompareExchange(p, 0, 0) which, empirically,
still results in a write. But a write to the refcount cacheline is
surely better than taking a global exclusive lock. See comments in file
for details. OpenSSL uses InterlockedOr(p, 0), but that actually results
in even worse code. (InterlockedOr needs a retry loop when the
underlying cmpxchg fails, whereas InterlockedCompareExchange is a single
cmpxchg.)
Hopefully, in the future (perhaps when we require VS 2022's successor,
based on [1]), this can be removed in favor of C11 atomics everywhere.
[1] https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
Bug: 570
Cq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan
Change-Id: I125da139e2fd3ae51e54309309fda16ba97ccf20
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59846
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
512-bit RSA was factored in 1999, so this limit barely means anything.
But establish some limit now to ratchet in what we can. We'll raise this
limit as we clear through further rounds of bad keys in tests.
As part of this, I've touched up rsa_test.cc a bit. All the functions
that made assumptions on key size now use std::vector with RSA_size.
kKey1 and kKey2 were also 512- and 400-bit RSA, respectively. In
principle, we could keep kKey1 for now, but the next stage will break it
anyway. I've replaced them with kFIPSKey (which was "FIPS-compliant" but
actually 1024-bit) and kTwoPrime (remnant of multi-prime RSA, 2048-bit).
As neither name makes sense, they're just the new kKey1 and kKey2.
I've also switched from string literals to arrays, which avoids the
pesky trailing NUL. Sadly, it is a bit more verbose. Maybe we should
switch to writing something like:
const std::vector<uint8_t> kKey1 = MustDecodeHex("abcdef1234...");
Static initializers don't matter in tests, after all.
Update-Note: We no longer accept 511-bit RSA and below. If you run into
this, update test keys to more modern sizes as we plan to raise the
limit beyond 512-bit RSA in the future. 512-bit RSA was factored in
1999, so keys at or near this limit have been obsolete for a very, very
long time.
Bug: 607
Change-Id: I13c3366d7e5f326710f1d1b298f4150a4e8e4d78
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59827
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
We currently don't enforce rsa_check_public_key invariants on private
key operations, only public key operations and RSA_check_key. This
means it was actually possible, in some corner cases, to perform
operations with oversized e or n. Fix this.
This gets us a bit closer to aligning the mess of RSA invariants.
(RSA_check_key still performs some extra checks, but most of those
should be redundant with the CRT self-check.)
Update-Note: Manually constructed RSA private keys with invalid n or e
will now fail private key operations. Such keys would always fail at
public key operations (so the signatures would never verify). They also
already failed RSA_check_key and parsing.
The one incompatibility of note is keys with only n and d, constructed
by reaching into the internal RSA struct, no longer work. Instead, use
RSA_new_private_key_no_e. Conscrypt used to do this but has since been
migrated to the new API.
Bug: 316
Change-Id: I062fdad924b8698e257dab9760687e4b381c970d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59826
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
RSATest.MissingParameters tests this case a bit more extensively.
Change-Id: I61e38bd139c6334ff9d5c9636a159cb20bcd7f7b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59825
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
This is inspired somewhat from how https://github.com/google/benchmark's
threaded benchmark support works. (It seems to spawn a bunch of threads,
latch them all together, and then run.)
This adds a TimeFunctionParallel which runs multiple copies of the
benchmark in parallel, after waiting for all the threads to synchronize.
Some functions had to be tweaked so they don't write to a single, shared
output buffer.
This probably could use some improvement. In playing with it, the
numbers are pretty unstable. We also don't currently benchmark anything
that captures EVP's internal refcounts. But hopefully it's enough to get
a start. I am able to measure impacts from the PRNG locks at least.
Bug: 570
Change-Id: I92c29a05ba082fc45701afd6f0effe23f7b148bd
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59845
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Saves some duplicated logic.
Change-Id: I202fa92a88101f9ad735648bc414ab05752641da
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59685
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Trying to fix all the places where these formats go quadratic isn't a
good use of time. We've already documented that they're not safe for use
with untrusted inputs. Even without such DoS issues, they cannot be
safely used anyway. (E.g. RUSTSEC-2023-0023.)
Just cap the fuzzer input. It'd be nice if we could avoid this more
systematically in the function, but they're not structured to make this
easy to do, and anyone concerned about DoS in this function has worse
problems.
Bug: chromium:1444420, oss-fuzz:56048, 611
Change-Id: I53eeb346f59278ec2db3aac4a92573b927ed8003
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59785
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
This causes avcptool to send requests without blocking on responses. See
the diff in ACVP.md for details of how to use this feature.
Change-Id: I922b3bd2383cb7d22a5d12ead49d2fa733ee1b97
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/55345
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
It's been a couple years since we did that.
I had to upstream a fix to CreateArgvFromArgs, but that landed quickly,
so we're actually carrying no googletest patches with this.
Change-Id: Ifaf3476f2079d444512a235756cfe54f492b9c92
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59745
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Change-Id: I1a4914cc4859ed1e0797a046a1abec8921c4a9cf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59746
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Envoy needs to have the possible cipher, etc., strings predeclared to
reduce synchronization needs in the steady state. It currently does this
by (1) iterating over SSL_CTX_get_ciphers at SSL_CTX creation time and
(2) hard-coding a lists of known TLS 1.3 ciphers, TLS versions,
NamedGroups, etc.
(1) would work for some applications, but it breaks any applications
that configure ciphers on the SSL on a certificate callback, etc. If the
callback configures a cipher that wasn't configured on the SSL_CTX (e.g.
if the SSL_CTX were left at defaults), Envoy's logging breaks and we hit
an ENVOY_BUG assertion.
(2) breaks whenever BoringSSL adds a new feature. In principle, we could
update Envoy when updating BoringSSL, but this is an unresasonable
development overhead for just one of many BoringSSL consumers to impose.
Such costs are particularly high when considering needing to coordinate
updates to Envoy and BoringSSL across different repositories.
Add APIs to enumerate the possible strings these functions can return.
These string lists are a superset of those that any one application may
care about (e.g. we may have a deprecated cipher that Envoy no longer
needs, or an experimental cipher that's not yet ready for Envoy's
stability goals), but this is fine provided this is just used to
initialize the table. In particular, they are *not* intended to
enumerate supported features.
Bump BORINGSSL_API_VERSION to aid in patching these into Envoy.
Bug: b:280350955
Change-Id: I4d11db980eebed5620d3657778c09dbec004653c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59667
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This TLS 1.2 algorithm is substantially inferior to AES-GCM and should
never be used. It will not be available unless configured by name.
However, in can be used to provide backwards-compatibility with devices
that cannot be updated if so needed.
Change-Id: I1fd78efeb33aceca76ec2e7cb76b70f761ed1af8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59585
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: Adam Langley <agl@google.com>
rsa_test.cc and a few of the tests in evp_tests.txt still do, but
refresh the easy one. I switched it to a P-256 key to keep the input
small.
Bug: 607
Change-Id: Ie0614280d12012bbd928f095eb4531e4d7550ae1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59666
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>