This reverts commit 97873cd1a5.
In some configurations, Clang (when building for debug) compiles
poly_mul_vec_aux with a huge (12KB) stack frame. It appears to be
expanding the code into a huge SSA graph and then assigning every value
in the graph to the stack. When optimising, it can store the live
values in only a few hundred bytes of stack, but since this function
recurses,
the debug build can overflow the stack.
It's unclear quite what to do about this, but this change solves the
immediate issue.
Change-Id: Ifa531f91f10bf04de92e8d07fbb14c668b366454
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57065
Auto-Submit: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
While this assembly implementation is faster in microbenchmarks, the
cache pressure makes it slightly worse than the C code in larger
benchmarks.
Before:
Did 7686 HRSS generate operations in 1056025us (7278.2 ops/sec)
Did 90000 HRSS encap operations in 1010095us (89100.5 ops/sec)
Did 28000 HRSS decap operations in 1031008us (27157.9 ops/sec)
After:
Did 3523 HRSS generate operations in 1045508us (3369.7 ops/sec)
Did 43000 HRSS encap operations in 1017077us (42278.0 ops/sec)
Did 17000 HRSS decap operations in 1011170us (16812.2 ops/sec)
Change-Id: Ia7745b50393f2d2849867e7c5c0af59d651f243d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/55885
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
For the C files, rather than force the caller to juggle
crypto_linux_sources, etc., we just wrap the whole file in ifdefs and
ask the callers to link everything together.
Assembly is typically built by a different tool, so we have less room
here. However, there are really only two families of tools we care
about: gas (which runs the C preprocessor) and nasm (which has its own
preprocessor). Callers should be able to limit themselves to
special-casing Windows x86(_64) for NASM and then pass all the remaining
assembly files to their gas-like tool. File-wide ifdefs can take care of
the rest.
We're almost set up to allow this, except that the files condition on
architecture but not OS. Add __ELF__, __APPLE__, and _WIN32 conditions
as appropriate.
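As a sketch, a file meant only for x86_64 ELF targets would wrap its
entire contents like this (the guard shown is illustrative; the exact
conditions vary per file):

    /* Guard the whole translation unit so that every build can
     * compile and link this file unconditionally. */
    #if defined(__x86_64__) && defined(__ELF__)

    /* ... the x86_64/ELF-specific implementation ... */

    #endif  /* __x86_64__ && __ELF__ */

The same works for .S files, since gas-like tools run them through the
C preprocessor.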
One subtlety: the semantics of .note.GNU-stack are that *any* unmarked
object file makes the stack executable. (This is the behaviour of
current GNU ld; lld doesn't have this issue, and GNU ld claims they'll
remove it in a later release.) Empirically, this doesn't seem to apply
to empty object files but, to be safe, we should ensure all object
files have the marking.
That leads to a second subtlety: on targets where @ is a comment,
@progbits is spelled %progbits, per [0]. If we want all .S files to
work on all targets, that includes these markers. Fortunately, %progbits
appears to work universally (see [1], [2], [3], [4]), so I've just
switched us to that spelling.
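For example, each ELF-targeted .S file can end with something like the
following (a sketch; the exact placement varies by file):

    #if defined(__ELF__)
    /* Mark this object as not requiring an executable stack.
     * %progbits is used instead of @progbits because @ begins a
     * comment on 32-bit Arm. */
    .section .note.GNU-stack,"",%progbits
    #endif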
I've also tightened up the __arm__ and __aarch64__ checks to __ARMEL__
and __AARCH64EL__. We don't support big-endian Arm (or any other
big-endian platform) and, even if we did, the conditions in the
assembly files
should match the conditions in the C files that pull them in.
This CL doesn't change our build to take advantage of this (though I'll
give it a go later), just makes it possible for builds to do it.
[0] https://sourceware.org/binutils/docs/as/Section.html
[1] https://patchwork.kernel.org/project/linux-crypto/patch/20170119212805.18049-1-dvlasenk@redhat.com/#20050285
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92820#c11
[3] https://sourceware.org/legacy-ml/gdb-patches/2016-01/msg00319.html
[4] de990b270d
Bug: 542
Change-Id: I0a8ded24423087c0da13bd0335cbd757d4eee65a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/55626
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
The C11 change has survived for three months now. Let's start freely
using static_assert. In C files, we need to include <assert.h> because
it is a macro. In C++ files, it is a keyword and we can just use it. (In
MSVC C, it is actually also a keyword as in C++, but close enough.)
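A minimal sketch of the difference (the assertion itself is just an
example):

    /* In C, static_assert is a macro from <assert.h> that expands to
     * the _Static_assert keyword. */
    #include <assert.h>
    #include <stdint.h>
    static_assert(sizeof(uint32_t) == 4, "uint32_t must be 32 bits");

    // In C++, static_assert is a keyword, so no include is needed.
    static_assert(sizeof(uint32_t) == 4, "uint32_t must be 32 bits");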
I moved one assert from ssl3.h to ssl_lib.cc. We haven't yet required
C11 in our public headers, just our internal files.
Change-Id: Ic59978be43b699f2c997858179a9691606784ea5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/53665
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Polynomials have more elements than strictly needed in order to better
align them for when vector instructions are available. The
|poly_mul_vec| path is sensitive to this and so zeros out the extra
elements before starting work. That means that it'll write through a
const pointer, even if it always writes the same value. However, while
this code is intended for ephemeral uses, we have a case where this
write is a data race that upsets tools.
Therefore this change always normalises the polynomials, even if it's
running in a configuration that doesn't care.
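As a sketch of the idea (the names and sizes here are illustrative,
not the actual HRSS layout):

    #include <stdint.h>
    #include <string.h>

    #define N 701          /* logical number of coefficients */
    #define PADDED_N 720   /* padded for vector-friendly alignment */

    struct poly {
      uint16_t v[PADDED_N];
    };

    /* Zero the padding unconditionally, not just on the vectorised
     * path, so readers of a nominally-const polynomial never race
     * with a write. */
    static void poly_normalize(struct poly *p) {
      memset(&p->v[N], 0, (PADDED_N - N) * sizeof(uint16_t));
    }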
Change-Id: I83eb04c5ce7c4e7ca959f2dd7fbd3efbe306d989
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52186
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
On Arm, our CRYPTO_is_*_capable functions check the corresponding
preprocessor symbol. This allows us to automatically drop dynamic checks
and fallback code when some capability is always available.
This CL does the same on x86, as well as consolidates our
OPENSSL_ia32cap_P checks in one place. Since this abstraction is
incompatible with some optimizations we do around OPENSSL_ia32cap_get()
in the FIPS module, I've marked the symbol __attribute__((const)), which
is enough to make GCC and Clang do the optimizations for us. (We already
do the same to DEFINE_BSS_GET.)
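A sketch of the pattern (the SSSE3 bit is bit 9 of the CPUID leaf-1
ECX word; the real helpers live in crypto/internal.h and may differ in
detail):

    #include <stdint.h>

    /* __attribute__((const)) lets the compiler assume repeated calls
     * return the same value, preserving the FIPS-module
     * optimisations. */
    const uint32_t *OPENSSL_ia32cap_get(void) __attribute__((const));

    static inline int CRYPTO_is_SSSE3_capable(void) {
    #if defined(__SSSE3__)
      return 1;  /* the compiler may emit SSSE3 anyway */
    #else
      return (OPENSSL_ia32cap_get()[1] & (1u << 9)) != 0;
    #endif
    }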
Most x86 platforms support a much wider range of capabilities, so this
is usually a no-op. But, notably, all x86_64 Mac hardware has SSSE3
available, so this allows us to statically drop an AES implementation.
(On macOS with -Wl,-dead_strip, this seems to trim 35080 bytes from the
bssl binary.) Configs like -march=native can also drop a bunch of code.
Update-Note: This CL may break build environments that incorrectly mark
some instruction as statically available. This is unlikely to happen
with vector instructions like AVX, where the compiler could freely emit
them anyway. However, instructions like AES-NI might be set incorrectly.
Change-Id: I44fd715c9887d3fda7cb4519c03bee4d4f2c7ea6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51548
Reviewed-by: Adam Langley <agl@google.com>
The latest version of ACLE splits __ARM_FEATURE_CRYPTO into two defines
to reflect that, starting with ARMv8.2, the cryptography extension can
include {AES,PMULL} and {SHA1,SHA256} separately.
Also standardize on __ARM_NEON, which is the recommended symbol from
ACLE, and the only one defined on non-Apple aarch64 targets. Digging
through GCC history, __ARM_NEON__ is a bit older. __ARM_NEON was added
in GCC's 9e94a7fc5ab770928b9e6a2b74e292d35b4c94da from 2012, part of GCC
4.8.0.
I suspect we can stop paying attention to __ARM_NEON__ at this point,
but I've left both working for now. __ARM_FEATURE_{AES,SHA2} is
definitely too new to fully replace __ARM_FEATURE_CRYPTO.
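A sketch of the resulting checks (the grouping is illustrative):

    /* NEON: accept both the ACLE spelling and the older GCC one. */
    #if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* ... NEON code may be used unconditionally ... */
    #endif

    /* AES/PMULL: the old combined macro or the new split one. */
    #if defined(__ARM_FEATURE_CRYPTO) || defined(__ARM_FEATURE_AES)
    /* ... */
    #endif

    /* SHA-1/SHA-256: likewise. */
    #if defined(__ARM_FEATURE_CRYPTO) || defined(__ARM_FEATURE_SHA2)
    /* ... */
    #endif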
Tested on Linux that -march=armv8-a+aes now also drops the fallback AES
code. Previously, we would pick up -march=armv8-a+crypto, but not
-march=armv8-a+aes. Also tested that, on an OPENSSL_STATIC_ARMCAP build,
-march=armv8-a+sha2 sets the SHA-1 and SHA-256 features.
Change-Id: I749bdbc501ba2da23177ddb823547efcd77e5c98
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50847
Reviewed-by: Adam Langley <agl@google.com>
These symbols were not marked OPENSSL_EXPORT, so they weren't really
usable externally anyway. They're also very sensitive to various build
configuration toggles, which don't always get reflected into projects
that include our headers. Move them to crypto/internal.h.
Change-Id: I79a1fcf0b24e398d75a9cc6473bae28ec85cb835
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50846
Reviewed-by: Adam Langley <agl@google.com>
The polynomials have 701 16-bit values, i.e. 1402 bytes, but
poly_Rq_mul was reading 32 bytes at offset 1384 in order to get the
last 18 bytes of them, overreading by 14 bytes. This silently
worked for a long time, but when 7153013019 switched to keeping
variables on the stack it was noticed by Valgrind.
This change fixes the overread. Setting watchpoints at the ends of the
two inputs (and one output) now shows no overreads nor overwrites.
BUG=424
Change-Id: Id86c1407ffce66593541c10feee47213f4b95c5d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48645
Reviewed-by: David Benjamin <davidben@google.com>
The stack consumption of the HRSS functions is causing issues in
stack-constrained environments, so this change allocates many of those
variables on the heap. This means that several HRSS_ functions now
allocate, and thus can
fail, where they couldn't before. Callers that ignore the return value
and don't have crash-on-failure mallocs will still be safe, although
things will fail to decrypt later on.
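The shape of the change, as a sketch (the struct and function here
are hypothetical, not the real HRSS internals):

    #include <stdint.h>
    #include <stdlib.h>

    struct vars {
      uint16_t scratch[701 * 8];  /* formerly a large stack array */
    };

    /* Returns one on success and zero on allocation failure. */
    static int hrss_op(void) {
      struct vars *vars = malloc(sizeof(struct vars));
      if (vars == NULL) {
        return 0;  /* callers must now handle failure */
      }
      /* ... work in vars->scratch instead of on the stack ... */
      free(vars);
      return 1;
    }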
Somehow, this actually makes key generation _faster_ on my machine. (I
don't know. Better alignment? Fewer L1 collisions?) The other operations
are slightly slower, as expected.
Before:
Did 17390 HRSS generate operations in 3054088us (5694.0 ops/sec)
Did 225000 HRSS encap operations in 3000512us (74987.2 ops/sec)
Did 87000 HRSS decap operations in 3014525us (28860.3 ops/sec)
After:
Did 21300 HRSS generate operations in 3026637us (7037.5 ops/sec)
Did 221000 HRSS encap operations in 3008911us (73448.5 ops/sec)
Did 84000 HRSS decap operations in 3007622us (27929.0 ops/sec)
Change-Id: I2312df8909af7d8d250c7c483c65038123f21ad9
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48345
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Also use a slightly more conservative pattern. Instead of aligning the
pointer as a uintptr_t and casting back, compute the offset and advance
in pointer space. C guarantees that casting from a pointer to
uintptr_t and back gives the same pointer, but arbitrary
integer-to-pointer conversions are implementation-defined. GCC does
define them in the useful way, but this pattern depends on fewer
implementation details.
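Concretely, the two patterns look something like this (a sketch;
align must be a power of two):

    #include <stddef.h>
    #include <stdint.h>

    /* Old pattern: round the integer value up and cast back, relying
     * on implementation-defined integer-to-pointer behaviour. */
    static uint8_t *align_old(uint8_t *ptr, size_t align) {
      return (uint8_t *)(((uintptr_t)ptr + align - 1) &
                         ~(uintptr_t)(align - 1));
    }

    /* New pattern: compute the offset as an integer and advance the
     * original pointer, so no integer is converted to a pointer. */
    static uint8_t *align_new(uint8_t *ptr, size_t align) {
      uintptr_t offset = (0u - (uintptr_t)ptr) & (align - 1);
      return ptr + offset;
    }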
Change-Id: I70c7af735e892fe7a8333b78b39d7b1f3f1cdbef
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/48405
Reviewed-by: Adam Langley <alangley@gmail.com>