# Building BoringSSL

## Build Prerequisites

The standalone CMake build is primarily intended for developers. If embedding
BoringSSL into another project with a pre-existing build system, see
[INCORPORATING.md](/INCORPORATING.md).

Unless otherwise noted, build tools must be at most five years old, matching
[Abseil guidelines](https://abseil.io/about/compatibility). If in doubt, use the
most recent stable version of each tool.

* [CMake](https://cmake.org/download/) 3.12 or later is required.

* A recent version of Perl is required. On Windows,
  [Active State Perl](http://www.activestate.com/activeperl/) has been
  reported to work, as has MSYS Perl.
  [Strawberry Perl](http://strawberryperl.com/) also works but it adds GCC
  to `PATH`, which can confuse some build tools when identifying the compiler
  (removing `C:\Strawberry\c\bin` from `PATH` should resolve any problems).
  If Perl is not found by CMake, it may be configured explicitly by setting
  `PERL_EXECUTABLE`.

* Building with [Ninja](https://ninja-build.org/) instead of Make is
  recommended, because it makes builds faster. On Windows, CMake's Visual
  Studio generator may also work, but it is not tested regularly and requires
  recent versions of CMake for assembly support.

* On Windows only, [NASM](https://www.nasm.us/) is required. If not found
  by CMake, it may be configured explicitly by setting
  `CMAKE_ASM_NASM_COMPILER`.

* Compilers for C11 and C++14, or later, are required. On Windows, MSVC from
  Visual Studio 2019 or later with Windows 10 SDK 2104 or later is
  supported, but using the latest versions is recommended. Recent versions of
  GCC (6.1+) and Clang should work on non-Windows platforms, and maybe on
  Windows too.

* The most recent stable version of [Go](https://golang.org/dl/) is required.
  Note that Go is exempt from the five-year support window. If not found by
  CMake, the `go` executable may be configured explicitly by setting
  `GO_EXECUTABLE`.

* On x86_64 Linux, the tests have an optional
  [libunwind](https://www.nongnu.org/libunwind/) dependency to test the
  assembly more thoroughly.

## Building

Using Ninja (note the 'N' is capitalized in the cmake invocation):

    cmake -GNinja -B build
    ninja -C build

Using Make (does not work on Windows):

    cmake -B build
    make -C build

You usually don't need to run `cmake` again after changing `CMakeLists.txt`
files because the build scripts will detect changes to them and rebuild
themselves automatically.

Note that the default build flags in the top-level `CMakeLists.txt` are for
debugging: optimisation isn't enabled. Pass `-DCMAKE_BUILD_TYPE=Release` to
`cmake` to configure a release build.

If you want to cross-compile then there is an example toolchain file for 32-bit
Intel in `util/`. Wipe out the build directory, then run `cmake` like this:

    cmake -B build -DCMAKE_TOOLCHAIN_FILE=../util/32-bit-toolchain.cmake -GNinja

If you want to build as a shared library, pass `-DBUILD_SHARED_LIBS=1`. On
Windows, where functions need to be tagged with `dllimport` when coming from a
shared library, define `BORINGSSL_SHARED_LIBRARY` in any code which `#include`s
the BoringSSL headers.

In order to serve environments where code size is important as well as those
where performance is the overriding concern, `OPENSSL_SMALL` can be defined to
remove some code that is especially large.

See [CMake's documentation](https://cmake.org/cmake/help/v3.4/manual/cmake-variables.7.html)
for other variables which may be used to configure the build.

### Building for Android

It's possible to build BoringSSL with the Android NDK using CMake. Recent
versions of the NDK include a CMake toolchain file which works with CMake 3.6.0
or later. This has been tested with version r16b of the NDK.

Unpack the Android NDK somewhere and export `ANDROID_NDK` to point to the
directory. Then run CMake like this:

    cmake -DANDROID_ABI=armeabi-v7a \
        -DANDROID_PLATFORM=android-19 \
        -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
        -GNinja -B build

Once you've run that, building with `ninja -C build` should produce
Android-compatible binaries. You can replace `armeabi-v7a` in the above with
`arm64-v8a` and use API level 21 or higher to build aarch64 binaries.

For other options, see the documentation in the toolchain file.

To debug the resulting binaries on an Android device with `gdb`, run the
commands below. Replace `ARCH` with the architecture of the target device, e.g.
`arm` or `arm64`.

    adb push ${ANDROID_NDK}/prebuilt/android-ARCH/gdbserver/gdbserver \
        /data/local/tmp
    adb forward tcp:5039 tcp:5039
    adb shell /data/local/tmp/gdbserver :5039 /path/on/device/to/binary

Then run the following in a separate shell. Replace `HOST` with the OS and
architecture of the host machine, e.g. `linux-x86_64`.

    ${ANDROID_NDK}/prebuilt/HOST/bin/gdb
    target remote :5039  # in gdb

### Building for iOS

To build for iOS, pass `-DCMAKE_OSX_SYSROOT=iphoneos` and
`-DCMAKE_OSX_ARCHITECTURES=ARCH` to CMake, where `ARCH` is the desired
architecture, matching values used in the `-arch` flag in Apple's toolchain.

Passing multiple architectures for a multiple-architecture build is not
supported.

### Building with Prefixed Symbols

BoringSSL's build system has experimental support for adding a custom prefix to
all symbols. This can be useful when linking multiple versions of BoringSSL in
the same project to avoid symbol conflicts.

In order to build with prefixed symbols, the `BORINGSSL_PREFIX` CMake variable
should specify the prefix to add to all symbols, and the
`BORINGSSL_PREFIX_SYMBOLS` CMake variable should specify the path to a file
which contains a list of symbols which should be prefixed (one per line;
comments are supported with `#`). For example, the following configures the
build to add the prefix `MY_CUSTOM_PREFIX` to all of the symbols listed in
`/path/to/symbols.txt`:

    cmake -B build -DBORINGSSL_PREFIX=MY_CUSTOM_PREFIX \
        -DBORINGSSL_PREFIX_SYMBOLS=/path/to/symbols.txt

It is currently the caller's responsibility to create and maintain the list of
symbols to be prefixed. Alternatively, `util/read_symbols.go` reads the list of
exported symbols from a `.a` file, and can be used in a build script to generate
the symbol list on the fly (by building without prefixing, using
`read_symbols.go` to construct a symbol list, and then building again with
prefixing).

This mechanism is under development and may change over time. Please contact the
BoringSSL maintainers if making use of it.

## Known Limitations on Windows

* CMake can generate Visual Studio projects, but the generated project files
  don't have steps for assembling the assembly language source files, so they
  currently cannot be used to build BoringSSL.
## ARM CPU Capabilities

ARM, unlike Intel, does not have a userspace instruction that allows
applications to discover the capabilities of the processor. Instead, the
capability information has to be provided by a combination of compile-time
information and the operating system.

BoringSSL determines capabilities at compile-time based on `__ARM_NEON`,
`__ARM_FEATURE_AES`, and other preprocessor symbols defined in
[Arm C Language Extensions (ACLE)](https://developer.arm.com/architectures/system-architectures/software-standards/acle).

These values are usually controlled by the `-march` flag. You can also define
any of the following to enable the corresponding ARM feature, but using the ACLE
symbols via `-march` is recommended.

* `OPENSSL_STATIC_ARMCAP_NEON`
* `OPENSSL_STATIC_ARMCAP_AES`
* `OPENSSL_STATIC_ARMCAP_SHA1`
* `OPENSSL_STATIC_ARMCAP_SHA256`
* `OPENSSL_STATIC_ARMCAP_PMULL`

The resulting binary assumes that every feature enabled this way is always
present. This can reduce code size by allowing the compiler to omit fallbacks.
However, if the feature is not actually supported at runtime, BoringSSL will
likely crash.

BoringSSL will additionally query the operating system at runtime for additional
features, e.g. with `getauxval` on Linux. This allows a single binary to use
newer instructions when present, but still function on CPUs without them. But
some environments don't support runtime queries. If building for those, define
`OPENSSL_STATIC_ARMCAP` to limit BoringSSL to compile-time capabilities. If not
defined, the target operating system must be known to BoringSSL.

## Binary Size

The implementations of some algorithms require a trade-off between binary size
and performance. For instance, BoringSSL's fastest P-256 implementation uses a
148 KiB pre-computed table. To optimize instead for binary size, pass
`-DOPENSSL_SMALL=1` to CMake or define the `OPENSSL_SMALL` preprocessor symbol.

# Running Tests

There are two sets of tests: the C/C++ tests and the blackbox tests. The former
are built by Ninja and can be run from the top-level directory with
`go run util/all_tests.go`. The latter have to be run separately by running
`go test` from within `ssl/test/runner`.

Both sets of tests may also be run with `ninja -C build run_tests`, but CMake
3.2 or later is required to avoid Ninja's output buffering.