The assembly functions from OpenSSL vary in how they handle counter
overflow. The aarch64 implementation uses a mix of 32-bit and 64-bit
counters. This is because, when packing a block into 64-bit
general-purpose registers, it is easier to implement a 64-bit counter
than a 32-bit one, whereas on 32-bit general-purpose registers, or when
using vector registers with 32-bit lanes, it is easier to implement a
32-bit counter.

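
As a rough illustration of why the two counter widths disagree at the
overflow point, the sketch below models only the counter word and its
neighbor. It assumes the RFC 8439 state layout, where the 32-bit counter
is word 12 and the first nonce word is word 13. The helper names are
hypothetical and are not taken from the assembly.

```c
#include <stdint.h>

/* w[0] models state word 12 (the block counter) and w[1] models state
 * word 13 (the first nonce word in the RFC 8439 layout). Both helper
 * names are hypothetical, for illustration only. */

/* 32-bit counter: the increment wraps and the neighboring word is
 * left untouched. */
static void increment_counter32(uint32_t w[2]) {
  w[0] += 1;
}

/* 64-bit counter: both words are packed into one 64-bit register, so
 * an overflow of the low word carries into the neighboring word. */
static void increment_counter64(uint32_t w[2]) {
  uint64_t packed = ((uint64_t)w[1] << 32) | w[0];
  packed += 1;
  w[0] = (uint32_t)packed;
  w[1] = (uint32_t)(packed >> 32);
}
```

Starting from a counter of 0xffffffff, the first helper leaves the
neighboring word unchanged while the second increments it, so the two
styles produce different keystream blocks after the wrap.
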
Counters never overflow in the AEAD, which sets its length limit so that
this cannot happen. (Failing to do so would reuse a key/nonce/counter
triple.) RFC 8439 is silent on what happens on overflow, so at best one
can say it is implicitly undefined behavior.

The issue came up because pyca/cryptography reportedly exposed a ChaCha20
API which encouraged callers to randomize the starting counter. Wrapping
with a randomized starting counter isn't inherently wrong, though it is
pointless and goes against how the spec recommends using the initial
counter value.

Nonetheless, we would prefer our functions behave consistently across
platforms, rather than silently give ill-defined output for some
inputs. So, normalize the behavior to the wrapping version in
CRYPTO_chacha_20 by splitting the input into multiple ChaCha20_ctr32
calls as needed.

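
A minimal sketch of the splitting idea, not the actual change: the core
here is a hypothetical stand-in that only records the (offset, length,
counter) of each call, and key, nonce, and buffer pointers are omitted.
All names in it are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recorded call to the stand-in core (hypothetical). */
struct ctr32_call {
  size_t offset;    /* byte offset into the input/output buffers */
  size_t len;       /* number of bytes handled by this call */
  uint32_t counter; /* starting 32-bit block counter */
};

static struct ctr32_call g_calls[8];
static size_t g_num_calls;

/* Stand-in for a ChaCha20_ctr32-style core. It only records its
 * arguments; a real core would produce keystream here. */
static void core_ctr32_stub(size_t offset, size_t len, uint32_t counter) {
  assert(g_num_calls < sizeof(g_calls) / sizeof(g_calls[0]));
  g_calls[g_num_calls].offset = offset;
  g_calls[g_num_calls].len = len;
  g_calls[g_num_calls].counter = counter;
  g_num_calls++;
}

/* Splits the input so that no single core call crosses the point where
 * the 32-bit counter wraps from 0xffffffff to 0. */
static void chacha20_wrapping_sketch(size_t len, uint32_t counter) {
  size_t offset = 0;
  while (len > 0) {
    /* 64-byte blocks available before the counter wraps: at most 2^32,
     * so the byte count below is at most 2^38 and fits in 64 bits. */
    uint64_t blocks_left = (UINT64_C(1) << 32) - counter;
    uint64_t todo = blocks_left * 64;
    if (todo > len) {
      todo = len;
    }
    core_ctr32_stub(offset, (size_t)todo, counter);
    offset += (size_t)todo;
    len -= (size_t)todo;
    /* Input remains only if this call ran up to the wrap point, so the
     * next call starts again at counter zero. */
    counter = 0;
  }
}
```

For example, 256 bytes starting at counter 0xfffffffe would be handled
as one 128-byte call at 0xfffffffe followed by one 128-byte call at
counter 0.
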
Fixed: 614
Change-Id: I191461f25753b9f6b59064c6c08cd4299085e172
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60387
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>