I was inspired to look at this again recently and noticed we could do a
bit better. Instead of a tower of selects, rely on all the cases being
mutually exclusive and use the ret |= mask & value formulation without
loss in clarity. We do need to fixup the invalid case slightly, but
since that computation is mostly independent, I'm guessing the CPU and
compiler are able to schedule it effectively.
Before:
Did 251000 base64 decode operations in 2002569us (159.4 MB/sec)
After:
Did 346000 base64 decode operations in 2005426us (219.5 MB/sec) [+37.7%]
Change-Id: I542167202fd4e94c93dd5a2519a97bc388072c89
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/49525
Reviewed-by: Adam Langley <agl@google.com>