This implementation can advantage of hardware acceleration available on common CPUs when using GCC and Clang. A future update may enable this on MSVC as well. PiperOrigin-RevId: 487327024 Change-Id: I99a8f1bcbdf25297e776537e23bd0a902e0818a1