GCC 10.2.1 seems to be emitting code like this:
movq gcm_gmult_clmul@GOTPCREL(%rip), %xmm0
movhps gcm_ghash_clmul@GOTPCREL(%rip), %xmm0
movaps %xmm0, (%rsp)
This is assembling a pair of function pointers in %xmm0 and writing the
two out together. I've not observed the compiler output movlps, but
supporting movhps and movlps are about as tricky. The main complication
is that these instructions preserve the unwritten half of the
destination, and they do not support register sources, only memory.
This CL supports them by loading in a general-purpose register as we
usually do, pushing the register on the stack, and then running the
instruction on (%rsp). Some alternatives I considered:
- Save/restore a temporary XMM register and then use MOVHLPS and
MOVLHPS. This would work but require another saveRegister-like
wrapper.
- Take advantage of loadFromGOT ending in a memory mov and swap out
the final instruction. This would be more efficient, but we downgrade
GOT-based accesses to local symbols to a plain LEA. The compiler will
only do this when we write a pair of function pointers in a row, so
trying to optimize the non-local symbols seems not worth the trouble.
(Really the compiler should not be emitting GOT-relative loads at all,
but the compiler doesn't know these symbols will be private and in the
same module, so it has a habit of pessimally using GOT-based loads.)
This option seemed the simplest.
Change-Id: I8c4915a6a0d72aa4c5f4d581081b99b3a6ab64c2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/45244
Reviewed-by: Adam Langley <agl@google.com>