This cannot beat the Zbb implementation, and it is unlikely that a real meaningful CPU design would support V and not Zbb. The best loop rewrite that I could come up with (4 shifts, 2 ands, 3 ors) is still ~40% slower than Zbb. A proper faster vector implementation should be feasible with the cryptographic vector extensions, but that is a story for another time.
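The message does not name the routine being removed, but the quoted operation count (4 shifts, 2 ands, 3 ors) matches the classic 32-bit byte swap, which Zbb covers with its `rev8` instruction. Assuming that is the operation in question, the per-element work a generic loop has to do can be sketched in scalar C as follows. The function names are illustrative only and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-swap one 32-bit word with 4 shifts, 2 ANDs and 3 ORs.
 * Assumption: this is the operation the commit message refers to. */
static inline uint32_t bswap32_generic(uint32_t x)
{
    return (x >> 24)                  /* byte 3 -> byte 0 */
         | ((x >> 8) & 0x0000ff00u)   /* byte 2 -> byte 1 */
         | ((x << 8) & 0x00ff0000u)   /* byte 1 -> byte 2 */
         | (x << 24);                 /* byte 0 -> byte 3 */
}

/* Hypothetical helper: apply the swap to a buffer of len words. */
static void bswap32_buf(uint32_t *dst, const uint32_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = bswap32_generic(src[i]);
}
```

A vector version without a dedicated byte-reverse instruction would need the same nine operations per element group, which is consistent with it losing to a single scalar byte-reverse instruction.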
parent 5de1db5370
commit 61e5ca4ded
2 changed files with 1 additions and 27 deletions