This is slower than the Zbb version on real hardware due to register
strides. Proper support for vector byte-swap requires the Zvbb
extension, but it's much too early for me to worry about it.
The code was blindly assuming that Zbb or V implied Zba. While the
earlier is practically always true, the later broke some QEMU setups,
as V was introduced earlier than Zba.
Add missing operand which clang complains about but GCC assumes it to be
'm1' if not specified.
Works around build failure with Clang:
| src/libswscale/riscv/rgb2rgb_rvv.S:88:25: error: operand must be e[8|16|32|64|128|256|512|1024],m[1|2|4|8|f2|f4|f8],[ta|tu],[ma|mu]
| vsetvli t4, t3, e8, ta, ma
| ^
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
This is currently 64-bit only because the stack spilling code would not
assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory).
This could be added later in the unlikely that someone wants it.