T-Head C908 (cycles): vc1dsp.vc1_inv_trans_4x4_c: 310.7 vc1dsp.vc1_inv_trans_4x4_rvv_i32: 120.0 We could use 1 `vlseg4e64.v` instead of 4 `vle16.v`, but that seems to be about 7% slower.