mirror of https://github.com/FFmpeg/FFmpeg.git
Since len < 64, the registers are sufficient, so the loop can be unrolled directly (a4 is even). Another benefit of unrolling is that the vertical case needs one load operation fewer than the horizontal case.

                               old          new
                            C908   X60   C908   X60
vp8_put_bilin4_h_c        :  6.2   5.5 :  6.2   5.5
vp8_put_bilin4_h_rvv_i32  :  2.2   2.0 :  1.5   1.5
vp8_put_bilin4_v_c        :  6.5   5.7 :  6.2   5.7
vp8_put_bilin4_v_rvv_i32  :  2.2   2.0 :  1.2   1.5
vp8_put_bilin8_h_c        : 24.2  21.5 : 24.2  21.5
vp8_put_bilin8_h_rvv_i32  :  5.2   4.7 :  3.5   3.5
vp8_put_bilin8_v_c        : 24.5  21.7 : 24.5  21.7
vp8_put_bilin8_v_rvv_i32  :  5.2   4.7 :  3.5   3.2
vp8_put_bilin16_h_c       : 48.0  42.7 : 48.0  42.7
vp8_put_bilin16_h_rvv_i32 :  5.7   5.0 :  5.2   4.5
vp8_put_bilin16_v_c       : 48.2  43.0 : 48.2  42.7
vp8_put_bilin16_v_rvv_i32 :  5.7   5.2 :  4.5   4.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
release/7.1
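To illustrate the load-count argument, here is a minimal C sketch; it is not the RISC-V Vector assembly from this commit. It assumes the usual VP8 bilinear formula, out = ((8 - m) * p + m * q + 4) >> 3, an even number of rows (which the unroll relies on) and blocks at most 16 pixels wide. The names vload, blend, bilin_h_x2 and bilin_v_x2 are hypothetical: memcpy into a small local buffer stands in for one vector load, and each function returns how many such loads it issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_W 16 /* assumption: blocks are at most 16 pixels wide */

/* Hypothetical stand-in for one vector load: copy w bytes into a local
 * "register" and count the load. */
static void vload(uint8_t *reg, const uint8_t *p, int w, int *loads)
{
    memcpy(reg, p, (size_t)w);
    (*loads)++;
}

/* Bilinear blend with weights (8 - m) and m, rounded and shifted by 3. */
static void blend(uint8_t *dst, const uint8_t *p, const uint8_t *q, int w, int m)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)(((8 - m) * p[x] + m * q[x] + 4) >> 3);
}

/* Horizontal filter unrolled by two rows: each row needs src[x] and
 * src[x + 1], so a pair of output rows costs 4 loads. */
static int bilin_h_x2(uint8_t *dst, ptrdiff_t ds, const uint8_t *src,
                      ptrdiff_t ss, int h, int w, int mx)
{
    uint8_t a0[MAX_W], b0[MAX_W], a1[MAX_W], b1[MAX_W];
    int loads = 0;
    assert(h % 2 == 0 && w <= MAX_W);
    for (int y = 0; y < h; y += 2) {
        const uint8_t *s = src + y * ss;
        vload(a0, s, w, &loads);
        vload(b0, s + 1, w, &loads);
        vload(a1, s + ss, w, &loads);
        vload(b1, s + ss + 1, w, &loads);
        blend(dst + y * ds, a0, b0, w, mx);
        blend(dst + (y + 1) * ds, a1, b1, w, mx);
    }
    return loads;
}

/* Vertical filter unrolled by two rows: output rows y and y + 1 share
 * source row y + 1, so a pair of output rows costs only 3 loads. */
static int bilin_v_x2(uint8_t *dst, ptrdiff_t ds, const uint8_t *src,
                      ptrdiff_t ss, int h, int w, int my)
{
    uint8_t r0[MAX_W], r1[MAX_W], r2[MAX_W];
    int loads = 0;
    assert(h % 2 == 0 && w <= MAX_W);
    for (int y = 0; y < h; y += 2) {
        const uint8_t *s = src + y * ss;
        vload(r0, s, w, &loads);
        vload(r1, s + ss, w, &loads);          /* shared by both output rows */
        vload(r2, s + 2 * ss, w, &loads);
        blend(dst + y * ds, r0, r1, w, my);
        blend(dst + (y + 1) * ds, r1, r2, w, my);
    }
    return loads;
}
```

With two rows per iteration, the horizontal sketch loads src[x] and src[x + 1] for each row (4 loads per pair), while the vertical sketch reuses the middle source row (3 loads per pair); for h = 4 that is 8 loads versus 6.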
parent d72a5fe719
commit 8d9fb7b5cf
1 changed file with 29 additions and 5 deletions