The height is a power of two of up to 16 rows. The current code was
optimised for large sample counts.

T-Head C908:
h264_weight2_8_c:                   211.7 ( 1.00x)
h264_weight2_8_rvv_i32:  before     184.0 ( 1.15x)
h264_weight2_8_rvv_i32:  after       54.2 ( 3.90x)
h264_weight4_8_c:                   285.7 ( 1.00x)
h264_weight4_8_rvv_i32:  before     341.2 ( 0.86x)
h264_weight4_8_rvv_i32:  after       82.2 ( 3.47x)
h264_weight8_8_c:                   498.7 ( 1.00x)
h264_weight8_8_rvv_i32:  before     683.7 ( 0.73x)
h264_weight8_8_rvv_i64:  after      128.5 ( 3.95x)
h264_weight16_8_c:                  878.2 ( 1.00x)
h264_weight16_8_rvv_i32: unchanged  239.5 ( 3.67x)

SpacemiT X60:
h264_weight2_8_c:                   207.2 ( 1.00x)
h264_weight2_8_rvv_i32:  before     259.6 ( 0.80x)
h264_weight2_8_rvv_i32:  after       82.2 ( 2.52x)
h264_weight4_8_c:                   290.8 ( 1.00x)
h264_weight4_8_rvv_i32:  before     509.6 ( 0.57x)
h264_weight4_8_rvv_i32:  after       61.5 ( 4.73x)
h264_weight8_8_c:                   498.8 ( 1.00x)
h264_weight8_8_rvv_i32:  before    1019.8 ( 0.49x)
h264_weight8_8_rvv_i64:  after       71.8 ( 6.95x)
h264_weight16_8_c:                  874.0 ( 1.00x)
h264_weight16_8_rvv_i32: unchanged  249.0 ( 3.51x)

pull/153/merge
parent
ba7d0d5fc3
commit
4936bb2508
2 changed files with 42 additions and 38 deletions