As with the inter loop filter, performance metrics seem to be biased in
favour of the C implementation because checkasm inputs almost always
fall in the no-op case.

h264_h_loop_filter_chroma_intra_8bpp_c:              82.8 ( 1.00x)
h264_h_loop_filter_chroma_intra_8bpp_rvv_i32:        72.6 ( 1.14x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c:        41.1 ( 1.00x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_rvv_i32:  72.6 ( 0.57x)
h264_h_loop_filter_luma_intra_8bpp_c:               166.1 ( 1.00x)
h264_h_loop_filter_luma_intra_8bpp_rvv_i32:         395.4 ( 0.42x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_c:          93.3 ( 1.00x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_rvv_i32:   395.4 ( 0.24x)
h264_v_loop_filter_chroma_intra_8bpp_c:             134.8 ( 1.00x)
h264_v_loop_filter_chroma_intra_8bpp_rvv_i32:        51.6 ( 2.61x)
h264_v_loop_filter_luma_intra_8bpp_c:               468.1 ( 1.00x)
h264_v_loop_filter_luma_intra_8bpp_rvv_i32:         134.8 ( 3.47x)