Zhao Zhili
5988a2729b
aarch64/vvc: Add dmvr
dmvr_8_12x20_c: 1.5 ( 1.00x)
dmvr_8_12x20_neon: 0.2 ( 6.56x)
dmvr_8_20x12_c: 1.0 ( 1.00x)
dmvr_8_20x12_neon: 0.2 ( 4.33x)
dmvr_8_20x20_c: 1.7 ( 1.00x)
dmvr_8_20x20_neon: 0.5 ( 3.63x)
dmvr_12_12x20_c: 2.2 ( 1.00x)
dmvr_12_12x20_neon: 0.5 ( 4.68x)
dmvr_12_20x12_c: 2.0 ( 1.00x)
dmvr_12_20x12_neon: 0.5 ( 4.16x)
dmvr_12_20x20_c: 3.7 ( 1.00x)
dmvr_12_20x20_neon: 0.7 ( 5.14x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
Zhao Zhili
bcd65ebd8f
aarch64/vvc: Add dmvr_hv
dmvr_hv_8_12x20_c: 8.0 ( 1.00x)
dmvr_hv_8_12x20_neon: 1.2 ( 6.62x)
dmvr_hv_8_20x12_c: 8.0 ( 1.00x)
dmvr_hv_8_20x12_neon: 0.9 ( 8.37x)
dmvr_hv_8_20x20_c: 12.9 ( 1.00x)
dmvr_hv_8_20x20_neon: 1.7 ( 7.62x)
dmvr_hv_10_12x20_c: 7.0 ( 1.00x)
dmvr_hv_10_12x20_neon: 1.7 ( 4.09x)
dmvr_hv_10_20x12_c: 7.0 ( 1.00x)
dmvr_hv_10_20x12_neon: 1.7 ( 4.09x)
dmvr_hv_10_20x20_c: 11.2 ( 1.00x)
dmvr_hv_10_20x20_neon: 2.7 ( 4.15x)
dmvr_hv_12_12x20_c: 6.5 ( 1.00x)
dmvr_hv_12_12x20_neon: 1.7 ( 3.79x)
dmvr_hv_12_20x12_c: 6.5 ( 1.00x)
dmvr_hv_12_20x12_neon: 1.7 ( 3.79x)
dmvr_hv_12_20x20_c: 10.2 ( 1.00x)
dmvr_hv_12_20x20_neon: 2.2 ( 4.64x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
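The dmvr_hv kernels presumably interpolate the DMVR search window with a two-tap (bilinear) filter in both directions. Below is a minimal scalar sketch of a separable bilinear filter. The 1/16-sample weights (16 - f, f), the final rounded shift by 8, the 8-bit output and the function name bilinear_hv_ref are illustrative assumptions. They are not the exact VVC intermediate precision or FFmpeg's signature.

```c
#include <stdint.h>
#include <stddef.h>

/* Separable bilinear interpolation with 1/16-sample fractions fx, fy (0..15).
 * The horizontal step weights neighbours by (16 - fx, fx), the vertical step
 * by (16 - fy, fy); the rounded shift by 8 removes both factors of 16.
 * src must be readable up to column width and row height (inclusive). */
static void bilinear_hv_ref(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            int width, int height, int fx, int fy)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *r0 = src + y * src_stride;   /* current row */
        const uint8_t *r1 = r0 + src_stride;        /* row below */
        for (int x = 0; x < width; x++) {
            const int t0 = r0[x] * (16 - fx) + r0[x + 1] * fx;
            const int t1 = r1[x] * (16 - fx) + r1[x + 1] * fx;
            dst[y * dst_stride + x] = (uint8_t)((t0 * (16 - fy) + t1 * fy + 128) >> 8);
        }
    }
}
```

With fx = fy = 8 the output is the rounded-down mean of the four neighbours. With fx = fy = 0 it reproduces the source sample.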
Zhao Zhili
0ba9e8d0d4
aarch64/vvc: Add w_avg
w_avg_8_2x2_c: 0.0 ( 0.00x)
w_avg_8_2x2_neon: 0.0 ( 0.00x)
w_avg_8_4x4_c: 0.2 ( 1.00x)
w_avg_8_4x4_neon: 0.0 ( 0.00x)
w_avg_8_8x8_c: 1.2 ( 1.00x)
w_avg_8_8x8_neon: 0.2 ( 5.00x)
w_avg_8_16x16_c: 4.2 ( 1.00x)
w_avg_8_16x16_neon: 0.8 ( 5.67x)
w_avg_8_32x32_c: 16.2 ( 1.00x)
w_avg_8_32x32_neon: 2.5 ( 6.50x)
w_avg_8_64x64_c: 64.5 ( 1.00x)
w_avg_8_64x64_neon: 9.0 ( 7.17x)
w_avg_8_128x128_c: 269.5 ( 1.00x)
w_avg_8_128x128_neon: 35.5 ( 7.59x)
w_avg_10_2x2_c: 0.2 ( 1.00x)
w_avg_10_2x2_neon: 0.2 ( 1.00x)
w_avg_10_4x4_c: 0.2 ( 1.00x)
w_avg_10_4x4_neon: 0.2 ( 1.00x)
w_avg_10_8x8_c: 1.0 ( 1.00x)
w_avg_10_8x8_neon: 0.2 ( 4.00x)
w_avg_10_16x16_c: 4.2 ( 1.00x)
w_avg_10_16x16_neon: 0.8 ( 5.67x)
w_avg_10_32x32_c: 16.2 ( 1.00x)
w_avg_10_32x32_neon: 2.5 ( 6.50x)
w_avg_10_64x64_c: 66.2 ( 1.00x)
w_avg_10_64x64_neon: 10.0 ( 6.62x)
w_avg_10_128x128_c: 277.8 ( 1.00x)
w_avg_10_128x128_neon: 39.8 ( 6.99x)
w_avg_12_2x2_c: 0.0 ( 0.00x)
w_avg_12_2x2_neon: 0.2 ( 0.00x)
w_avg_12_4x4_c: 0.2 ( 1.00x)
w_avg_12_4x4_neon: 0.0 ( 0.00x)
w_avg_12_8x8_c: 1.2 ( 1.00x)
w_avg_12_8x8_neon: 0.5 ( 2.50x)
w_avg_12_16x16_c: 4.8 ( 1.00x)
w_avg_12_16x16_neon: 0.8 ( 6.33x)
w_avg_12_32x32_c: 17.0 ( 1.00x)
w_avg_12_32x32_neon: 2.8 ( 6.18x)
w_avg_12_64x64_c: 64.0 ( 1.00x)
w_avg_12_64x64_neon: 10.0 ( 6.40x)
w_avg_12_128x128_c: 269.2 ( 1.00x)
w_avg_12_128x128_neon: 42.0 ( 6.41x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
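The w_avg kernels implement explicitly weighted bi-prediction. The scalar reference below uses the HEVC-style formula with weights w0 and w1, offsets o0 and o1 (assumed already scaled to the output bit depth), and a log2 weight denominator. It also assumes 14-bit intermediate predictions. The signature, including the 16-bit output for every bit depth, is hypothetical and not FFmpeg's exact function.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp v to [0, (1 << bit_depth) - 1]. */
static int clip_pixel(int v, int bit_depth)
{
    const int max = (1 << bit_depth) - 1;
    return v < 0 ? 0 : v > max ? max : v;
}

/* Explicitly weighted average of two 14-bit intermediate predictions. */
static void w_avg_ref(uint16_t *dst, ptrdiff_t dst_stride,
                      const int16_t *src0, const int16_t *src1, ptrdiff_t src_stride,
                      int width, int height, int bit_depth,
                      int denom, int w0, int w1, int o0, int o1)
{
    const int log2wd = denom + 14 - bit_depth;         /* denominator plus precision shift */
    const int round  = (o0 + o1 + 1) * (1 << log2wd);  /* offsets and rounding in one term */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const int v = (src0[x] * w0 + src1[x] * w1 + round) >> (log2wd + 1);
            dst[x] = (uint16_t)clip_pixel(v, bit_depth);
        }
        dst  += dst_stride;
        src0 += src_stride;
        src1 += src_stride;
    }
}
```

With w0 = w1 = 1, denom = 0 and zero offsets, this reduces to the plain rounded average.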
Zhao Zhili
3f84d1d1fb
aarch64/vvc: Add avg
avg_8_2x2_c: 0.2 ( 1.00x)
avg_8_2x2_neon: 0.2 ( 1.00x)
avg_8_4x4_c: 0.2 ( 1.00x)
avg_8_4x4_neon: 0.2 ( 1.00x)
avg_8_8x8_c: 0.9 ( 1.00x)
avg_8_8x8_neon: 0.2 ( 5.29x)
avg_8_16x16_c: 3.7 ( 1.00x)
avg_8_16x16_neon: 0.7 ( 5.44x)
avg_8_32x32_c: 14.9 ( 1.00x)
avg_8_32x32_neon: 1.7 ( 8.91x)
avg_8_64x64_c: 59.7 ( 1.00x)
avg_8_64x64_neon: 6.9 ( 8.62x)
avg_8_128x128_c: 254.7 ( 1.00x)
avg_8_128x128_neon: 26.9 ( 9.46x)
avg_10_2x2_c: 0.2 ( 1.00x)
avg_10_2x2_neon: 0.2 ( 1.00x)
avg_10_4x4_c: 0.2 ( 1.00x)
avg_10_4x4_neon: 0.2 ( 1.00x)
avg_10_8x8_c: 0.9 ( 1.00x)
avg_10_8x8_neon: 0.2 ( 5.29x)
avg_10_16x16_c: 3.4 ( 1.00x)
avg_10_16x16_neon: 0.4 ( 8.06x)
avg_10_32x32_c: 13.9 ( 1.00x)
avg_10_32x32_neon: 1.9 ( 7.23x)
avg_10_64x64_c: 54.2 ( 1.00x)
avg_10_64x64_neon: 8.4 ( 6.43x)
avg_10_128x128_c: 232.4 ( 1.00x)
avg_10_128x128_neon: 30.9 ( 7.52x)
avg_12_2x2_c: 0.0 ( 0.00x)
avg_12_2x2_neon: 0.2 ( 0.00x)
avg_12_4x4_c: 0.4 ( 1.00x)
avg_12_4x4_neon: 0.2 ( 2.43x)
avg_12_8x8_c: 0.7 ( 1.00x)
avg_12_8x8_neon: 0.2 ( 3.86x)
avg_12_16x16_c: 3.7 ( 1.00x)
avg_12_16x16_neon: 0.4 ( 8.65x)
avg_12_32x32_c: 13.7 ( 1.00x)
avg_12_32x32_neon: 2.2 ( 6.29x)
avg_12_64x64_c: 53.9 ( 1.00x)
avg_12_64x64_neon: 7.7 ( 7.03x)
avg_12_128x128_c: 270.9 ( 1.00x)
avg_12_128x128_neon: 30.4 ( 8.90x)
5 months ago
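The avg kernels combine the two intermediate predictions of a bi-predicted block into output samples. The scalar sketch below makes three assumptions: 14-bit intermediates, the HEVC-style rounding shift of 15 - bit_depth, and a hypothetical signature that writes 16-bit samples for every bit depth. It shows the arithmetic only, not FFmpeg's function.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp v to [0, (1 << bit_depth) - 1]. */
static int clip_pixel(int v, int bit_depth)
{
    const int max = (1 << bit_depth) - 1;
    return v < 0 ? 0 : v > max ? max : v;
}

/* Rounded average of two 14-bit intermediate predictions. */
static void avg_ref(uint16_t *dst, ptrdiff_t dst_stride,
                    const int16_t *src0, const int16_t *src1, ptrdiff_t src_stride,
                    int width, int height, int bit_depth)
{
    const int shift  = 15 - bit_depth;     /* 7, 5, 3 for 8/10/12-bit */
    const int offset = 1 << (shift - 1);   /* round to nearest */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (uint16_t)clip_pixel((src0[x] + src1[x] + offset) >> shift, bit_depth);
        dst  += dst_stride;
        src0 += src_stride;
        src1 += src_stride;
    }
}
```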
Zhao Zhili
1be5a2374f
aarch64/vvc: Add put_epel_hv
On Apple M1:
put_chroma_hv_8_4x4_c: 1.7 ( 1.00x)
put_chroma_hv_8_4x4_neon: 0.2 ( 7.67x)
put_chroma_hv_8_8x8_c: 5.5 ( 1.00x)
put_chroma_hv_8_8x8_neon: 0.5 (11.53x)
put_chroma_hv_8_16x16_c: 18.5 ( 1.00x)
put_chroma_hv_8_16x16_neon: 1.5 (12.53x)
put_chroma_hv_8_32x32_c: 72.5 ( 1.00x)
put_chroma_hv_8_32x32_neon: 4.7 (15.34x)
put_chroma_hv_8_64x64_c: 274.0 ( 1.00x)
put_chroma_hv_8_64x64_neon: 18.5 (14.83x)
put_chroma_hv_8_128x128_c: 1058.7 ( 1.00x)
put_chroma_hv_8_128x128_neon: 75.2 (14.07x)
On Android Pixel 8 Pro:
put_chroma_hv_8_4x4_c: 1.2 ( 1.00x)
put_chroma_hv_8_4x4_neon: 0.0 ( 0.00x)
put_chroma_hv_8_4x4_i8mm: 0.2 ( 5.00x)
put_chroma_hv_8_8x8_c: 4.0 ( 1.00x)
put_chroma_hv_8_8x8_neon: 0.5 ( 8.00x)
put_chroma_hv_8_8x8_i8mm: 0.5 ( 8.00x)
put_chroma_hv_8_16x16_c: 15.2 ( 1.00x)
put_chroma_hv_8_16x16_neon: 2.5 ( 6.10x)
put_chroma_hv_8_16x16_i8mm: 2.2 ( 6.78x)
put_chroma_hv_8_32x32_c: 61.0 ( 1.00x)
put_chroma_hv_8_32x32_neon: 9.8 ( 6.26x)
put_chroma_hv_8_32x32_i8mm: 8.5 ( 7.18x)
put_chroma_hv_8_64x64_c: 229.5 ( 1.00x)
put_chroma_hv_8_64x64_neon: 38.5 ( 5.96x)
put_chroma_hv_8_64x64_i8mm: 34.0 ( 6.75x)
put_chroma_hv_8_128x128_c: 919.8 ( 1.00x)
put_chroma_hv_8_128x128_neon: 154.5 ( 5.95x)
put_chroma_hv_8_128x128_i8mm: 140.0 ( 6.57x)
5 months ago
Zhao Zhili
0dcf204e5d
aarch64/vvc: Add put_epel_h i8mm
put_chroma_h_8_4x4_c: 0.4 ( 1.00x)
put_chroma_h_8_4x4_neon: 0.0 ( 0.00x)
put_chroma_h_8_4x4_i8mm: 0.1 ( 2.67x)
put_chroma_h_8_8x8_c: 1.6 ( 1.00x)
put_chroma_h_8_8x8_neon: 0.1 (11.00x)
put_chroma_h_8_8x8_i8mm: 0.1 (11.00x)
put_chroma_h_8_16x16_c: 6.9 ( 1.00x)
put_chroma_h_8_16x16_neon: 1.1 ( 6.00x)
put_chroma_h_8_16x16_i8mm: 0.7 (10.62x)
put_chroma_h_8_32x32_c: 27.6 ( 1.00x)
put_chroma_h_8_32x32_neon: 4.7 ( 5.95x)
put_chroma_h_8_32x32_i8mm: 4.4 ( 6.28x)
put_chroma_h_8_64x64_c: 116.2 ( 1.00x)
put_chroma_h_8_64x64_neon: 19.1 ( 6.07x)
put_chroma_h_8_64x64_i8mm: 17.1 ( 6.77x)
put_chroma_h_8_128x128_c: 466.6 ( 1.00x)
put_chroma_h_8_128x128_neon: 81.4 ( 5.73x)
put_chroma_h_8_128x128_i8mm: 71.7 ( 6.51x)
5 months ago
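The i8mm variants presumably gain from USDOT, an instruction in the Armv8.6 Int8 matrix-multiply extension that multiplies unsigned bytes by signed bytes. That pairing matches an interpolation filter: unsigned 8-bit pixels against signed coefficients. The scalar model below shows what one 128-bit USDOT computes. The function name usdot_model is hypothetical.

```c
#include <stdint.h>

/* Scalar model of a 128-bit USDOT: each 32-bit accumulator lane adds the
 * dot product of four unsigned bytes from a with four signed bytes from b. */
static void usdot_model(int32_t acc[4], const uint8_t a[16], const int8_t b[16])
{
    for (int lane = 0; lane < 4; lane++) {
        int32_t sum = 0;
        for (int i = 0; i < 4; i++)
            sum += (int32_t)a[4 * lane + i] * (int32_t)b[4 * lane + i];
        acc[lane] += sum;
    }
}
```

Suppose a holds four overlapping 4-pixel windows and b repeats a 4-tap filter in every lane. One such operation then produces four filtered output samples. An 8-tap luma filter would need two per output group.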
Zhao Zhili
41a1885f7a
aarch64/vvc: Add put_epel_h
put_chroma_h_8_4x4_c: 0.2 ( 1.00x)
put_chroma_h_8_4x4_neon: 0.2 ( 1.00x)
put_chroma_h_8_8x8_c: 0.8 ( 1.00x)
put_chroma_h_8_8x8_neon: 0.2 ( 3.00x)
put_chroma_h_8_16x16_c: 3.8 ( 1.00x)
put_chroma_h_8_16x16_neon: 0.8 ( 5.00x)
put_chroma_h_8_32x32_c: 12.5 ( 1.00x)
put_chroma_h_8_32x32_neon: 2.2 ( 5.56x)
put_chroma_h_8_64x64_c: 47.0 ( 1.00x)
put_chroma_h_8_64x64_neon: 8.8 ( 5.37x)
put_chroma_h_8_128x128_c: 200.2 ( 1.00x)
put_chroma_h_8_128x128_neon: 31.8 ( 6.31x)
5 months ago
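put_epel_h is the horizontal 4-tap chroma interpolation. Below is a scalar sketch of the 8-bit "put" variant. It stores the filter sum unshifted as a 16-bit intermediate, which later averaging or weighting stages consume. The signature is hypothetical. The test uses {-4, 36, 36, -4}, the HEVC half-sample chroma filter.

```c
#include <stdint.h>
#include <stddef.h>

/* 4-tap horizontal chroma filter, "put" variant for 8-bit input: the result
 * stays a signed 16-bit intermediate (shift by bit_depth - 8 = 0).
 * src must be readable from column -1 to column width + 1. */
static void put_chroma_h_ref(int16_t *dst, ptrdiff_t dst_stride,
                             const uint8_t *src, ptrdiff_t src_stride,
                             int width, int height, const int8_t filter[4])
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < 4; k++)
                sum += filter[k] * src[x + k - 1];   /* taps at x-1 .. x+2 */
            dst[x] = (int16_t)sum;
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

On a linear ramp, the half-sample filter returns 64 times the midpoint between the two centre pixels.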
Zhao Zhili
260e1b4b62
aarch64/vvc: Add sad
sad_8x16_c: 0.8 ( 1.00x)
sad_8x16_neon: 0.2 ( 3.00x)
sad_16x8_c: 0.5 ( 1.00x)
sad_16x8_neon: 0.2 ( 2.00x)
sad_16x16_c: 1.5 ( 1.00x)
sad_16x16_neon: 0.2 ( 6.00x)
5 months ago
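The sad kernel presumably serves DMVR, which compares candidate L0 and L1 predictions by their sum of absolute differences. The sketch below is a plain SAD over 16-bit intermediate samples. row_step is a hypothetical parameter that models a row-subsampled SAD. Whether FFmpeg subsamples, and how, is an assumption here.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* SAD between two blocks of 16-bit intermediate samples sharing one stride.
 * row_step = 1 visits every row; row_step = 2 models a row-subsampled SAD
 * (hypothetical parameter). */
static int sad_ref(const int16_t *a, const int16_t *b, ptrdiff_t stride,
                   int width, int height, int row_step)
{
    int sad = 0;
    for (int y = 0; y < height; y += row_step)
        for (int x = 0; x < width; x++)
            sad += abs(a[y * stride + x] - b[y * stride + x]);
    return sad;
}
```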
Zhao Zhili
5ac6925803
aarch64/vvc: Add put_qpel_hv
With Apple M1 (no i8mm):
put_luma_hv_8_4x4_c: 2.2 ( 1.00x)
put_luma_hv_8_4x4_neon: 0.8 ( 3.00x)
put_luma_hv_8_8x8_c: 7.0 ( 1.00x)
put_luma_hv_8_8x8_neon: 0.8 ( 9.33x)
put_luma_hv_8_16x16_c: 22.8 ( 1.00x)
put_luma_hv_8_16x16_neon: 2.5 ( 9.10x)
put_luma_hv_8_32x32_c: 84.8 ( 1.00x)
put_luma_hv_8_32x32_neon: 9.5 ( 8.92x)
put_luma_hv_8_64x64_c: 333.0 ( 1.00x)
put_luma_hv_8_64x64_neon: 35.5 ( 9.38x)
put_luma_hv_8_128x128_c: 1294.5 ( 1.00x)
put_luma_hv_8_128x128_neon: 137.8 ( 9.40x)
With Pixel 8 Pro:
put_luma_hv_8_4x4_c: 5.0 ( 1.00x)
put_luma_hv_8_4x4_neon: 0.8 ( 6.67x)
put_luma_hv_8_4x4_i8mm: 0.2 (20.00x)
put_luma_hv_8_8x8_c: 13.2 ( 1.00x)
put_luma_hv_8_8x8_neon: 1.2 (10.60x)
put_luma_hv_8_8x8_i8mm: 1.2 (10.60x)
put_luma_hv_8_16x16_c: 44.2 ( 1.00x)
put_luma_hv_8_16x16_neon: 4.5 ( 9.83x)
put_luma_hv_8_16x16_i8mm: 4.2 (10.41x)
put_luma_hv_8_32x32_c: 160.8 ( 1.00x)
put_luma_hv_8_32x32_neon: 17.5 ( 9.19x)
put_luma_hv_8_32x32_i8mm: 16.0 (10.05x)
put_luma_hv_8_64x64_c: 611.2 ( 1.00x)
put_luma_hv_8_64x64_neon: 68.0 ( 8.99x)
put_luma_hv_8_64x64_i8mm: 62.2 ( 9.82x)
put_luma_hv_8_128x128_c: 2384.8 ( 1.00x)
put_luma_hv_8_128x128_neon: 268.8 ( 8.87x)
put_luma_hv_8_128x128_i8mm: 245.8 ( 9.70x)
5 months ago
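put_qpel_hv applies the 8-tap luma filter in both directions. A common separable structure works in two passes. The first pass filters horizontally over height + 7 rows (3 above the block, 4 below). The second pass filters that intermediate vertically and shifts by 6. The sketch below follows this HEVC-style structure for 8-bit input. The name, the 128x128 size limit and the buffer layout are assumptions. HEVC's half-sample luma filter is {-1, 4, -11, 40, 40, -11, 4, -1}; the test uses an identity filter so the expected values are easy to check.

```c
#include <stdint.h>
#include <stddef.h>

#define TAPS  8
#define MAX_W 128   /* assumed maximum block width and height */

/* Separable 8-tap luma interpolation, "put" variant for 8-bit input.
 * src must be readable from (-3, -3) to (width + 4, height + 4). */
static void put_luma_hv_ref(int16_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            int width, int height,
                            const int8_t hf[TAPS], const int8_t vf[TAPS])
{
    int16_t tmp[(MAX_W + TAPS - 1) * MAX_W];
    const uint8_t *s = src - 3 * src_stride;    /* start 3 rows above */

    /* Pass 1: horizontal filter; 8-bit input needs no shift. */
    for (int y = 0; y < height + TAPS - 1; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += hf[k] * s[x + k - 3];
            tmp[y * width + x] = (int16_t)sum;
        }
        s += src_stride;
    }

    /* Pass 2: vertical filter over the intermediate, then shift by 6. */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += vf[k] * tmp[(y + k) * width + x];
            dst[x] = (int16_t)(sum >> 6);
        }
        dst += dst_stride;
    }
}
```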
Zhao Zhili
a0b52afd32
aarch64/vvc: Add put_qpel_vx
put_luma_v_8_4x4_c: 1.0 ( 1.00x)
put_luma_v_8_4x4_neon: 0.0 ( 0.00x)
put_luma_v_8_8x8_c: 3.5 ( 1.00x)
put_luma_v_8_8x8_neon: 0.5 ( 7.00x)
put_luma_v_8_16x16_c: 13.8 ( 1.00x)
put_luma_v_8_16x16_neon: 1.2 (11.00x)
put_luma_v_8_32x32_c: 54.2 ( 1.00x)
put_luma_v_8_32x32_neon: 5.0 (10.85x)
put_luma_v_8_64x64_c: 217.5 ( 1.00x)
put_luma_v_8_64x64_neon: 18.8 (11.60x)
put_luma_v_8_128x128_c: 886.2 ( 1.00x)
put_luma_v_8_128x128_neon: 74.0 (11.98x)
5 months ago
Zhao Zhili
9f6c8eb412
aarch64/vvc: Add put_qpel_hx i8mm
Benchmark on Android Pixel 8 with -fno-vectorize:
put_luma_h_8_4x4_c: 0.2 ( 1.00x)
put_luma_h_8_4x4_neon: 0.2 ( 1.00x)
put_luma_h_8_4x4_i8mm: 0.0 ( 0.00x)
put_luma_h_8_8x8_c: 1.5 ( 1.00x)
put_luma_h_8_8x8_neon: 0.5 ( 3.00x)
put_luma_h_8_8x8_i8mm: 0.5 ( 3.00x)
put_luma_h_8_16x16_c: 6.2 ( 1.00x)
put_luma_h_8_16x16_neon: 2.0 ( 3.12x)
put_luma_h_8_16x16_i8mm: 1.5 ( 4.17x)
put_luma_h_8_32x32_c: 25.5 ( 1.00x)
put_luma_h_8_32x32_neon: 9.0 ( 2.83x)
put_luma_h_8_32x32_i8mm: 6.8 ( 3.78x)
put_luma_h_8_64x64_c: 99.8 ( 1.00x)
put_luma_h_8_64x64_neon: 35.2 ( 2.83x)
put_luma_h_8_64x64_i8mm: 27.2 ( 3.66x)
put_luma_h_8_128x128_c: 422.0 ( 1.00x)
put_luma_h_8_128x128_neon: 138.5 ( 3.05x)
put_luma_h_8_128x128_i8mm: 109.2 ( 3.86x)
5 months ago
Zhao Zhili
25448d1716
aarch64/vvc: Add put_pel/put_pel_uni/put_pel_uni_w
put_luma_pixels_8_4x4_c: 0.2 ( 1.00x)
put_luma_pixels_8_4x4_neon: 0.2 ( 1.00x)
put_luma_pixels_8_8x8_c: 0.7 ( 1.00x)
put_luma_pixels_8_8x8_neon: 0.2 ( 3.22x)
put_luma_pixels_8_16x16_c: 2.2 ( 1.00x)
put_luma_pixels_8_16x16_neon: 0.2 ( 9.89x)
put_luma_pixels_8_32x32_c: 8.2 ( 1.00x)
put_luma_pixels_8_32x32_neon: 1.2 ( 6.71x)
put_luma_pixels_8_64x64_c: 33.7 ( 1.00x)
put_luma_pixels_8_64x64_neon: 2.5 (13.63x)
put_luma_pixels_8_128x128_c: 145.5 ( 1.00x)
put_luma_pixels_8_128x128_neon: 10.2 (14.23x)
put_uni_pixels_luma_8_4x4_c: 0.5 ( 1.00x)
put_uni_pixels_luma_8_4x4_neon: 0.0 ( 0.00x)
put_uni_pixels_luma_8_8x8_c: 0.5 ( 1.00x)
put_uni_pixels_luma_8_8x8_neon: 0.2 ( 2.11x)
put_uni_pixels_luma_8_16x16_c: 1.2 ( 1.00x)
put_uni_pixels_luma_8_16x16_neon: 0.2 ( 5.44x)
put_uni_pixels_luma_8_32x32_c: 3.0 ( 1.00x)
put_uni_pixels_luma_8_32x32_neon: 0.5 ( 6.26x)
put_uni_pixels_luma_8_64x64_c: 3.0 ( 1.00x)
put_uni_pixels_luma_8_64x64_neon: 1.7 ( 1.72x)
put_uni_pixels_luma_8_128x128_c: 6.5 ( 1.00x)
put_uni_pixels_luma_8_128x128_neon: 6.5 ( 1.00x)
5 months ago
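The three pixel-copy kernels in the commit above differ only in what they do with the samples. put_pel lifts them to intermediate precision, put_pel_uni copies them unchanged, and put_pel_uni_w applies an explicit weight and offset. The sketch below covers the first and last for 8-bit input. It assumes the HEVC-style 14-bit intermediate and weighting formula; the names and signatures are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* put_pel: lift 8-bit samples to 14-bit intermediate precision. */
static void put_pixels_ref(int16_t *dst, ptrdiff_t dst_stride,
                           const uint8_t *src, ptrdiff_t src_stride,
                           int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << 6);     /* 14 - bit_depth = 6 */
        dst += dst_stride;
        src += src_stride;
    }
}

/* put_pel_uni_w: explicitly weighted uni-prediction straight to pixels. */
static void put_uni_w_pixels_ref(uint8_t *dst, ptrdiff_t dst_stride,
                                 const uint8_t *src, ptrdiff_t src_stride,
                                 int width, int height, int denom, int wx, int ox)
{
    const int shift  = denom + 6;          /* denom + 14 - bit_depth */
    const int offset = 1 << (shift - 1);   /* round to nearest */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const int v = (((src[x] << 6) * wx + offset) >> shift) + ox;
            dst[x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

With denom = 1 and wx = 3 the weight is 1.5, so a sample of 100 with ox = -5 becomes 145.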
Zhao Zhili
20f2bf5530
aarch64/vvc: Add put_qpel_h_* and put_qpel_uni_h_*
Just share the HEVC implementation.
checkasm --test=vvc_mc --benchmark:
put_luma_h_8_4x4_c: 0.2 ( 1.00x)
put_luma_h_8_4x4_neon: 0.2 ( 1.00x)
put_luma_h_8_8x8_c: 1.0 ( 1.00x)
put_luma_h_8_8x8_neon: 0.2 ( 4.33x)
put_luma_h_8_16x16_c: 3.2 ( 1.00x)
put_luma_h_8_16x16_neon: 1.2 ( 2.63x)
put_luma_h_8_32x32_c: 13.7 ( 1.00x)
put_luma_h_8_32x32_neon: 4.0 ( 3.45x)
put_luma_h_8_64x64_c: 48.2 ( 1.00x)
put_luma_h_8_64x64_neon: 15.7 ( 3.07x)
put_luma_h_8_128x128_c: 203.5 ( 1.00x)
put_luma_h_8_128x128_neon: 62.0 ( 3.28x)
put_uni_h_luma_8_4x4_c: 0.2 ( 1.00x)
put_uni_h_luma_8_4x4_neon: 0.2 ( 1.00x)
put_uni_h_luma_8_8x8_c: 1.5 ( 1.00x)
put_uni_h_luma_8_8x8_neon: 0.2 ( 6.56x)
put_uni_h_luma_8_16x16_c: 5.7 ( 1.00x)
put_uni_h_luma_8_16x16_neon: 1.2 ( 4.67x)
put_uni_h_luma_8_32x32_c: 24.0 ( 1.00x)
put_uni_h_luma_8_32x32_neon: 4.7 ( 5.07x)
put_uni_h_luma_8_64x64_c: 90.0 ( 1.00x)
put_uni_h_luma_8_64x64_neon: 17.0 ( 5.30x)
put_uni_h_luma_8_128x128_c: 357.7 ( 1.00x)
put_uni_h_luma_8_128x128_neon: 67.5 ( 5.30x)
5 months ago
Zhao Zhili
4c0372281b
aarch64/vvc: Bind h26x/sao filter implementation to vvc
Reviewed-by: Martin Storsjö <martin@martin.st>
5 months ago
Zhao Zhili
2d4ef304c9
avcodec/vvc: Add aarch64 neon optimization for ALF
vvc_alf_filter_chroma_4x4_8_c: 3.0
vvc_alf_filter_chroma_4x4_8_neon: 1.0
vvc_alf_filter_chroma_4x4_10_c: 2.7
vvc_alf_filter_chroma_4x4_10_neon: 1.0
vvc_alf_filter_chroma_4x4_12_c: 2.7
vvc_alf_filter_chroma_4x4_12_neon: 1.0
vvc_alf_filter_chroma_8x8_8_c: 10.2
vvc_alf_filter_chroma_8x8_8_neon: 3.0
vvc_alf_filter_chroma_8x8_10_c: 10.0
vvc_alf_filter_chroma_8x8_10_neon: 2.5
vvc_alf_filter_chroma_8x8_12_c: 10.0
vvc_alf_filter_chroma_8x8_12_neon: 2.5
vvc_alf_filter_chroma_16x16_8_c: 41.7
vvc_alf_filter_chroma_16x16_8_neon: 11.2
vvc_alf_filter_chroma_16x16_10_c: 39.0
vvc_alf_filter_chroma_16x16_10_neon: 10.0
vvc_alf_filter_chroma_16x16_12_c: 40.2
vvc_alf_filter_chroma_16x16_12_neon: 10.2
vvc_alf_filter_chroma_32x32_8_c: 162.0
vvc_alf_filter_chroma_32x32_8_neon: 45.0
vvc_alf_filter_chroma_32x32_10_c: 155.5
vvc_alf_filter_chroma_32x32_10_neon: 39.5
vvc_alf_filter_chroma_32x32_12_c: 155.5
vvc_alf_filter_chroma_32x32_12_neon: 40.0
vvc_alf_filter_chroma_64x64_8_c: 646.0
vvc_alf_filter_chroma_64x64_8_neon: 175.5
vvc_alf_filter_chroma_64x64_10_c: 708.2
vvc_alf_filter_chroma_64x64_10_neon: 166.7
vvc_alf_filter_chroma_64x64_12_c: 619.2
vvc_alf_filter_chroma_64x64_12_neon: 157.2
vvc_alf_filter_chroma_128x128_8_c: 2611.5
vvc_alf_filter_chroma_128x128_8_neon: 698.2
vvc_alf_filter_chroma_128x128_10_c: 2470.0
vvc_alf_filter_chroma_128x128_10_neon: 616.0
vvc_alf_filter_chroma_128x128_12_c: 2531.5
vvc_alf_filter_chroma_128x128_12_neon: 620.2
vvc_alf_filter_luma_8x8_8_c: 25.2
vvc_alf_filter_luma_8x8_8_neon: 4.2
vvc_alf_filter_luma_8x8_10_c: 18.5
vvc_alf_filter_luma_8x8_10_neon: 4.0
vvc_alf_filter_luma_8x8_12_c: 19.0
vvc_alf_filter_luma_8x8_12_neon: 4.0
vvc_alf_filter_luma_16x16_8_c: 106.5
vvc_alf_filter_luma_16x16_8_neon: 16.2
vvc_alf_filter_luma_16x16_10_c: 75.2
vvc_alf_filter_luma_16x16_10_neon: 14.7
vvc_alf_filter_luma_16x16_12_c: 79.7
vvc_alf_filter_luma_16x16_12_neon: 14.7
vvc_alf_filter_luma_32x32_8_c: 400.5
vvc_alf_filter_luma_32x32_8_neon: 63.2
vvc_alf_filter_luma_32x32_10_c: 299.2
vvc_alf_filter_luma_32x32_10_neon: 57.7
vvc_alf_filter_luma_32x32_12_c: 299.2
vvc_alf_filter_luma_32x32_12_neon: 57.7
vvc_alf_filter_luma_64x64_8_c: 1602.5
vvc_alf_filter_luma_64x64_8_neon: 251.7
vvc_alf_filter_luma_64x64_10_c: 1197.0
vvc_alf_filter_luma_64x64_10_neon: 235.5
vvc_alf_filter_luma_64x64_12_c: 1220.2
vvc_alf_filter_luma_64x64_12_neon: 235.7
vvc_alf_filter_luma_128x128_8_c: 6570.2
vvc_alf_filter_luma_128x128_8_neon: 1007.7
vvc_alf_filter_luma_128x128_10_c: 4822.7
vvc_alf_filter_luma_128x128_10_neon: 936.2
vvc_alf_filter_luma_128x128_12_c: 4791.2
vvc_alf_filter_luma_128x128_12_neon: 938.5
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
6 months ago
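VVC's ALF is a non-linear, point-symmetric filter. Each neighbour difference is clipped before weighting, and each coefficient serves a pair of mirrored taps. Luma uses a 7x7 diamond and chroma a 5x5 diamond. Below is a scalar sketch for one sample, assuming 7-bit coefficient precision (128 means 1.0). The tap ordering in alf_chroma_pairs and all names are illustrative assumptions, and virtual-boundary handling is omitted.

```c
#include <stdint.h>
#include <stddef.h>

static int clip3(int lo, int hi, int v) { return v < lo ? lo : v > hi ? hi : v; }

/* One tap of each symmetric pair of a 5x5 diamond as {dx, dy}; the mirrored
 * tap is {-dx, -dy}. The ordering is illustrative. */
static const int alf_chroma_pairs[6][2] = {
    {0, 2}, {1, 1}, {0, 1}, {-1, 1}, {2, 0}, {1, 0}
};

/* Filter the sample at p: clipped neighbour differences, weighted by coeff,
 * added back to the centre with rounding (sum + 64) >> 7. */
static int alf_sample_ref(const uint8_t *p, ptrdiff_t stride,
                          const int (*offs)[2], const int *coeff, const int *clip,
                          int npairs, int bit_depth)
{
    const int cur = p[0];
    int sum = 0;
    for (int i = 0; i < npairs; i++) {
        const int a = p[offs[i][1] * stride + offs[i][0]] - cur;    /* tap */
        const int b = p[-offs[i][1] * stride - offs[i][0]] - cur;   /* mirrored tap */
        sum += coeff[i] * (clip3(-clip[i], clip[i], a) + clip3(-clip[i], clip[i], b));
    }
    return clip3(0, (1 << bit_depth) - 1, cur + ((sum + 64) >> 7));
}
```

Clipping limits how far one outlier can pull the result. In the test, a neighbour that is 100 brighter than the centre contributes only as much as a difference of 10.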
Diego Biurrun
6a44304074
dsputil: Move ff_h264_idct function declarations to a separate header
12 years ago
Michael Niedermayer
69d5e40e5a
h264idct: 12 and 14 bit support
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Oskar Arvidsson
19a0729b4c
Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
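The <base name>_<bit depth>[_<prefix>] naming scheme can be generated from a single template with preprocessor token pasting. The sketch below is hypothetical: the macro names and the per-depth coefficient types (int16_t for 8-bit, int32_t for higher depths) are assumptions. It instantiates clear_blocks_8_c and clear_blocks_10_c from one body. It also illustrates the note that some functions depend only on the element size, not on the exact bit depth.

```c
#include <stdint.h>
#include <string.h>

/* Two-level paste so macro arguments are expanded before ## applies. */
#define CAT3_(a, b, c) a ## _ ## b ## _ ## c
#define CAT3(a, b, c)  CAT3_(a, b, c)

/* Hypothetical template: one body, instantiated per bit depth. Only the
 * coefficient type changes between instances. */
#define DEFINE_CLEAR_BLOCKS(depth, coef_t)                    \
    static void CAT3(clear_blocks, depth, c)(coef_t *blocks)  \
    {                                                         \
        memset(blocks, 0, sizeof(coef_t) * 6 * 64);           \
    }

DEFINE_CLEAR_BLOCKS(8,  int16_t)   /* defines clear_blocks_8_c  */
DEFINE_CLEAR_BLOCKS(10, int32_t)   /* defines clear_blocks_10_c */
```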
Oskar Arvidsson
15fb393be6
Move the functions in h264idct into a new file h264idct_template.c.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Oskar Arvidsson
e39e3abad4
Choose h264 chroma dc dequant function dynamically.
Needed for high bit depth h264 decoding.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
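In 4:2:0 H.264, the four chroma DC coefficients of a macroblock form a 2x2 block. That block is inverse-transformed with a 2x2 Hadamard and then dequantized. Below is a scalar sketch in which dequantization is folded into a multiplier qmul followed by a right shift of 7. That folding reflects my understanding of the 8-bit C code and is an assumption, as is the function name. The coefficients are stored row-major.

```c
/* 2x2 Hadamard on chroma DC (row-major {c00, c01, c10, c11}), then dequant. */
static void chroma_dc_dequant_idct_ref(int dc[4], int qmul)
{
    const int a = dc[0], b = dc[1], c = dc[2], d = dc[3];
    const int s0 = a + b, d0 = a - b;   /* top row: sum, difference */
    const int s1 = c + d, d1 = c - d;   /* bottom row: sum, difference */
    dc[0] = ((s0 + s1) * qmul) >> 7;
    dc[1] = ((d0 + d1) * qmul) >> 7;
    dc[2] = ((s0 - s1) * qmul) >> 7;
    dc[3] = ((d0 - d1) * qmul) >> 7;
}
```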
Oskar Arvidsson
8dbe585641
Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Oskar Arvidsson
7bc8032b07
Move the functions in h264idct into a new file h264idct_internal.h.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Oskar Arvidsson
af0b2d6736
Choose h264 chroma dc dequant function dynamically.
Needed for high bit depth h264 decoding.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Ronald S. Bultje
772225c041
Revert 2a1f431d38, it broke H264 lossless.
(cherry picked from commit 66c6b5e2a5)
14 years ago
Ronald S. Bultje
66c6b5e2a5
Revert 2a1f431d38, it broke H264 lossless.
14 years ago
Jason Garrett-Glaser
2a1f431d38
H.264/SVQ3: make chroma DC work the same way as luma DC
No speed improvement, but necessary for some future stuff.
Also opens up the possibility of asm chroma dc idct/dequant.
Originally committed as revision 26349 to svn://svn.ffmpeg.org/ffmpeg/trunk
14 years ago
Jason Garrett-Glaser
19fb234e4a
H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
14 years ago
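H.264 Intra16x16 luma DC coefficients form a 4x4 block. That block passes through a 4x4 Hadamard transform with the sign pattern [1 1 1 1; 1 1 -1 -1; 1 -1 -1 1; 1 -1 1 -1] and is then dequantized. Below is a scalar sketch that applies the transform to rows and then columns. Dequantization is folded into one multiplier as (x * qmul + 128) >> 8. Output order is natural row-major, not FFmpeg's scan-ordered layout, and the names are hypothetical.

```c
/* 4-point Hadamard butterfly with the H.264 luma DC sign pattern. */
static void hadamard4(int o[4], const int i[4])
{
    const int z0 = i[0] + i[1], z1 = i[0] - i[1];
    const int z2 = i[2] - i[3], z3 = i[2] + i[3];
    o[0] = z0 + z3;   /*  1  1  1  1 */
    o[1] = z0 - z3;   /*  1  1 -1 -1 */
    o[2] = z1 - z2;   /*  1 -1 -1  1 */
    o[3] = z1 + z2;   /*  1 -1  1 -1 */
}

/* Rows, then columns, then the folded dequantization. */
static void luma_dc_dequant_idct_ref(int out[16], const int in[16], int qmul)
{
    int tmp[16], col[4], res[4];
    for (int r = 0; r < 4; r++)
        hadamard4(&tmp[4 * r], &in[4 * r]);
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++)
            col[r] = tmp[4 * r + c];
        hadamard4(res, col);
        for (int r = 0; r < 4; r++)
            out[4 * r + c] = (res[r] * qmul + 128) >> 8;
    }
}
```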
Jason Garrett-Glaser
ca32f7f208
H.264: eliminate non-transposed scantable support.
It was an ugly hack to begin with and didn't give any performance.
NOTE: this patch opens up some future simplifications to be made (such as
removing some of the scantables from H264Context) but doesn't take advantage
of them yet.
Originally committed as revision 26329 to svn://svn.ffmpeg.org/ffmpeg/trunk
14 years ago
Måns Rullgård
8fbd4f51a8
Improve some uses of ff_cropTbl with constant offset
Originally committed as revision 23728 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
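A crop table replaces a two-sided clamp with a single lookup. It is padded below 0 and above 255, so indexing through a pointer offset by the padding returns the value clamped to [0, 255]. When every pixel gets the same constant added, that constant can be folded into the base pointer once instead of being added per pixel. Whether this is exactly what the commit above changed is an assumption. The padding size MAX_NEG_CROP = 1024 and the names here are illustrative.

```c
#include <stdint.h>

#define MAX_NEG_CROP 1024   /* assumed headroom below 0 and above 255 */

static uint8_t crop_tbl[256 + 2 * MAX_NEG_CROP];

/* Fill the table so that (crop_tbl + MAX_NEG_CROP)[v] == clamp(v, 0, 255). */
static void init_crop_tbl(void)
{
    for (int i = 0; i < 256 + 2 * MAX_NEG_CROP; i++) {
        const int v = i - MAX_NEG_CROP;
        crop_tbl[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* Add a constant to n pixels with clipping; the constant is folded into the
 * table base pointer. Requires -MAX_NEG_CROP <= offset <= MAX_NEG_CROP. */
static void add_const_clipped(uint8_t *px, int n, int offset)
{
    const uint8_t *cm = crop_tbl + MAX_NEG_CROP + offset;
    for (int i = 0; i < n; i++)
        px[i] = cm[px[i]];
}
```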
Diego Biurrun
ba87f0801d
Remove explicit filename from Doxygen @file commands.
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Diego Biurrun
bad5537e2c
Use full internal pathname in doxygen @file directives.
Otherwise doxygen complains about ambiguous filenames when files exist
under the same name in different subdirectories.
Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk
16 years ago
Diego Biurrun
99ed41a808
Fix filenames in Doxygen comments.
Originally committed as revision 16811 to svn://svn.ffmpeg.org/ffmpeg/trunk
16 years ago
Loren Merritt
1cca8d2423
flatten an array, since gcc fails at optimizing multidimensional arrays
h264_idct8_add_c: 780 -> 735 cycles on conroe
Originally committed as revision 16307 to svn://svn.ffmpeg.org/ffmpeg/trunk
16 years ago
Michael Niedermayer
ac22385931
H.264 idct functions that include the chroma, inter luma and intra16 luma loops
thus avoiding the calling overhead.
New functions are not yet used.
Originally committed as revision 16206 to svn://svn.ffmpeg.org/ffmpeg/trunk
16 years ago
Diego Biurrun
e5a389a1b7
license header consistency cosmetics
Originally committed as revision 9484 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
849f10351d
rename always_inline to av_always_inline and move to common.h
Originally committed as revision 7256 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
55fde95e3b
rename cropTbl -> ff_cropTbl
Originally committed as revision 6992 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
b78e7197a8
Change license headers to say 'FFmpeg' instead of 'this program/this library'
and fix GPL/LGPL version mismatches.
Originally committed as revision 6577 to svn://svn.ffmpeg.org/ffmpeg/trunk
19 years ago
Loren Merritt
ef9d1d1575
h264: special case dc-only idct. ~1% faster overall
Originally committed as revision 4971 to svn://svn.ffmpeg.org/ffmpeg/trunk
19 years ago
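When only the DC coefficient of an H.264 4x4 block is non-zero, the inverse transform adds the same rounded value, (dc + 32) >> 6, to every pixel. The butterflies can then be skipped entirely. Below is a sketch for 8-bit pixels. The function name is hypothetical, and unlike a production version it leaves the coefficient buffer untouched.

```c
#include <stdint.h>
#include <stddef.h>

static uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* DC-only 4x4 inverse transform plus add: one rounded constant per block. */
static void idct4_dc_add_ref(uint8_t *dst, ptrdiff_t stride, const int16_t *block)
{
    const int dc = (block[0] + 32) >> 6;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_uint8(dst[y * stride + x] + dc);
}
```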
Diego Biurrun
5509bffa88
Update licensing information: The FSF changed postal address.
Originally committed as revision 4842 to svn://svn.ffmpeg.org/ffmpeg/trunk
19 years ago
Diego Biurrun
115329f160
COSMETICS: Remove all trailing whitespace.
Originally committed as revision 4749 to svn://svn.ffmpeg.org/ffmpeg/trunk
19 years ago
Loren Merritt
43efd19a88
decode H.264 with 8x8 transform.
deblocking is still incorrect with 8x8+cavlc
Originally committed as revision 4339 to svn://svn.ffmpeg.org/ffmpeg/trunk
20 years ago
Michael Niedermayer
0fa8158d3e
move h264 idct to its own file and call via function pointer in DspContext
allow h264 idct to be used for lowres=1
Originally committed as revision 3524 to svn://svn.ffmpeg.org/ffmpeg/trunk
21 years ago