Martin Storsjö
|
a3ec1f8c6c
|
aarch64: h26x: Fix the indentation of one function
Signed-off-by: Martin Storsjö <martin@martin.st>
|
5 months ago |
Zhao Zhili
|
1be5a2374f
|
aarch64/vvc: Add put_epel_hv
On Apple M1:
put_chroma_hv_8_4x4_c: 1.7 ( 1.00x)
put_chroma_hv_8_4x4_neon: 0.2 ( 7.67x)
put_chroma_hv_8_8x8_c: 5.5 ( 1.00x)
put_chroma_hv_8_8x8_neon: 0.5 (11.53x)
put_chroma_hv_8_16x16_c: 18.5 ( 1.00x)
put_chroma_hv_8_16x16_neon: 1.5 (12.53x)
put_chroma_hv_8_32x32_c: 72.5 ( 1.00x)
put_chroma_hv_8_32x32_neon: 4.7 (15.34x)
put_chroma_hv_8_64x64_c: 274.0 ( 1.00x)
put_chroma_hv_8_64x64_neon: 18.5 (14.83x)
put_chroma_hv_8_128x128_c: 1058.7 ( 1.00x)
put_chroma_hv_8_128x128_neon: 75.2 (14.07x)
On Android Pixel 8 Pro:
put_chroma_hv_8_4x4_c: 1.2 ( 1.00x)
put_chroma_hv_8_4x4_neon: 0.0 ( 0.00x)
put_chroma_hv_8_4x4_i8mm: 0.2 ( 5.00x)
put_chroma_hv_8_8x8_c: 4.0 ( 1.00x)
put_chroma_hv_8_8x8_neon: 0.5 ( 8.00x)
put_chroma_hv_8_8x8_i8mm: 0.5 ( 8.00x)
put_chroma_hv_8_16x16_c: 15.2 ( 1.00x)
put_chroma_hv_8_16x16_neon: 2.5 ( 6.10x)
put_chroma_hv_8_16x16_i8mm: 2.2 ( 6.78x)
put_chroma_hv_8_32x32_c: 61.0 ( 1.00x)
put_chroma_hv_8_32x32_neon: 9.8 ( 6.26x)
put_chroma_hv_8_32x32_i8mm: 8.5 ( 7.18x)
put_chroma_hv_8_64x64_c: 229.5 ( 1.00x)
put_chroma_hv_8_64x64_neon: 38.5 ( 5.96x)
put_chroma_hv_8_64x64_i8mm: 34.0 ( 6.75x)
put_chroma_hv_8_128x128_c: 919.8 ( 1.00x)
put_chroma_hv_8_128x128_neon: 154.5 ( 5.95x)
put_chroma_hv_8_128x128_i8mm: 140.0 ( 6.57x)
|
6 months ago |
Zhao Zhili
|
0dcf204e5d
|
aarch64/vvc: Add put_epel_h i8mm
put_chroma_h_8_4x4_c: 0.4 ( 1.00x)
put_chroma_h_8_4x4_neon: 0.0 ( 0.00x)
put_chroma_h_8_4x4_i8mm: 0.1 ( 2.67x)
put_chroma_h_8_8x8_c: 1.6 ( 1.00x)
put_chroma_h_8_8x8_neon: 0.1 (11.00x)
put_chroma_h_8_8x8_i8mm: 0.1 (11.00x)
put_chroma_h_8_16x16_c: 6.9 ( 1.00x)
put_chroma_h_8_16x16_neon: 1.1 ( 6.00x)
put_chroma_h_8_16x16_i8mm: 0.7 (10.62x)
put_chroma_h_8_32x32_c: 27.6 ( 1.00x)
put_chroma_h_8_32x32_neon: 4.7 ( 5.95x)
put_chroma_h_8_32x32_i8mm: 4.4 ( 6.28x)
put_chroma_h_8_64x64_c: 116.2 ( 1.00x)
put_chroma_h_8_64x64_neon: 19.1 ( 6.07x)
put_chroma_h_8_64x64_i8mm: 17.1 ( 6.77x)
put_chroma_h_8_128x128_c: 466.6 ( 1.00x)
put_chroma_h_8_128x128_neon: 81.4 ( 5.73x)
put_chroma_h_8_128x128_i8mm: 71.7 ( 6.51x)
|
6 months ago |
Zhao Zhili
|
41a1885f7a
|
aarch64/vvc: Add put_epel_h
put_chroma_h_8_4x4_c: 0.2 ( 1.00x)
put_chroma_h_8_4x4_neon: 0.2 ( 1.00x)
put_chroma_h_8_8x8_c: 0.8 ( 1.00x)
put_chroma_h_8_8x8_neon: 0.2 ( 3.00x)
put_chroma_h_8_16x16_c: 3.8 ( 1.00x)
put_chroma_h_8_16x16_neon: 0.8 ( 5.00x)
put_chroma_h_8_32x32_c: 12.5 ( 1.00x)
put_chroma_h_8_32x32_neon: 2.2 ( 5.56x)
put_chroma_h_8_64x64_c: 47.0 ( 1.00x)
put_chroma_h_8_64x64_neon: 8.8 ( 5.37x)
put_chroma_h_8_128x128_c: 200.2 ( 1.00x)
put_chroma_h_8_128x128_neon: 31.8 ( 6.31x)
|
6 months ago |
Zhao Zhili
|
5ac6925803
|
aarch64/vvc: Add put_qpel_hv
With Apple M1 (no i8mm):
put_luma_hv_8_4x4_c: 2.2 ( 1.00x)
put_luma_hv_8_4x4_neon: 0.8 ( 3.00x)
put_luma_hv_8_8x8_c: 7.0 ( 1.00x)
put_luma_hv_8_8x8_neon: 0.8 ( 9.33x)
put_luma_hv_8_16x16_c: 22.8 ( 1.00x)
put_luma_hv_8_16x16_neon: 2.5 ( 9.10x)
put_luma_hv_8_32x32_c: 84.8 ( 1.00x)
put_luma_hv_8_32x32_neon: 9.5 ( 8.92x)
put_luma_hv_8_64x64_c: 333.0 ( 1.00x)
put_luma_hv_8_64x64_neon: 35.5 ( 9.38x)
put_luma_hv_8_128x128_c: 1294.5 ( 1.00x)
put_luma_hv_8_128x128_neon: 137.8 ( 9.40x)
With Pixel 8 Pro:
put_luma_hv_8_4x4_c: 5.0 ( 1.00x)
put_luma_hv_8_4x4_neon: 0.8 ( 6.67x)
put_luma_hv_8_4x4_i8mm: 0.2 (20.00x)
put_luma_hv_8_8x8_c: 13.2 ( 1.00x)
put_luma_hv_8_8x8_neon: 1.2 (10.60x)
put_luma_hv_8_8x8_i8mm: 1.2 (10.60x)
put_luma_hv_8_16x16_c: 44.2 ( 1.00x)
put_luma_hv_8_16x16_neon: 4.5 ( 9.83x)
put_luma_hv_8_16x16_i8mm: 4.2 (10.41x)
put_luma_hv_8_32x32_c: 160.8 ( 1.00x)
put_luma_hv_8_32x32_neon: 17.5 ( 9.19x)
put_luma_hv_8_32x32_i8mm: 16.0 (10.05x)
put_luma_hv_8_64x64_c: 611.2 ( 1.00x)
put_luma_hv_8_64x64_neon: 68.0 ( 8.99x)
put_luma_hv_8_64x64_i8mm: 62.2 ( 9.82x)
put_luma_hv_8_128x128_c: 2384.8 ( 1.00x)
put_luma_hv_8_128x128_neon: 268.8 ( 8.87x)
put_luma_hv_8_128x128_i8mm: 245.8 ( 9.70x)
|
6 months ago |
Zhao Zhili
|
a0b52afd32
|
aarch64/vvc: Add put_qpel_vx
put_luma_v_8_4x4_c: 1.0 ( 1.00x)
put_luma_v_8_4x4_neon: 0.0 ( 0.00x)
put_luma_v_8_8x8_c: 3.5 ( 1.00x)
put_luma_v_8_8x8_neon: 0.5 ( 7.00x)
put_luma_v_8_16x16_c: 13.8 ( 1.00x)
put_luma_v_8_16x16_neon: 1.2 (11.00x)
put_luma_v_8_32x32_c: 54.2 ( 1.00x)
put_luma_v_8_32x32_neon: 5.0 (10.85x)
put_luma_v_8_64x64_c: 217.5 ( 1.00x)
put_luma_v_8_64x64_neon: 18.8 (11.60x)
put_luma_v_8_128x128_c: 886.2 ( 1.00x)
put_luma_v_8_128x128_neon: 74.0 (11.98x)
|
6 months ago |
Zhao Zhili
|
b051bc7cb8
|
aarch64/h26x: Remove duplicate b.eq instruction
b.eq is added by calc_all after each calc.
|
6 months ago |
Zhao Zhili
|
9f6c8eb412
|
aarch64/vvc: Add put_qpel_hx i8mm
Benchmark on Android pixel 8 with -fno-vectorize
put_luma_h_8_4x4_c: 0.2 ( 1.00x)
put_luma_h_8_4x4_neon: 0.2 ( 1.00x)
put_luma_h_8_4x4_i8mm: 0.0 ( 0.00x)
put_luma_h_8_8x8_c: 1.5 ( 1.00x)
put_luma_h_8_8x8_neon: 0.5 ( 3.00x)
put_luma_h_8_8x8_i8mm: 0.5 ( 3.00x)
put_luma_h_8_16x16_c: 6.2 ( 1.00x)
put_luma_h_8_16x16_neon: 2.0 ( 3.12x)
put_luma_h_8_16x16_i8mm: 1.5 ( 4.17x)
put_luma_h_8_32x32_c: 25.5 ( 1.00x)
put_luma_h_8_32x32_neon: 9.0 ( 2.83x)
put_luma_h_8_32x32_i8mm: 6.8 ( 3.78x)
put_luma_h_8_64x64_c: 99.8 ( 1.00x)
put_luma_h_8_64x64_neon: 35.2 ( 2.83x)
put_luma_h_8_64x64_i8mm: 27.2 ( 3.66x)
put_luma_h_8_128x128_c: 422.0 ( 1.00x)
put_luma_h_8_128x128_neon: 138.5 ( 3.05x)
put_luma_h_8_128x128_i8mm: 109.2 ( 3.86x)
|
6 months ago |
Zhao Zhili
|
25448d1716
|
aarch64/vvc: Add put_pel/put_pel_uni/put_pel_uni_w
put_luma_pixels_8_4x4_c: 0.2 ( 1.00x)
put_luma_pixels_8_4x4_neon: 0.2 ( 1.00x)
put_luma_pixels_8_8x8_c: 0.7 ( 1.00x)
put_luma_pixels_8_8x8_neon: 0.2 ( 3.22x)
put_luma_pixels_8_16x16_c: 2.2 ( 1.00x)
put_luma_pixels_8_16x16_neon: 0.2 ( 9.89x)
put_luma_pixels_8_32x32_c: 8.2 ( 1.00x)
put_luma_pixels_8_32x32_neon: 1.2 ( 6.71x)
put_luma_pixels_8_64x64_c: 33.7 ( 1.00x)
put_luma_pixels_8_64x64_neon: 2.5 (13.63x)
put_luma_pixels_8_128x128_c: 145.5 ( 1.00x)
put_luma_pixels_8_128x128_neon: 10.2 (14.23x)
put_uni_pixels_luma_8_4x4_c: 0.5 ( 1.00x)
put_uni_pixels_luma_8_4x4_neon: 0.0 ( 0.00x)
put_uni_pixels_luma_8_8x8_c: 0.5 ( 1.00x)
put_uni_pixels_luma_8_8x8_neon: 0.2 ( 2.11x)
put_uni_pixels_luma_8_16x16_c: 1.2 ( 1.00x)
put_uni_pixels_luma_8_16x16_neon: 0.2 ( 5.44x)
put_uni_pixels_luma_8_32x32_c: 3.0 ( 1.00x)
put_uni_pixels_luma_8_32x32_neon: 0.5 ( 6.26x)
put_uni_pixels_luma_8_64x64_c: 3.0 ( 1.00x)
put_uni_pixels_luma_8_64x64_neon: 1.7 ( 1.72x)
put_uni_pixels_luma_8_128x128_c: 6.5 ( 1.00x)
put_uni_pixels_luma_8_128x128_neon: 6.5 ( 1.00x)
|
6 months ago |
Zhao Zhili
|
20f2bf5530
|
aarch64/vvc: Add put_qpel_h_* and put_qpel_uni_h_*
Just share hevc implementation.
checkasm --test=vvc_mc --benchmark:
put_luma_h_8_4x4_c: 0.2 ( 1.00x)
put_luma_h_8_4x4_neon: 0.2 ( 1.00x)
put_luma_h_8_8x8_c: 1.0 ( 1.00x)
put_luma_h_8_8x8_neon: 0.2 ( 4.33x)
put_luma_h_8_16x16_c: 3.2 ( 1.00x)
put_luma_h_8_16x16_neon: 1.2 ( 2.63x)
put_luma_h_8_32x32_c: 13.7 ( 1.00x)
put_luma_h_8_32x32_neon: 4.0 ( 3.45x)
put_luma_h_8_64x64_c: 48.2 ( 1.00x)
put_luma_h_8_64x64_neon: 15.7 ( 3.07x)
put_luma_h_8_128x128_c: 203.5 ( 1.00x)
put_luma_h_8_128x128_neon: 62.0 ( 3.28x)
put_uni_h_luma_8_4x4_c: 0.2 ( 1.00x)
put_uni_h_luma_8_4x4_neon: 0.2 ( 1.00x)
put_uni_h_luma_8_8x8_c: 1.5 ( 1.00x)
put_uni_h_luma_8_8x8_neon: 0.2 ( 6.56x)
put_uni_h_luma_8_16x16_c: 5.7 ( 1.00x)
put_uni_h_luma_8_16x16_neon: 1.2 ( 4.67x)
put_uni_h_luma_8_32x32_c: 24.0 ( 1.00x)
put_uni_h_luma_8_32x32_neon: 4.7 ( 5.07x)
put_uni_h_luma_8_64x64_c: 90.0 ( 1.00x)
put_uni_h_luma_8_64x64_neon: 17.0 ( 5.30x)
put_uni_h_luma_8_128x128_c: 357.7 ( 1.00x)
put_uni_h_luma_8_128x128_neon: 67.5 ( 5.30x)
|
6 months ago |
Zhao Zhili
|
46f07ce7d1
|
aarch64/hevc: Move epel/qpel to h26x directory
So vvc can reuse the implementation.
|
6 months ago |
Zhao Zhili
|
4c0372281b
|
aarch64/vvc: Bind h26x/sao filter implementation to vvc
Reviewed-by: Martin Storsjö <martin@martin.st>
|
6 months ago |
Zhao Zhili
|
8cc10298a7
|
aarch64/hevc: Move sao to h26x directory
So vvc can reuse the implementation.
Reviewed-by: Martin Storsjö <martin@martin.st>
|
6 months ago |