Martin Storsjö
1762975ba1
libavcodec/aarch64/hevc: Require consistent use of trailing semicolon
...
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
8fa83ad70f
lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_hv
...
checkasm bench:
put_hevc_qpel_uni_hv4_8_c: 489.2
put_hevc_qpel_uni_hv4_8_i8mm: 105.7
put_hevc_qpel_uni_hv6_8_c: 852.7
put_hevc_qpel_uni_hv6_8_i8mm: 268.7
put_hevc_qpel_uni_hv8_8_c: 1345.7
put_hevc_qpel_uni_hv8_8_i8mm: 300.4
put_hevc_qpel_uni_hv12_8_c: 2757.4
put_hevc_qpel_uni_hv12_8_i8mm: 581.4
put_hevc_qpel_uni_hv16_8_c: 4458.9
put_hevc_qpel_uni_hv16_8_i8mm: 860.2
put_hevc_qpel_uni_hv24_8_c: 9582.2
put_hevc_qpel_uni_hv24_8_i8mm: 2086.7
put_hevc_qpel_uni_hv32_8_c: 16401.9
put_hevc_qpel_uni_hv32_8_i8mm: 3217.4
put_hevc_qpel_uni_hv48_8_c: 36402.4
put_hevc_qpel_uni_hv48_8_i8mm: 7082.7
put_hevc_qpel_uni_hv64_8_c: 62713.2
put_hevc_qpel_uni_hv64_8_i8mm: 12408.9
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
23ca61b7de
lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_v
...
checkasm bench:
put_hevc_qpel_uni_v4_8_c: 146.2
put_hevc_qpel_uni_v4_8_neon: 43.2
put_hevc_qpel_uni_v6_8_c: 303.9
put_hevc_qpel_uni_v6_8_neon: 69.7
put_hevc_qpel_uni_v8_8_c: 495.2
put_hevc_qpel_uni_v8_8_neon: 74.7
put_hevc_qpel_uni_v12_8_c: 1100.9
put_hevc_qpel_uni_v12_8_neon: 222.4
put_hevc_qpel_uni_v16_8_c: 1955.2
put_hevc_qpel_uni_v16_8_neon: 269.2
put_hevc_qpel_uni_v24_8_c: 4571.9
put_hevc_qpel_uni_v24_8_neon: 832.4
put_hevc_qpel_uni_v32_8_c: 8226.4
put_hevc_qpel_uni_v32_8_neon: 1035.7
put_hevc_qpel_uni_v48_8_c: 18324.2
put_hevc_qpel_uni_v48_8_neon: 2321.2
put_hevc_qpel_uni_v64_8_c: 37659.4
put_hevc_qpel_uni_v64_8_neon: 4122.2
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
b7a3150bc5
lavc/aarch64: new optimization for 8-bit hevc_epel_uni_hv
...
checkasm bench:
put_hevc_epel_uni_hv4_8_c: 204.7
put_hevc_epel_uni_hv4_8_i8mm: 70.2
put_hevc_epel_uni_hv6_8_c: 378.2
put_hevc_epel_uni_hv6_8_i8mm: 131.9
put_hevc_epel_uni_hv8_8_c: 637.7
put_hevc_epel_uni_hv8_8_i8mm: 137.9
put_hevc_epel_uni_hv12_8_c: 1301.9
put_hevc_epel_uni_hv12_8_i8mm: 314.2
put_hevc_epel_uni_hv16_8_c: 2203.4
put_hevc_epel_uni_hv16_8_i8mm: 454.7
put_hevc_epel_uni_hv24_8_c: 4848.2
put_hevc_epel_uni_hv24_8_i8mm: 1065.2
put_hevc_epel_uni_hv32_8_c: 8517.4
put_hevc_epel_uni_hv32_8_i8mm: 1898.4
put_hevc_epel_uni_hv48_8_c: 19591.7
put_hevc_epel_uni_hv48_8_i8mm: 4107.2
put_hevc_epel_uni_hv64_8_c: 33880.2
put_hevc_epel_uni_hv64_8_i8mm: 6568.7
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
7ce5a2f640
lavc/aarch64: new optimization for 8-bit hevc_epel_uni_v
...
checkasm bench:
put_hevc_epel_uni_hv64_8_i8mm: 6568.7
put_hevc_epel_uni_v4_8_c: 88.7
put_hevc_epel_uni_v4_8_neon: 32.7
put_hevc_epel_uni_v6_8_c: 185.4
put_hevc_epel_uni_v6_8_neon: 44.9
put_hevc_epel_uni_v8_8_c: 333.9
put_hevc_epel_uni_v8_8_neon: 44.4
put_hevc_epel_uni_v12_8_c: 728.7
put_hevc_epel_uni_v12_8_neon: 119.7
put_hevc_epel_uni_v16_8_c: 1224.2
put_hevc_epel_uni_v16_8_neon: 139.7
put_hevc_epel_uni_v24_8_c: 2531.2
put_hevc_epel_uni_v24_8_neon: 329.9
put_hevc_epel_uni_v32_8_c: 4739.9
put_hevc_epel_uni_v32_8_neon: 562.7
put_hevc_epel_uni_v48_8_c: 10618.7
put_hevc_epel_uni_v48_8_neon: 1256.2
put_hevc_epel_uni_v64_8_c: 19169.9
put_hevc_epel_uni_v64_8_neon: 2179.2
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
9557bf26b3
lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv
...
put_hevc_epel_uni_w_hv4_8_c: 254.6
put_hevc_epel_uni_w_hv4_8_i8mm: 102.9
put_hevc_epel_uni_w_hv6_8_c: 411.6
put_hevc_epel_uni_w_hv6_8_i8mm: 221.6
put_hevc_epel_uni_w_hv8_8_c: 669.4
put_hevc_epel_uni_w_hv8_8_i8mm: 214.9
put_hevc_epel_uni_w_hv12_8_c: 1412.6
put_hevc_epel_uni_w_hv12_8_i8mm: 481.4
put_hevc_epel_uni_w_hv16_8_c: 2425.4
put_hevc_epel_uni_w_hv16_8_i8mm: 647.4
put_hevc_epel_uni_w_hv24_8_c: 5384.1
put_hevc_epel_uni_w_hv24_8_i8mm: 1450.6
put_hevc_epel_uni_w_hv32_8_c: 9470.9
put_hevc_epel_uni_w_hv32_8_i8mm: 2497.1
put_hevc_epel_uni_w_hv48_8_c: 20930.1
put_hevc_epel_uni_w_hv48_8_i8mm: 5635.9
put_hevc_epel_uni_w_hv64_8_c: 36682.9
put_hevc_epel_uni_w_hv64_8_i8mm: 9712.6
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Logan Lyu
d48c89701c
lavc/aarch64: new optimization for 8-bit hevc_epel_h
...
put_hevc_epel_h4_8_c: 67.1
put_hevc_epel_h4_8_i8mm: 21.1
put_hevc_epel_h6_8_c: 147.1
put_hevc_epel_h6_8_i8mm: 45.1
put_hevc_epel_h8_8_c: 237.4
put_hevc_epel_h8_8_i8mm: 72.1
put_hevc_epel_h12_8_c: 527.4
put_hevc_epel_h12_8_i8mm: 115.4
put_hevc_epel_h16_8_c: 943.6
put_hevc_epel_h16_8_i8mm: 153.9
put_hevc_epel_h24_8_c: 2105.4
put_hevc_epel_h24_8_i8mm: 384.4
put_hevc_epel_h32_8_c: 3631.4
put_hevc_epel_h32_8_i8mm: 519.9
put_hevc_epel_h48_8_c: 8082.1
put_hevc_epel_h48_8_i8mm: 1110.4
put_hevc_epel_h64_8_c: 14400.6
put_hevc_epel_h64_8_i8mm: 2057.1
put_hevc_qpel_h4_8_c: 124.9
put_hevc_qpel_h4_8_neon: 43.1
put_hevc_qpel_h4_8_i8mm: 33.1
put_hevc_qpel_h6_8_c: 269.4
put_hevc_qpel_h6_8_neon: 90.6
put_hevc_qpel_h6_8_i8mm: 61.4
put_hevc_qpel_h8_8_c: 477.6
put_hevc_qpel_h8_8_neon: 82.1
put_hevc_qpel_h8_8_i8mm: 99.9
put_hevc_qpel_h12_8_c: 1062.4
put_hevc_qpel_h12_8_neon: 226.9
put_hevc_qpel_h12_8_i8mm: 170.9
put_hevc_qpel_h16_8_c: 1880.6
put_hevc_qpel_h16_8_neon: 302.9
put_hevc_qpel_h16_8_i8mm: 251.4
put_hevc_qpel_h24_8_c: 4221.9
put_hevc_qpel_h24_8_neon: 893.9
put_hevc_qpel_h24_8_i8mm: 626.1
put_hevc_qpel_h32_8_c: 7437.6
put_hevc_qpel_h32_8_neon: 1189.9
put_hevc_qpel_h32_8_i8mm: 959.1
put_hevc_qpel_h48_8_c: 16838.4
put_hevc_qpel_h48_8_neon: 2727.9
put_hevc_qpel_h48_8_i8mm: 2163.9
put_hevc_qpel_h64_8_c: 29982.1
put_hevc_qpel_h64_8_neon: 4777.6
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Logan Lyu
668eb4c00e
lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v
...
put_hevc_epel_uni_w_v4_8_c: 116.1
put_hevc_epel_uni_w_v4_8_neon: 48.6
put_hevc_epel_uni_w_v6_8_c: 248.9
put_hevc_epel_uni_w_v6_8_neon: 80.6
put_hevc_epel_uni_w_v8_8_c: 383.9
put_hevc_epel_uni_w_v8_8_neon: 91.9
put_hevc_epel_uni_w_v12_8_c: 806.1
put_hevc_epel_uni_w_v12_8_neon: 202.9
put_hevc_epel_uni_w_v16_8_c: 1411.1
put_hevc_epel_uni_w_v16_8_neon: 289.9
put_hevc_epel_uni_w_v24_8_c: 3168.9
put_hevc_epel_uni_w_v24_8_neon: 619.4
put_hevc_epel_uni_w_v32_8_c: 5632.9
put_hevc_epel_uni_w_v32_8_neon: 1161.1
put_hevc_epel_uni_w_v48_8_c: 12406.1
put_hevc_epel_uni_w_v48_8_neon: 2476.4
put_hevc_epel_uni_w_v64_8_c: 22001.4
put_hevc_epel_uni_w_v64_8_neon: 4343.9
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Logan Lyu
0c604b1913
lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_h
...
put_hevc_epel_uni_w_h4_8_c: 126.1
put_hevc_epel_uni_w_h4_8_i8mm: 41.6
put_hevc_epel_uni_w_h6_8_c: 222.9
put_hevc_epel_uni_w_h6_8_i8mm: 91.4
put_hevc_epel_uni_w_h8_8_c: 374.4
put_hevc_epel_uni_w_h8_8_i8mm: 102.1
put_hevc_epel_uni_w_h12_8_c: 806.1
put_hevc_epel_uni_w_h12_8_i8mm: 225.6
put_hevc_epel_uni_w_h16_8_c: 1414.4
put_hevc_epel_uni_w_h16_8_i8mm: 333.4
put_hevc_epel_uni_w_h24_8_c: 3128.6
put_hevc_epel_uni_w_h24_8_i8mm: 713.1
put_hevc_epel_uni_w_h32_8_c: 5519.1
put_hevc_epel_uni_w_h32_8_i8mm: 1118.1
put_hevc_epel_uni_w_h48_8_c: 12364.4
put_hevc_epel_uni_w_h48_8_i8mm: 2541.1
put_hevc_epel_uni_w_h64_8_c: 21925.9
put_hevc_epel_uni_w_h64_8_i8mm: 4383.6
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Logan Lyu
e652e7dcda
lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels
...
put_hevc_pel_uni_pixels4_8_c: 35.9
put_hevc_pel_uni_pixels4_8_neon: 7.6
put_hevc_pel_uni_pixels6_8_c: 46.1
put_hevc_pel_uni_pixels6_8_neon: 20.6
put_hevc_pel_uni_pixels8_8_c: 53.4
put_hevc_pel_uni_pixels8_8_neon: 11.6
put_hevc_pel_uni_pixels12_8_c: 89.1
put_hevc_pel_uni_pixels12_8_neon: 25.9
put_hevc_pel_uni_pixels16_8_c: 106.4
put_hevc_pel_uni_pixels16_8_neon: 20.4
put_hevc_pel_uni_pixels24_8_c: 137.6
put_hevc_pel_uni_pixels24_8_neon: 47.1
put_hevc_pel_uni_pixels32_8_c: 173.6
put_hevc_pel_uni_pixels32_8_neon: 54.1
put_hevc_pel_uni_pixels48_8_c: 268.1
put_hevc_pel_uni_pixels48_8_neon: 117.1
put_hevc_pel_uni_pixels64_8_c: 346.1
put_hevc_pel_uni_pixels64_8_neon: 205.9
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Logan Lyu
e79686be96
lavc/aarch64: new optimization for 8-bit hevc_qpel_h hevc_qpel_uni_w_hv
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Logan Lyu
15972cce8c
lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_w_h
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Logan Lyu
0b7356c1b4
lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels and qpel_uni_w_v
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
xufuji456
bd2f00f665
codec/aarch64/hevc: add transform_luma_neon
...
got 56% speed up (run_count=1000, CPU=Cortex A53)
transform_4x4_luma_neon: 45 transform_4x4_luma_c: 103
Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
xufuji456
00a062b8d5
codec/aarch64/hevc:add idct_32x32_neon
...
got 73% speed up (run_count=1000, CPU=Cortex A53)
idct_32x32_neon: 4826 idct_32x32_c: 18236
idct_32x32_neon: 4824 idct_32x32_c: 18149
idct_32x32_neon: 4937 idct_32x32_c: 18333
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
J. Dekker
b564ad8eac
lavc/aarch64: add hevc deblock chroma 8-12bit
...
Benched on Ampere Altra:
hevc_h_loop_filter_chroma8_c: 367.7
hevc_h_loop_filter_chroma8_neon: 31.0
hevc_h_loop_filter_chroma10_c: 396.7
hevc_h_loop_filter_chroma10_neon: 27.5
hevc_h_loop_filter_chroma12_c: 377.0
hevc_h_loop_filter_chroma12_neon: 31.7
hevc_v_loop_filter_chroma8_c: 369.0
hevc_v_loop_filter_chroma8_neon: 55.0
hevc_v_loop_filter_chroma10_c: 389.0
hevc_v_loop_filter_chroma10_neon: 54.0
hevc_v_loop_filter_chroma12_c: 389.5
hevc_v_loop_filter_chroma12_neon: 53.0
Signed-off-by: J. Dekker <jdek@itanimul.li>
2 years ago
xufuji456
4b4de07721
libavcodec/hevc: add hevc idct4x4 neon of aarch64
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
J. Dekker
9bed814e1d
lavc/aarch64: add hevc horizontal qpel/uni/bi
...
checkasm --benchmark on Ampere Altra (Neoverse N1):
put_hevc_qpel_bi_h4_8_c: 170.7
put_hevc_qpel_bi_h4_8_neon: 64.5
put_hevc_qpel_bi_h6_8_c: 373.7
put_hevc_qpel_bi_h6_8_neon: 130.2
put_hevc_qpel_bi_h8_8_c: 662.0
put_hevc_qpel_bi_h8_8_neon: 138.5
put_hevc_qpel_bi_h12_8_c: 1529.5
put_hevc_qpel_bi_h12_8_neon: 422.0
put_hevc_qpel_bi_h16_8_c: 2735.5
put_hevc_qpel_bi_h16_8_neon: 560.5
put_hevc_qpel_bi_h24_8_c: 6015.7
put_hevc_qpel_bi_h24_8_neon: 1636.0
put_hevc_qpel_bi_h32_8_c: 10779.0
put_hevc_qpel_bi_h32_8_neon: 2204.5
put_hevc_qpel_bi_h48_8_c: 24375.0
put_hevc_qpel_bi_h48_8_neon: 4984.0
put_hevc_qpel_bi_h64_8_c: 42768.0
put_hevc_qpel_bi_h64_8_neon: 8795.7
put_hevc_qpel_h4_8_c: 149.0
put_hevc_qpel_h4_8_neon: 55.7
put_hevc_qpel_h6_8_c: 321.2
put_hevc_qpel_h6_8_neon: 106.0
put_hevc_qpel_h8_8_c: 578.7
put_hevc_qpel_h8_8_neon: 133.2
put_hevc_qpel_h12_8_c: 1279.0
put_hevc_qpel_h12_8_neon: 391.7
put_hevc_qpel_h16_8_c: 2286.2
put_hevc_qpel_h16_8_neon: 519.7
put_hevc_qpel_h24_8_c: 5100.7
put_hevc_qpel_h24_8_neon: 1546.2
put_hevc_qpel_h32_8_c: 9022.0
put_hevc_qpel_h32_8_neon: 2060.2
put_hevc_qpel_h48_8_c: 20293.5
put_hevc_qpel_h48_8_neon: 4656.7
put_hevc_qpel_h64_8_c: 36037.0
put_hevc_qpel_h64_8_neon: 8262.7
put_hevc_qpel_uni_h4_8_c: 162.2
put_hevc_qpel_uni_h4_8_neon: 61.7
put_hevc_qpel_uni_h6_8_c: 355.2
put_hevc_qpel_uni_h6_8_neon: 114.2
put_hevc_qpel_uni_h8_8_c: 651.0
put_hevc_qpel_uni_h8_8_neon: 135.7
put_hevc_qpel_uni_h12_8_c: 1412.5
put_hevc_qpel_uni_h12_8_neon: 402.7
put_hevc_qpel_uni_h16_8_c: 2551.0
put_hevc_qpel_uni_h16_8_neon: 533.5
put_hevc_qpel_uni_h24_8_c: 5782.2
put_hevc_qpel_uni_h24_8_neon: 1578.7
put_hevc_qpel_uni_h32_8_c: 10586.5
put_hevc_qpel_uni_h32_8_neon: 2102.2
put_hevc_qpel_uni_h48_8_c: 23812.0
put_hevc_qpel_uni_h48_8_neon: 4739.5
put_hevc_qpel_uni_h64_8_c: 42958.7
put_hevc_qpel_uni_h64_8_neon: 8366.5
Signed-off-by: J. Dekker <jdek@itanimul.li>
2 years ago
J. Dekker
ce2f47318b
lavc/aarch64: hevc_add_res add 12bit variants
...
hevc_add_res_4x4_12_c: 46.0
hevc_add_res_4x4_12_neon: 18.7
hevc_add_res_8x8_12_c: 194.7
hevc_add_res_8x8_12_neon: 25.2
hevc_add_res_16x16_12_c: 716.0
hevc_add_res_16x16_12_neon: 69.7
hevc_add_res_32x32_12_c: 3820.7
hevc_add_res_32x32_12_neon: 261.0
Signed-off-by: J. Dekker <jdek@itanimul.li>
3 years ago
Andreas Rheinhardt
b3bbbb14d0
avcodec/hevcdsp: Constify src pointers
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
3 years ago
J. Dekker
2e832be322
lavc/aarch64: add hevc sao edge 8x8
...
bench on AWS Graviton:
hevc_sao_edge_8x8_8_c: 516.0
hevc_sao_edge_8x8_8_neon: 81.0
Signed-off-by: J. Dekker <jdek@itanimul.li>
3 years ago
J. Dekker
92f67e4017
lavc/aarch64: add hevc sao edge 16x16
...
bench on AWS Graviton:
hevc_sao_edge_16x16_8_c: 1857.0
hevc_sao_edge_16x16_8_neon: 211.0
hevc_sao_edge_32x32_8_c: 7802.2
hevc_sao_edge_32x32_8_neon: 808.2
hevc_sao_edge_48x48_8_c: 16764.2
hevc_sao_edge_48x48_8_neon: 1796.5
hevc_sao_edge_64x64_8_c: 32647.5
hevc_sao_edge_64x64_8_neon: 3118.5
Signed-off-by: J. Dekker <jdek@itanimul.li>
3 years ago
J. Dekker
d957ee34a6
lavc/aarch64: fix hevc sao band filter
...
The SAO band filter can be called with non-multiples of 8, we round up
to the nearest multiple of 8 to account for this.
Signed-off-by: J. Dekker <jdek@itanimul.li>
3 years ago
Martin Storsjö
24b93022fe
aarch64: Disable ff_hevc_sao_band_filter_8x8_8_neon out of precaution
...
While this function on its own passes all of fate-hevc, there's
indications that the function might need to handle widths that
aren't a multiple of 8 (noted in commit
f63f9be37c
, which later was
reverted).
Signed-off-by: Martin Storsjö <martin@martin.st>
3 years ago
Martin Storsjö
16fba44b4d
Revert "lavc/aarch64: add hevc sao edge 16x16"
...
This reverts commit a9214a2ca3
, as
it breaks fate-hevc.
Signed-off-by: Martin Storsjö <martin@martin.st>
3 years ago
Martin Storsjö
df48b1d06f
Revert "lavc/aarch64: add hevc sao edge 8x8"
...
This reverts commit c97ffc1a77
, as
it breaks fate-hevc.
Signed-off-by: Martin Storsjö <martin@martin.st>
3 years ago
Martin Storsjö
cafed377eb
Revert "lavc/aarch64: add hevc sao band 8x8 tiling"
...
This reverts commit f63f9be37c
, as
it breaks fate-hevc.
Signed-off-by: Martin Storsjö <martin@martin.st>
3 years ago
J. Dekker
f63f9be37c
lavc/aarch64: add hevc sao band 8x8 tiling
...
bench on AWS Graviton:
hevc_sao_band_8x8_8_c: 317.5
hevc_sao_band_8x8_8_neon: 97.5
hevc_sao_band_16x16_8_c: 1115.0
hevc_sao_band_16x16_8_neon: 322.7
hevc_sao_band_32x32_8_c: 4599.2
hevc_sao_band_32x32_8_neon: 1246.2
hevc_sao_band_48x48_8_c: 10021.7
hevc_sao_band_48x48_8_neon: 2740.5
hevc_sao_band_64x64_8_c: 17635.0
hevc_sao_band_64x64_8_neon: 4875.7
Signed-off-by: J. Dekker <jdek@itanimul.li>
3 years ago
J. Dekker
c97ffc1a77
lavc/aarch64: add hevc sao edge 8x8
...
bench on AWS Graviton:
hevc_sao_edge_8x8_8_c: 516.0
hevc_sao_edge_8x8_8_neon: 81.0
Signed-off-by: J. Dekker <jdek@itanimul.li>
3 years ago
J. Dekker
a9214a2ca3
lavc/aarch64: add hevc sao edge 16x16
...
bench on AWS Graviton:
hevc_sao_edge_16x16_8_c: 1857.0
hevc_sao_edge_16x16_8_neon: 211.0
hevc_sao_edge_32x32_8_c: 7802.2
hevc_sao_edge_32x32_8_neon: 808.2
hevc_sao_edge_48x48_8_c: 16764.2
hevc_sao_edge_48x48_8_neon: 1796.5
hevc_sao_edge_64x64_8_c: 32647.5
hevc_sao_edge_64x64_8_neon: 3118.5
Signed-off-by: J. Dekker <jdek@itanimul.li>
3 years ago
Josh Dekker
7ac41e0db2
lavc/aarch64: add HEVC sao_band NEON
...
Only works for 8x8.
Signed-off-by: Josh Dekker <josh@itanimul.li>
4 years ago
Josh Dekker
75c2ddfa61
lavc/aarch64: add HEVC idct_dc NEON
...
Signed-off-by: Josh Dekker <josh@itanimul.li>
4 years ago
Reimar Döffinger
00c916ef61
lavc/aarch64: port HEVC add_residual NEON
...
Speedup is fairly small, around 1.5%, but these are fairly simple.
Signed-off-by: Josh Dekker <josh@itanimul.li>
4 years ago
Reimar Döffinger
30f80d855b
lavc/aarch64: port HEVC SIMD idct NEON
...
Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time
and this optimization reduced overall decode time from 19.4s to 16.4s,
approximately 15% speedup.
Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts",
running on Apple M1.
Signed-off-by: Josh Dekker <josh@itanimul.li>
4 years ago
Martin Storsjö
e0604d508e
swscale: aarch64: Add a NEON implementation of interleaveBytes
...
This allows speeding up format conversions from yuv420 to nv12.
Cortex A53 A72 A73
interleave_bytes_c: 86077.5 51433.0 66972.0
interleave_bytes_neon: 19701.7 23019.2 15859.2
interleave_bytes_aligned_c: 86603.0 52017.2 67484.2
interleave_bytes_aligned_neon: 9061.0 7623.0 6309.0
Signed-off-by: Martin Storsjö <martin@martin.st>
5 years ago
Janne Grunau
3956a5e0ea
aarch64: NEON vorbis_inverse_coupling
...
From the ARMv7 NEON version. 16 times faster as the C version, overall
more than 12% faster vorbis decoding on Apple's A7.
11 years ago
Diego Biurrun
c9f933b5b6
Add av_cold attributes to arch-specific init functions
12 years ago
Ronald S. Bultje
1768e43ceb
vorbisdsp: change block_size type from int to intptr_t.
...
This saves one instruction in the x86-64 assembly.
12 years ago
Ronald S. Bultje
fef906c77c
Move vorbis_inverse_coupling from dsputil to vorbisdspcontext.
...
Conveniently (together with Justin's earlier patches), this makes
our vorbis decoder entirely independent of dsputil.
12 years ago
Mans Rullgard
d526c5338d
ARM: allow runtime masking of CPU features
...
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Diego Biurrun
3dde147ff9
cosmetics: Consistently place static, inline and av_cold attributes/keywords.
13 years ago
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Justin Ruggles
a8ae4e0e7b
Remove unneeded add bias from 3 functions.
...
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 80ba1ddb58
)
14 years ago
Justin Ruggles
80ba1ddb58
Remove unneeded add bias from 3 functions.
...
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Måns Rullgård
08255107cf
DCA: ARM/NEON optimised lfe_fir
...
Originally committed as revision 22863 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Måns Rullgård
2ed6f39944
Replace many includes of libavutil/common.h with what is actually needed
...
This reduces the number of false dependencies on header files and
speeds up compilation.
Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Måns Rullgård
75fb5c24ed
Move FASTDIV macro to intmath.h
...
Originally committed as revision 21335 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Måns Rullgård
544f5a922f
Optimise av_log2 with clz when available
...
10% faster flac decoding on x86 and ARM.
Originally committed as revision 21217 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Stefano Sabatini
987903826b
Globally rename the header inclusion guard names.
...
Consistently apply this rule: the guard name is obtained from the
filename by stripping the leading "lib", converting '/' and '.' to
'_' and uppercasing the resulting name. Guard names in the root
directory have to be prefixed by "FFMPEG_".
Originally committed as revision 15120 to svn://svn.ffmpeg.org/ffmpeg/trunk
17 years ago
Måns Rullgård
3540b950ec
add missing #include "common.h" to libavutil headers
...
Originally committed as revision 12502 to svn://svn.ffmpeg.org/ffmpeg/trunk
17 years ago