Rémi Denis-Courmont
b3825bbe45
riscv: test for assembler support
...
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
1 year ago
sunyuechi
0b9d009b4a
lavc/vc1dsp: R-V V inv_trans
...
C908:
vc1dsp.vc1_inv_trans_4x4_dc_c: 125.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5
vc1dsp.vc1_inv_trans_4x8_dc_c: 230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.5
vc1dsp.vc1_inv_trans_8x4_dc_c: 228.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 64.5
vc1dsp.vc1_inv_trans_8x8_dc_c: 476.5
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 80.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
1 year ago
Mikhail Nitenko
0f745b74ec
lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions
...
Benchmarks A53 A55 A72 A76
avg_h264_qpel_8_mc01_10_c: 936.5 924.0 656.0 504.7
avg_h264_qpel_8_mc01_10_neon: 234.7 202.0 120.7 63.2
avg_h264_qpel_8_mc02_10_c: 921.0 920.0 669.2 493.7
avg_h264_qpel_8_mc02_10_neon: 202.0 173.2 102.7 58.5
avg_h264_qpel_8_mc03_10_c: 936.5 924.0 656.0 509.5
avg_h264_qpel_8_mc03_10_neon: 236.2 203.7 120.0 63.2
avg_h264_qpel_8_mc10_10_c: 1441.0 1437.7 806.7 478.5
avg_h264_qpel_8_mc10_10_neon: 325.7 324.0 153.7 94.2
avg_h264_qpel_8_mc11_10_c: 2160.7 2148.2 1366.7 906.7
avg_h264_qpel_8_mc11_10_neon: 492.0 464.0 242.5 134.5
avg_h264_qpel_8_mc13_10_c: 2157.0 2138.2 1357.0 908.2
avg_h264_qpel_8_mc13_10_neon: 494.0 467.2 242.0 140.0
avg_h264_qpel_8_mc20_10_c: 1433.5 1410.0 785.2 486.0
avg_h264_qpel_8_mc20_10_neon: 293.7 289.7 138.0 91.5
avg_h264_qpel_8_mc30_10_c: 1458.5 1461.7 813.7 483.2
avg_h264_qpel_8_mc30_10_neon: 341.7 339.2 154.0 95.2
avg_h264_qpel_8_mc31_10_c: 2194.7 2197.2 1358.7 928.0
avg_h264_qpel_8_mc31_10_neon: 520.0 495.0 245.5 142.5
avg_h264_qpel_8_mc33_10_c: 2188.0 2205.5 1356.7 910.7
avg_h264_qpel_8_mc33_10_neon: 521.0 494.5 245.7 145.7
avg_h264_qpel_16_mc01_10_c: 3717.2 3595.0 2610.0 2012.0
avg_h264_qpel_16_mc01_10_neon: 920.5 791.5 483.2 240.5
avg_h264_qpel_16_mc02_10_c: 3684.0 3633.0 2659.0 1919.7
avg_h264_qpel_16_mc02_10_neon: 790.7 678.2 409.2 217.0
avg_h264_qpel_16_mc03_10_c: 3726.5 3596.0 2606.7 2010.0
avg_h264_qpel_16_mc03_10_neon: 922.0 792.5 483.2 239.7
avg_h264_qpel_16_mc10_10_c: 5912.0 5803.2 3241.5 1916.7
avg_h264_qpel_16_mc10_10_neon: 1267.5 1277.2 616.5 365.0
avg_h264_qpel_16_mc11_10_c: 8599.2 8482.5 5338.0 3616.2
avg_h264_qpel_16_mc11_10_neon: 1913.0 1827.0 956.2 542.2
avg_h264_qpel_16_mc13_10_c: 8643.7 8488.5 5388.0 3628.5
avg_h264_qpel_16_mc13_10_neon: 1914.7 1828.7 969.2 530.5
avg_h264_qpel_16_mc20_10_c: 5719.5 5641.0 3147.0 1946.2
avg_h264_qpel_16_mc20_10_neon: 1139.5 1150.0 539.5 344.0
avg_h264_qpel_16_mc30_10_c: 5930.0 5872.5 3267.5 1918.0
avg_h264_qpel_16_mc30_10_neon: 1331.5 1341.2 616.5 369.5
avg_h264_qpel_16_mc31_10_c: 8758.7 8697.7 5353.0 3630.7
avg_h264_qpel_16_mc31_10_neon: 2018.7 1941.7 982.2 574.7
avg_h264_qpel_16_mc33_10_c: 8683.2 8675.2 5339.2 3634.7
avg_h264_qpel_16_mc33_10_neon: 2019.7 1940.2 994.5 566.0
put_h264_qpel_8_mc01_10_c: 854.2 843.0 599.2 478.0
put_h264_qpel_8_mc01_10_neon: 192.7 168.0 101.7 56.7
put_h264_qpel_8_mc02_10_c: 766.5 760.0 550.2 441.0
put_h264_qpel_8_mc02_10_neon: 160.0 139.2 88.7 53.0
put_h264_qpel_8_mc03_10_c: 854.2 843.0 599.2 479.0
put_h264_qpel_8_mc03_10_neon: 194.2 169.7 102.0 56.2
put_h264_qpel_8_mc10_10_c: 1352.7 1353.7 749.7 446.7
put_h264_qpel_8_mc10_10_neon: 289.7 294.2 135.5 88.5
put_h264_qpel_8_mc11_10_c: 2080.0 2066.2 1309.5 876.7
put_h264_qpel_8_mc11_10_neon: 450.0 429.7 229.7 131.2
put_h264_qpel_8_mc13_10_c: 2074.7 2060.2 1294.5 870.5
put_h264_qpel_8_mc13_10_neon: 452.5 434.5 226.5 130.0
put_h264_qpel_8_mc20_10_c: 1221.5 1216.0 684.5 399.7
put_h264_qpel_8_mc20_10_neon: 257.7 262.5 121.2 78.7
put_h264_qpel_8_mc30_10_c: 1379.0 1374.7 757.2 449.5
put_h264_qpel_8_mc30_10_neon: 305.7 310.2 135.5 86.5
put_h264_qpel_8_mc31_10_c: 2109.2 2119.7 1299.5 878.0
put_h264_qpel_8_mc31_10_neon: 478.0 458.5 226.0 137.2
put_h264_qpel_8_mc33_10_c: 2101.5 2115.2 1306.5 887.0
put_h264_qpel_8_mc33_10_neon: 479.0 458.7 229.7 141.7
put_h264_qpel_16_mc01_10_c: 3485.7 3396.7 2460.5 1914.5
put_h264_qpel_16_mc01_10_neon: 752.5 665.5 397.0 213.2
put_h264_qpel_16_mc02_10_c: 3103.5 3023.2 2154.7 1720.7
put_h264_qpel_16_mc02_10_neon: 622.7 551.2 347.7 196.2
put_h264_qpel_16_mc03_10_c: 3486.2 3394.0 2436.5 1917.7
put_h264_qpel_16_mc03_10_neon: 754.0 666.5 397.0 215.7
put_h264_qpel_16_mc10_10_c: 5533.0 5488.5 2989.0 1783.0
put_h264_qpel_16_mc10_10_neon: 1123.5 1165.2 535.2 334.7
put_h264_qpel_16_mc11_10_c: 8437.7 8281.2 5209.0 3510.7
put_h264_qpel_16_mc11_10_neon: 1745.0 1697.0 878.5 513.5
put_h264_qpel_16_mc13_10_c: 8567.7 8468.0 5221.5 3528.0
put_h264_qpel_16_mc13_10_neon: 1751.7 1698.2 889.2 507.0
put_h264_qpel_16_mc20_10_c: 4907.5 4885.0 2786.2 1607.5
put_h264_qpel_16_mc20_10_neon: 995.5 1034.5 475.5 307.0
put_h264_qpel_16_mc30_10_c: 5579.7 5537.7 3045.2 1789.5
put_h264_qpel_16_mc30_10_neon: 1187.5 1231.2 532.5 334.5
put_h264_qpel_16_mc31_10_c: 8677.2 8672.5 5204.2 3516.0
put_h264_qpel_16_mc31_10_neon: 1850.7 1813.2 893.0 545.2
put_h264_qpel_16_mc33_10_c: 8688.7 8671.2 5223.2 3512.0
put_h264_qpel_16_mc33_10_neon: 1851.7 1814.2 908.5 535.2
Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
sunyuechi
8bdb663062
lavc/ac3dsp: R-V V float_to_fixed24
...
c910
float_to_fixed24_c: 2207.2
float_to_fixed24_rvv_f32: 696.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
1 year ago
Paul B Mahol
7e453dad3c
avcodec/qoadec: fix overreads and fix packet size check
1 year ago
Michael Niedermayer
22daf2148f
avcodec/av1dec: Fix resolving zero divisor
...
Fixes: Out of array read
Fixes: global-buffer-overflow-AV1
Found-by: "Leonelli, Matteo" <matteo.leonelli@cispa.de>
Tested-by: "Wang, Fei W" <fei.w.wang@intel.com>
Reviewed-by: "Wang, Fei W" <fei.w.wang@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1 year ago
Leo Izen
c4be080e65
avcodec/jpegxl_parser: fix parsing sequences of extremely small files
...
This patch allows the JXL parser to parse sequences of extremely small
files concatenated together. (e.g. smaller than the parser buffer)
Signed-off-by: Leo Izen <leo.izen@gmail.com>
1 year ago
Leo Izen
019b3ea65a
avcodec/jpegxl_parse{,r}: use correct ISOBMFF extended size location
...
According to ISO/IEC 14996-12, size == 1 means a 64-bit extended-size
field occurs *after* the 32-bit box type, not before. This fix should
allow correct parsing of JXL files with extended-size boxes.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
1 year ago
Haihao Xiang
fc73b372cd
lavc/qsvdec: reduce info message when more data is required
...
demote the info to AV_LOG_VERBOSE
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
1 year ago
Haihao Xiang
e233f3e75f
lavc/qsvdec: return 0 if more data is required
...
The type of qsv decoders is FF_CODEC_CB_TYPE_DECODE which must not
return AVERROR(EAGAIN). commit 42b20c9
added an assertion to check the
returned value.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
1 year ago
Lynne
8c117b75af
lavc/Makefile: build vulkan decode code if vulkan_av1 has been enabled
...
Forgotten.
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Tested-by: Neal Gompa <ngompa13@gmail.com>
1 year ago
Paul B Mahol
0a13178de8
avcodec/qoadec: add support for midstream sample rate/layout changes
1 year ago
Anton Khirnov
5230257ea1
lavc/dvdsubenc: only check canvas size when it is actually set
...
Fixes #10650
1 year ago
Logan Lyu
fa0470347e
lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_hv
...
put_hevc_qpel_bi_hv4_8_c: 433.7
put_hevc_qpel_bi_hv4_8_i8mm: 117.9
put_hevc_qpel_bi_hv6_8_c: 803.9
put_hevc_qpel_bi_hv6_8_i8mm: 252.7
put_hevc_qpel_bi_hv8_8_c: 1296.4
put_hevc_qpel_bi_hv8_8_i8mm: 316.2
put_hevc_qpel_bi_hv12_8_c: 2867.4
put_hevc_qpel_bi_hv12_8_i8mm: 669.2
put_hevc_qpel_bi_hv16_8_c: 4709.4
put_hevc_qpel_bi_hv16_8_i8mm: 929.9
put_hevc_qpel_bi_hv24_8_c: 9639.7
put_hevc_qpel_bi_hv24_8_i8mm: 2072.4
put_hevc_qpel_bi_hv32_8_c: 16663.7
put_hevc_qpel_bi_hv32_8_i8mm: 3391.4
put_hevc_qpel_bi_hv48_8_c: 36972.9
put_hevc_qpel_bi_hv48_8_i8mm: 7505.7
put_hevc_qpel_bi_hv64_8_c: 64106.4
put_hevc_qpel_bi_hv64_8_i8mm: 13145.2
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
595f97028b
lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_v
...
put_hevc_qpel_bi_v4_8_c: 166.1
put_hevc_qpel_bi_v4_8_neon: 61.9
put_hevc_qpel_bi_v6_8_c: 309.4
put_hevc_qpel_bi_v6_8_neon: 75.6
put_hevc_qpel_bi_v8_8_c: 531.1
put_hevc_qpel_bi_v8_8_neon: 78.1
put_hevc_qpel_bi_v12_8_c: 1139.9
put_hevc_qpel_bi_v12_8_neon: 238.1
put_hevc_qpel_bi_v16_8_c: 2063.6
put_hevc_qpel_bi_v16_8_neon: 308.9
put_hevc_qpel_bi_v24_8_c: 4317.1
put_hevc_qpel_bi_v24_8_neon: 629.9
put_hevc_qpel_bi_v32_8_c: 8241.9
put_hevc_qpel_bi_v32_8_neon: 1140.1
put_hevc_qpel_bi_v48_8_c: 18422.9
put_hevc_qpel_bi_v48_8_neon: 2533.9
put_hevc_qpel_bi_v64_8_c: 37508.6
put_hevc_qpel_bi_v64_8_neon: 4520.1
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
00290a64f7
lavc/aarch64: new optimization for 8-bit hevc_epel_bi_hv
...
put_hevc_epel_bi_hv4_8_c: 242.9
put_hevc_epel_bi_hv4_8_i8mm: 68.6
put_hevc_epel_bi_hv6_8_c: 402.4
put_hevc_epel_bi_hv6_8_i8mm: 135.9
put_hevc_epel_bi_hv8_8_c: 636.4
put_hevc_epel_bi_hv8_8_i8mm: 145.6
put_hevc_epel_bi_hv12_8_c: 1363.1
put_hevc_epel_bi_hv12_8_i8mm: 324.1
put_hevc_epel_bi_hv16_8_c: 2222.1
put_hevc_epel_bi_hv16_8_i8mm: 509.1
put_hevc_epel_bi_hv24_8_c: 4793.4
put_hevc_epel_bi_hv24_8_i8mm: 1091.9
put_hevc_epel_bi_hv32_8_c: 8393.9
put_hevc_epel_bi_hv32_8_i8mm: 1720.6
put_hevc_epel_bi_hv48_8_c: 19526.6
put_hevc_epel_bi_hv48_8_i8mm: 4285.9
put_hevc_epel_bi_hv64_8_c: 33915.4
put_hevc_epel_bi_hv64_8_i8mm: 6783.6
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
0448f27f41
lavc/aarch64: new optimization for 8-bit hevc_epel_bi_v
...
put_hevc_epel_bi_v4_8_c: 138.4
put_hevc_epel_bi_v4_8_neon: 33.7
put_hevc_epel_bi_v6_8_c: 302.9
put_hevc_epel_bi_v6_8_neon: 46.7
put_hevc_epel_bi_v8_8_c: 408.7
put_hevc_epel_bi_v8_8_neon: 48.7
put_hevc_epel_bi_v12_8_c: 779.4
put_hevc_epel_bi_v12_8_neon: 139.7
put_hevc_epel_bi_v16_8_c: 1344.9
put_hevc_epel_bi_v16_8_neon: 160.2
put_hevc_epel_bi_v24_8_c: 2981.7
put_hevc_epel_bi_v24_8_neon: 344.9
put_hevc_epel_bi_v32_8_c: 5280.9
put_hevc_epel_bi_v32_8_neon: 618.4
put_hevc_epel_bi_v48_8_c: 12494.9
put_hevc_epel_bi_v48_8_neon: 1364.4
put_hevc_epel_bi_v64_8_c: 22127.7
put_hevc_epel_bi_v64_8_neon: 2473.7
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
216275bd80
lavc/aarch64: new optimization for 8-bit hevc_epel_bi_h
...
put_hevc_epel_bi_h4_8_c: 96.0
put_hevc_epel_bi_h4_8_neon: 36.3
put_hevc_epel_bi_h6_8_c: 288.3
put_hevc_epel_bi_h6_8_neon: 59.3
put_hevc_epel_bi_h8_8_c: 358.5
put_hevc_epel_bi_h8_8_neon: 61.5
put_hevc_epel_bi_h12_8_c: 759.8
put_hevc_epel_bi_h12_8_neon: 159.5
put_hevc_epel_bi_h16_8_c: 1307.0
put_hevc_epel_bi_h16_8_neon: 182.0
put_hevc_epel_bi_h24_8_c: 2778.3
put_hevc_epel_bi_h24_8_neon: 430.5
put_hevc_epel_bi_h32_8_c: 4952.3
put_hevc_epel_bi_h32_8_neon: 679.5
put_hevc_epel_bi_h48_8_c: 11803.3
put_hevc_epel_bi_h48_8_neon: 1443.5
put_hevc_epel_bi_h64_8_c: 20654.8
put_hevc_epel_bi_h64_8_neon: 2737.0
put_hevc_qpel_bi_h4_8_c: 140.0
put_hevc_qpel_bi_h4_8_neon: 111.5
put_hevc_qpel_bi_h6_8_c: 318.0
put_hevc_qpel_bi_h6_8_neon: 85.8
put_hevc_qpel_bi_h8_8_c: 536.5
put_hevc_qpel_bi_h8_8_neon: 95.3
put_hevc_qpel_bi_h12_8_c: 1188.5
put_hevc_qpel_bi_h12_8_neon: 291.3
put_hevc_qpel_bi_h16_8_c: 2064.3
put_hevc_qpel_bi_h16_8_neon: 365.3
put_hevc_qpel_bi_h24_8_c: 4757.5
put_hevc_qpel_bi_h24_8_neon: 1010.0
put_hevc_qpel_bi_h32_8_c: 8351.8
put_hevc_qpel_bi_h32_8_neon: 2917.8
put_hevc_qpel_bi_h48_8_c: 19299.8
put_hevc_qpel_bi_h48_8_neon: 2976.8
put_hevc_qpel_bi_h64_8_c: 34182.5
put_hevc_qpel_bi_h64_8_neon: 5236.3
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Logan Lyu
40cf4a5ca3
lavc/aarch64: new optimization for 8-bit hevc_pel_bi_pixels
...
put_hevc_pel_bi_pixels4_8_c: 54.7
put_hevc_pel_bi_pixels4_8_neon: 43.0
put_hevc_pel_bi_pixels6_8_c: 94.7
put_hevc_pel_bi_pixels6_8_neon: 37.0
put_hevc_pel_bi_pixels8_8_c: 171.0
put_hevc_pel_bi_pixels8_8_neon: 24.0
put_hevc_pel_bi_pixels12_8_c: 354.0
put_hevc_pel_bi_pixels12_8_neon: 68.7
put_hevc_pel_bi_pixels16_8_c: 588.2
put_hevc_pel_bi_pixels16_8_neon: 77.5
put_hevc_pel_bi_pixels24_8_c: 1670.7
put_hevc_pel_bi_pixels24_8_neon: 173.0
put_hevc_pel_bi_pixels32_8_c: 2267.7
put_hevc_pel_bi_pixels32_8_neon: 281.2
put_hevc_pel_bi_pixels48_8_c: 5787.5
put_hevc_pel_bi_pixels48_8_neon: 673.5
put_hevc_pel_bi_pixels64_8_c: 9897.0
put_hevc_pel_bi_pixels64_8_neon: 1159.5
Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
James Almer
6d19611251
avcodec/ac3dsp: add missing stddef.h include
...
Should fix make checkheaders
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
xufuji456
cc86343b96
lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d
...
Building iOS platform with arm64, the compiler has a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b"
Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Zhao Zhili
d526a34c20
avcodec/videotoolboxenc: refactor dump encoder name
...
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
Zhao Zhili
cb049d377f
avcodec/videotoolboxenc: Fix build failure due to PropertyKey_EncoderID
...
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
Paul B Mahol
3609d2b783
avcodec: add QOA decoder
1 year ago
Geoffrey McRae
93b5d9030b
libavcodec/mlpdec: add missing correction to ch_layout when downmixing
...
This fixes corrupted audio for applications relying on ch_layout when
codec downmixing is active.
Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Geoffrey McRae
a8677bcc8f
libavcodec/dcadec: adjust the `ch_layout` when downmix is active
...
Applications making use of this codec with the `downmix` option are
segfaulting unless the `ch_layout` is overridden after `avcodec_open2`
as can be seen in projects like MythTV[1]
This patch fixes this by overriding the ch_layout as done in other
decoders such as AC3.
1: af6f362a14/mythtv/libs/libmythtv/decoders/avformatdecoder.cpp (L4607)
Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
72390dea00
mips/ac3dsp_mips: add missing stddef.h header include
...
Fixes compilation failures after 567c67c6c8
.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
e40ea9f34b
x86/ac3dsp: add ff_float_to_fixed24_avx()
...
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
d8b1a34433
x86/ac3dsp: reduce instruction count inside the float_to_fixed24 loop
...
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Rémi Denis-Courmont
0fa421c8f1
lavc/llvidencdsp: add R-V V diff_bytes
...
diff_bytes_c: 163.0
diff_bytes_rvv_i32: 52.7
1 year ago
Rémi Denis-Courmont
0183c2c830
lavc/aacpsdsp: use LMUL=2 and amortise strides
...
The input is laid out in 16 segments, of which 13 actually need to be
loaded. There are no really efficient ways to deal with this:
1) If we load 8 segments wit unit stride, then narrow to 16 segments with
right shifts, we can only get one half-size vector per segment, or just 2
elements per vector (EMUL=1/2) - at least with 128-bit vectors.
This ends up unsurprisingly about as fas as the C code.
2) The current approach is to load with strides. We keep that approach,
but improve it using three 4-segmented loads instead of 12 single-segment
loads. This divides the number of distinct loaded addresses by 4.
3) A potential third approach would be to avoid segmentation altogether
and splat the scalar coefficient into vectors. Then we can use a
unit-stride and maximum EMUL. But the downside then is that we have to
multiply the 3 (of 16) unused segments with zero as part of the
multiply-accumulate operations.
In addition, we also reuse vectors mid-loop so as to increase the EMUL
from 1 to 2, which also improves performance a little bit.
Oeverall the gains are quite small with the device under test, as it does
not deal with segmented loads very well. But at least the code is tidier,
and should enjoy bigger speed-ups on better hardware implementation.
Before:
ps_hybrid_analysis_c: 1819.2
ps_hybrid_analysis_rvv_f32: 1037.0 (before)
ps_hybrid_analysis_rvv_f32: 990.0 (after)
1 year ago
Rémi Denis-Courmont
b88d4058f9
lavc/g722dsp: optimise R-V V apply_qmf
...
This stores the constant coefficients deinterleaved, so that they can be
loaded directly with NF=0. Unfortunately, we cannot optimise loading the
input, due to insufficient memory alignment (not 32-bit).
Before:
g722_apply_qmf_c: 82.5
g722_apply_qmf_rvv_i32: 78.2
After:
g722_apply_qmf_c: 82.5
g722_apply_qmf_rvv_i32: 65.2
1 year ago
James Almer
567c67c6c8
avcodec/ac3dsp: make len a size_t in float_to_fixed24
...
Should simplify asm implementations, and prevent UB on at least win64.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
2d9fd814d0
x86/: clear the high bits for order in scalarproduct_and_madd functions
...
Should fix checkasm failures on win64.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Zhao Zhili
e8a49b1424
avcodec/mmaldec: Fix build error
...
Fix #10670 .
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
Zhao Zhili
f27fce0c0c
avcodec/mediacodecdec: fix return EAGAIN after EOF
...
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
Dmitry Rogozhkin
e9c93009fc
avcodec/decode: validate hw_frames_ctx when AVHWAccel.free_frame_priv is used
...
Validate that a hw_frames_ctx is available before using it for
the AVHWAccel.free_frame_priv callback, and don't require it to
be present when the callback is not in use by the HWAccel.
v2: check for free_frame_priv (Hendrik)
v3: return EINVAL (Christoph Reiter)
v4: better commit message (Hendrik)
v5: fix typo with missed frames_ctx (Lynne)
See[1]: https://github.com/msys2/MINGW-packages/pull/19050
Fixes: be07145109
("avcodec: add AVHWAccel.free_frame_priv callback")
CC: Lynne <dev@lynne.ee>
CC: Christoph Reiter <reiter.christoph@gmail.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
1 year ago
Zhao Zhili
aa3b857101
avcodec/h264_mp4toannexb_bsf: process new extradata
...
For fate-h264_mp4toannexb_ticket5927 and
fate-h264_mp4toannexb_ticket5927_2, they work by accident
previously. The sample file has two 'avc1' entries, and video
samples use the second one. It means packets should be decoded with
new extradata in side data. Before this patch, only extradata was
kept in the output, new extradata has been dropped. The output can
be decoded because the two extradata are almost the same, except
level indication. This patch fixed the issue, and add another
fate test.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
Zhao Zhili
d3aa0cd16f
avcodec/h264_mp4toannexb_bsf: fix missing PS before IDR frames
...
If there is a single group of SPS/PPS before an IDR frame, but no
SPS/PPS after that, we will miss the chance to reset
idr_sps_seen/idr_pps_seen. No SPS/PPS are inserted afterwards.
This patch saves in-band SPS/PPS and insert them before IDR frames
when necessary.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
Zhao Zhili
4c4b833abd
avcodec/h264_mp4toannexb_bsf: remove pass padding size as argument
...
It's a fixed value. There is no use case to change that.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
Zhao Zhili
91cbae2f6c
avcodec/h264_mp4toannexb_bsf: refactor start_code_size handling
...
start_code_size depends on whether PS comes from out-of-band or
in-band. Make the code more readable.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
Michael Niedermayer
fb52070848
avcodec/h264dec: use BOOL for skip_gray, noref_gray
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1 year ago
Jun Zhao
c961ac4b0c
vulkan_decode: fix the print format of VkDeviceSize
...
VkDeviceSize represents device memory size and offset
values as uint64_t in Spec.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
1 year ago
James Almer
1258f99978
avcodec: bump version after EVC additions
...
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Dawid Kozinski
cfe2947887
avcodec/evc_decoder: Provided support for EVC decoder
...
- Added EVC decoder wrapper
- Changes in project configuration file and libavcodec Makefile
- Added documentation for xevd wrapper
Signed-off-by: Dawid Kozinski <d.kozinski@samsung.com>
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Dawid Kozinski
c59a96fd08
avcodec/evc_encoder: Provided support for EVC encoder
...
- Added EVC encoder wrapper
- Changes in project configuration file and libavcodec Makefile
- Added documentation for xeve wrapper
Signed-off-by: Dawid Kozinski <d.kozinski@samsung.com>
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Michael Niedermayer
e56d91f8a8
avcodec/h264dec: Support skipping frames that used gray gap frames
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1 year ago
Michael Niedermayer
6364fa9e9a
avcodec/h264: Avoid using gray gap frames as references
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1 year ago
Michael Niedermayer
29f6c9b04d
avcodec/h264: keep track of which frames used gray references
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1 year ago
Michael Niedermayer
e4337606e1
avcodec/h264dec: More elaborate documentation for frame_recovered
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1 year ago