Andreas Rheinhardt
8a349bb02f
avcodec/x86/fpel: Remove declarations of inexistent functions
...
Forgotten in 50a8cbb23e
and
a51279bbde
.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
1 year ago
Andreas Rheinhardt
7ed1e806e7
avcodec/x86/simple_idct: Empty MMX state in ff_simple_idct_mmx
...
We currently mostly do not empty the MMX state in our MMX
DSP functions; instead we only do so before code that might
be using x87 code. This is a violation of the System V i386 ABI
(and maybe of other ABIs, too):
"The CPU shall be in x87 mode upon entry to a function. Therefore,
every function that uses the MMX registers is required to issue an
emms or femms instruction after using MMX registers, before returning
or calling another function." (See 2.2.1 in [1])
This patch does not intend to change all these functions to abide
by the ABI; it only does so for ff_simple_idct_mmx, as this
function can by called by external users, because it is exported
via the avdct API. Without this, the following fragment will
assert (on x86_32, as ff_simple_idct_mmx is not used on x64):
int16_t __attribute__ ((aligned (16))) block[64];
AVDCT *dct = avcodec_dct_alloc();
dct->idct_algo = FF_IDCT_AUTO;
avcodec_dct_init(dct);
dct->idct(block);
av_assert0_fpu();
[1]: https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/intel386-psABI-1.1.pdf
Reviewed-by: Kieran Kunhya <kierank@obe.tv>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
1 year ago
Andreas Rheinhardt
6106fb2b4c
avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
...
Besides simplifying address computations (it saves 432B of .text
in hevcdsp.o alone here) it also fixes undefined behaviour that
occurs if mx or my are 0 (happens when the filters are unused)
because they lead to an array index of -1 in the old code.
This happens in the checkasm-hevc_pel FATE-test.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Kieran Kunhya
40c5c19eac
x86/h264_pred: Convert ff_pred8x8_vertical_8_mmx to ff_pred8x8_vertical_8_sse2
1 year ago
Kieran Kunhya
f43b5f1098
vp6dsp: Remove MMX code
...
Missed from 6cb3ee8
1 year ago
James Almer
2c324fcce0
x86/h264_intrapred: convert ff_pred16x16_horizontal_8_mmxext to sse2
...
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Wu Jianhua
3372876888
avcodec/x86/vvc/vvcdsp_init: fix unresolved external symbol on ARCH_X86_32
...
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
1 year ago
James Almer
b181868aba
x86/h26x/h2656dsp: add missing preprocessor wrappers
...
Fixes compilation on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
6b6eb7d74e
x86/Makefile: fix hevc and vvc dependency of h2656dsp.o
...
And remove tabs while at it.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
2dc8221e66
x86/hevcdsp_init.c: fix preprocessor check
...
HAVE_AVX2_EXTERNAL has a value, so check for it.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
78a7927df7
x86/vvc/vvc_mc: wrap the entire file in x86_64 and AVX2 checks
...
Fixes compilation with old yasm.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
bf62ddc7bf
x86/vvc/vvc_mc: set the correct number of used registers in vvc_w_avg functions
...
Fixes crashes when running fate-vvc-conformance-WP_A_3 on Win64 targets
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Wu Jianhua
326cc01de9
avcodec/x86/vvc: add avg and avg_w AVX2 optimizations
...
The avg/avg_w is based on dav1d.
See https://code.videolan.org/videolan/dav1d/-/blob/master/src/x86/mc_avx2.asm
vvc_avg_8_2x2_c: 71.6
vvc_avg_8_2x2_avx2: 26.8
vvc_avg_8_2x4_c: 140.8
vvc_avg_8_2x4_avx2: 34.6
vvc_avg_8_2x8_c: 410.3
vvc_avg_8_2x8_avx2: 41.3
vvc_avg_8_2x16_c: 769.3
vvc_avg_8_2x16_avx2: 60.3
vvc_avg_8_2x32_c: 1669.6
vvc_avg_8_2x32_avx2: 105.1
vvc_avg_8_2x64_c: 1978.3
vvc_avg_8_2x64_avx2: 425.8
vvc_avg_8_2x128_c: 6536.8
vvc_avg_8_2x128_avx2: 1315.1
vvc_avg_8_4x2_c: 155.6
vvc_avg_8_4x2_avx2: 26.1
vvc_avg_8_4x4_c: 250.3
vvc_avg_8_4x4_avx2: 31.3
vvc_avg_8_4x8_c: 831.8
vvc_avg_8_4x8_avx2: 41.3
vvc_avg_8_4x16_c: 1461.1
vvc_avg_8_4x16_avx2: 57.1
vvc_avg_8_4x32_c: 2821.6
vvc_avg_8_4x32_avx2: 105.1
vvc_avg_8_4x64_c: 3615.8
vvc_avg_8_4x64_avx2: 412.6
vvc_avg_8_4x128_c: 11962.6
vvc_avg_8_4x128_avx2: 1274.3
vvc_avg_8_8x2_c: 215.8
vvc_avg_8_8x2_avx2: 29.1
vvc_avg_8_8x4_c: 430.6
vvc_avg_8_8x4_avx2: 37.6
vvc_avg_8_8x8_c: 1463.3
vvc_avg_8_8x8_avx2: 51.8
vvc_avg_8_8x16_c: 2630.1
vvc_avg_8_8x16_avx2: 97.6
vvc_avg_8_8x32_c: 5813.8
vvc_avg_8_8x32_avx2: 196.6
vvc_avg_8_8x64_c: 6687.3
vvc_avg_8_8x64_avx2: 487.8
vvc_avg_8_8x128_c: 13178.6
vvc_avg_8_8x128_avx2: 1290.6
vvc_avg_8_16x2_c: 443.8
vvc_avg_8_16x2_avx2: 28.3
vvc_avg_8_16x4_c: 1253.3
vvc_avg_8_16x4_avx2: 32.1
vvc_avg_8_16x8_c: 2236.3
vvc_avg_8_16x8_avx2: 44.3
vvc_avg_8_16x16_c: 5127.8
vvc_avg_8_16x16_avx2: 63.3
vvc_avg_8_16x32_c: 6573.3
vvc_avg_8_16x32_avx2: 223.6
vvc_avg_8_16x64_c: 30311.8
vvc_avg_8_16x64_avx2: 437.8
vvc_avg_8_16x128_c: 25693.3
vvc_avg_8_16x128_avx2: 1266.8
vvc_avg_8_32x2_c: 954.6
vvc_avg_8_32x2_avx2: 32.1
vvc_avg_8_32x4_c: 2359.6
vvc_avg_8_32x4_avx2: 39.6
vvc_avg_8_32x8_c: 5703.6
vvc_avg_8_32x8_avx2: 57.1
vvc_avg_8_32x16_c: 9967.6
vvc_avg_8_32x16_avx2: 107.1
vvc_avg_8_32x32_c: 21327.6
vvc_avg_8_32x32_avx2: 272.6
vvc_avg_8_32x64_c: 39240.8
vvc_avg_8_32x64_avx2: 529.6
vvc_avg_8_32x128_c: 52580.8
vvc_avg_8_32x128_avx2: 1338.8
vvc_avg_8_64x2_c: 1647.3
vvc_avg_8_64x2_avx2: 38.8
vvc_avg_8_64x4_c: 5130.1
vvc_avg_8_64x4_avx2: 58.8
vvc_avg_8_64x8_c: 6529.3
vvc_avg_8_64x8_avx2: 88.3
vvc_avg_8_64x16_c: 19913.6
vvc_avg_8_64x16_avx2: 162.3
vvc_avg_8_64x32_c: 39360.8
vvc_avg_8_64x32_avx2: 295.8
vvc_avg_8_64x64_c: 49658.3
vvc_avg_8_64x64_avx2: 784.1
vvc_avg_8_64x128_c: 108513.1
vvc_avg_8_64x128_avx2: 1977.1
vvc_avg_8_128x2_c: 3226.1
vvc_avg_8_128x2_avx2: 61.1
vvc_avg_8_128x4_c: 10280.3
vvc_avg_8_128x4_avx2: 94.6
vvc_avg_8_128x8_c: 18079.3
vvc_avg_8_128x8_avx2: 155.3
vvc_avg_8_128x16_c: 45121.8
vvc_avg_8_128x16_avx2: 285.3
vvc_avg_8_128x32_c: 48651.8
vvc_avg_8_128x32_avx2: 581.6
vvc_avg_8_128x64_c: 165078.6
vvc_avg_8_128x64_avx2: 1942.8
vvc_avg_8_128x128_c: 339103.1
vvc_avg_8_128x128_avx2: 4332.6
vvc_avg_10_2x2_c: 144.3
vvc_avg_10_2x2_avx2: 26.8
vvc_avg_10_2x4_c: 142.6
vvc_avg_10_2x4_avx2: 45.3
vvc_avg_10_2x8_c: 478.1
vvc_avg_10_2x8_avx2: 38.1
vvc_avg_10_2x16_c: 518.3
vvc_avg_10_2x16_avx2: 58.1
vvc_avg_10_2x32_c: 2059.8
vvc_avg_10_2x32_avx2: 93.1
vvc_avg_10_2x64_c: 2383.8
vvc_avg_10_2x64_avx2: 714.8
vvc_avg_10_2x128_c: 4498.3
vvc_avg_10_2x128_avx2: 1466.3
vvc_avg_10_4x2_c: 228.6
vvc_avg_10_4x2_avx2: 26.8
vvc_avg_10_4x4_c: 378.3
vvc_avg_10_4x4_avx2: 30.6
vvc_avg_10_4x8_c: 866.8
vvc_avg_10_4x8_avx2: 44.6
vvc_avg_10_4x16_c: 1018.1
vvc_avg_10_4x16_avx2: 58.1
vvc_avg_10_4x32_c: 3590.8
vvc_avg_10_4x32_avx2: 128.8
vvc_avg_10_4x64_c: 4200.8
vvc_avg_10_4x64_avx2: 663.6
vvc_avg_10_4x128_c: 8450.8
vvc_avg_10_4x128_avx2: 1531.8
vvc_avg_10_8x2_c: 369.3
vvc_avg_10_8x2_avx2: 28.3
vvc_avg_10_8x4_c: 513.8
vvc_avg_10_8x4_avx2: 32.1
vvc_avg_10_8x8_c: 1720.3
vvc_avg_10_8x8_avx2: 49.1
vvc_avg_10_8x16_c: 1894.8
vvc_avg_10_8x16_avx2: 71.6
vvc_avg_10_8x32_c: 3931.3
vvc_avg_10_8x32_avx2: 148.1
vvc_avg_10_8x64_c: 7964.3
vvc_avg_10_8x64_avx2: 613.1
vvc_avg_10_8x128_c: 15540.1
vvc_avg_10_8x128_avx2: 1585.1
vvc_avg_10_16x2_c: 877.3
vvc_avg_10_16x2_avx2: 27.6
vvc_avg_10_16x4_c: 955.8
vvc_avg_10_16x4_avx2: 29.8
vvc_avg_10_16x8_c: 3419.6
vvc_avg_10_16x8_avx2: 62.6
vvc_avg_10_16x16_c: 3826.8
vvc_avg_10_16x16_avx2: 54.3
vvc_avg_10_16x32_c: 7655.3
vvc_avg_10_16x32_avx2: 86.3
vvc_avg_10_16x64_c: 30011.1
vvc_avg_10_16x64_avx2: 692.6
vvc_avg_10_16x128_c: 47894.8
vvc_avg_10_16x128_avx2: 1580.3
vvc_avg_10_32x2_c: 944.3
vvc_avg_10_32x2_avx2: 29.8
vvc_avg_10_32x4_c: 2022.6
vvc_avg_10_32x4_avx2: 35.1
vvc_avg_10_32x8_c: 6148.8
vvc_avg_10_32x8_avx2: 51.3
vvc_avg_10_32x16_c: 12601.6
vvc_avg_10_32x16_avx2: 70.8
vvc_avg_10_32x32_c: 15958.6
vvc_avg_10_32x32_avx2: 124.3
vvc_avg_10_32x64_c: 31784.6
vvc_avg_10_32x64_avx2: 757.3
vvc_avg_10_32x128_c: 63892.8
vvc_avg_10_32x128_avx2: 1711.3
vvc_avg_10_64x2_c: 1890.8
vvc_avg_10_64x2_avx2: 34.3
vvc_avg_10_64x4_c: 6267.3
vvc_avg_10_64x4_avx2: 42.6
vvc_avg_10_64x8_c: 12778.1
vvc_avg_10_64x8_avx2: 67.8
vvc_avg_10_64x16_c: 22304.3
vvc_avg_10_64x16_avx2: 116.8
vvc_avg_10_64x32_c: 30777.1
vvc_avg_10_64x32_avx2: 201.1
vvc_avg_10_64x64_c: 60169.1
vvc_avg_10_64x64_avx2: 1454.3
vvc_avg_10_64x128_c: 124392.8
vvc_avg_10_64x128_avx2: 3648.6
vvc_avg_10_128x2_c: 3650.1
vvc_avg_10_128x2_avx2: 41.1
vvc_avg_10_128x4_c: 22887.8
vvc_avg_10_128x4_avx2: 64.1
vvc_avg_10_128x8_c: 14622.6
vvc_avg_10_128x8_avx2: 111.6
vvc_avg_10_128x16_c: 62207.6
vvc_avg_10_128x16_avx2: 186.3
vvc_avg_10_128x32_c: 59761.3
vvc_avg_10_128x32_avx2: 374.6
vvc_avg_10_128x64_c: 117504.3
vvc_avg_10_128x64_avx2: 2684.6
vvc_avg_10_128x128_c: 236767.6
vvc_avg_10_128x128_avx2: 15278.1
vvc_avg_12_2x2_c: 78.6
vvc_avg_12_2x2_avx2: 26.1
vvc_avg_12_2x4_c: 254.1
vvc_avg_12_2x4_avx2: 30.6
vvc_avg_12_2x8_c: 261.8
vvc_avg_12_2x8_avx2: 39.1
vvc_avg_12_2x16_c: 527.6
vvc_avg_12_2x16_avx2: 57.3
vvc_avg_12_2x32_c: 1089.1
vvc_avg_12_2x32_avx2: 93.8
vvc_avg_12_2x64_c: 2337.6
vvc_avg_12_2x64_avx2: 707.1
vvc_avg_12_2x128_c: 4582.1
vvc_avg_12_2x128_avx2: 1414.6
vvc_avg_12_4x2_c: 129.6
vvc_avg_12_4x2_avx2: 26.8
vvc_avg_12_4x4_c: 427.3
vvc_avg_12_4x4_avx2: 30.6
vvc_avg_12_4x8_c: 529.6
vvc_avg_12_4x8_avx2: 36.6
vvc_avg_12_4x16_c: 1022.1
vvc_avg_12_4x16_avx2: 57.3
vvc_avg_12_4x32_c: 1987.6
vvc_avg_12_4x32_avx2: 84.3
vvc_avg_12_4x64_c: 4147.6
vvc_avg_12_4x64_avx2: 706.3
vvc_avg_12_4x128_c: 8469.3
vvc_avg_12_4x128_avx2: 1448.3
vvc_avg_12_8x2_c: 253.6
vvc_avg_12_8x2_avx2: 27.6
vvc_avg_12_8x4_c: 836.3
vvc_avg_12_8x4_avx2: 32.1
vvc_avg_12_8x8_c: 1074.6
vvc_avg_12_8x8_avx2: 45.1
vvc_avg_12_8x16_c: 3616.8
vvc_avg_12_8x16_avx2: 71.6
vvc_avg_12_8x32_c: 3823.6
vvc_avg_12_8x32_avx2: 140.1
vvc_avg_12_8x64_c: 7764.8
vvc_avg_12_8x64_avx2: 656.1
vvc_avg_12_8x128_c: 15896.1
vvc_avg_12_8x128_avx2: 1232.8
vvc_avg_12_16x2_c: 462.1
vvc_avg_12_16x2_avx2: 26.8
vvc_avg_12_16x4_c: 1732.1
vvc_avg_12_16x4_avx2: 29.1
vvc_avg_12_16x8_c: 2097.6
vvc_avg_12_16x8_avx2: 62.6
vvc_avg_12_16x16_c: 6753.1
vvc_avg_12_16x16_avx2: 47.8
vvc_avg_12_16x32_c: 7373.1
vvc_avg_12_16x32_avx2: 80.8
vvc_avg_12_16x64_c: 15046.3
vvc_avg_12_16x64_avx2: 621.1
vvc_avg_12_16x128_c: 52574.6
vvc_avg_12_16x128_avx2: 1417.1
vvc_avg_12_32x2_c: 1712.1
vvc_avg_12_32x2_avx2: 29.8
vvc_avg_12_32x4_c: 2036.8
vvc_avg_12_32x4_avx2: 37.6
vvc_avg_12_32x8_c: 4017.6
vvc_avg_12_32x8_avx2: 44.1
vvc_avg_12_32x16_c: 8018.6
vvc_avg_12_32x16_avx2: 70.8
vvc_avg_12_32x32_c: 15637.6
vvc_avg_12_32x32_avx2: 124.3
vvc_avg_12_32x64_c: 31143.3
vvc_avg_12_32x64_avx2: 830.3
vvc_avg_12_32x128_c: 75706.8
vvc_avg_12_32x128_avx2: 1604.8
vvc_avg_12_64x2_c: 3230.3
vvc_avg_12_64x2_avx2: 33.6
vvc_avg_12_64x4_c: 4139.6
vvc_avg_12_64x4_avx2: 45.1
vvc_avg_12_64x8_c: 8201.6
vvc_avg_12_64x8_avx2: 67.1
vvc_avg_12_64x16_c: 25632.3
vvc_avg_12_64x16_avx2: 110.3
vvc_avg_12_64x32_c: 30744.3
vvc_avg_12_64x32_avx2: 200.3
vvc_avg_12_64x64_c: 105554.8
vvc_avg_12_64x64_avx2: 1325.6
vvc_avg_12_64x128_c: 235254.3
vvc_avg_12_64x128_avx2: 3132.6
vvc_avg_12_128x2_c: 6194.3
vvc_avg_12_128x2_avx2: 55.1
vvc_avg_12_128x4_c: 7583.8
vvc_avg_12_128x4_avx2: 79.3
vvc_avg_12_128x8_c: 14635.6
vvc_avg_12_128x8_avx2: 104.3
vvc_avg_12_128x16_c: 29270.8
vvc_avg_12_128x16_avx2: 194.3
vvc_avg_12_128x32_c: 60113.6
vvc_avg_12_128x32_avx2: 346.3
vvc_avg_12_128x64_c: 197030.3
vvc_avg_12_128x64_avx2: 2779.6
vvc_avg_12_128x128_c: 432809.6
vvc_avg_12_128x128_avx2: 5513.3
vvc_w_avg_8_2x2_c: 84.3
vvc_w_avg_8_2x2_avx2: 42.6
vvc_w_avg_8_2x4_c: 156.3
vvc_w_avg_8_2x4_avx2: 58.8
vvc_w_avg_8_2x8_c: 310.6
vvc_w_avg_8_2x8_avx2: 73.1
vvc_w_avg_8_2x16_c: 942.1
vvc_w_avg_8_2x16_avx2: 113.3
vvc_w_avg_8_2x32_c: 1098.8
vvc_w_avg_8_2x32_avx2: 202.6
vvc_w_avg_8_2x64_c: 2414.3
vvc_w_avg_8_2x64_avx2: 467.6
vvc_w_avg_8_2x128_c: 4763.8
vvc_w_avg_8_2x128_avx2: 1333.1
vvc_w_avg_8_4x2_c: 140.1
vvc_w_avg_8_4x2_avx2: 49.8
vvc_w_avg_8_4x4_c: 276.3
vvc_w_avg_8_4x4_avx2: 58.1
vvc_w_avg_8_4x8_c: 524.3
vvc_w_avg_8_4x8_avx2: 72.3
vvc_w_avg_8_4x16_c: 1108.1
vvc_w_avg_8_4x16_avx2: 111.8
vvc_w_avg_8_4x32_c: 2149.8
vvc_w_avg_8_4x32_avx2: 199.6
vvc_w_avg_8_4x64_c: 12288.1
vvc_w_avg_8_4x64_avx2: 509.3
vvc_w_avg_8_4x128_c: 8398.6
vvc_w_avg_8_4x128_avx2: 1319.6
vvc_w_avg_8_8x2_c: 271.1
vvc_w_avg_8_8x2_avx2: 44.1
vvc_w_avg_8_8x4_c: 503.3
vvc_w_avg_8_8x4_avx2: 61.8
vvc_w_avg_8_8x8_c: 1031.1
vvc_w_avg_8_8x8_avx2: 93.8
vvc_w_avg_8_8x16_c: 2009.8
vvc_w_avg_8_8x16_avx2: 163.1
vvc_w_avg_8_8x32_c: 4161.3
vvc_w_avg_8_8x32_avx2: 292.1
vvc_w_avg_8_8x64_c: 7940.6
vvc_w_avg_8_8x64_avx2: 592.1
vvc_w_avg_8_8x128_c: 16802.3
vvc_w_avg_8_8x128_avx2: 1287.6
vvc_w_avg_8_16x2_c: 762.6
vvc_w_avg_8_16x2_avx2: 53.6
vvc_w_avg_8_16x4_c: 1486.3
vvc_w_avg_8_16x4_avx2: 67.1
vvc_w_avg_8_16x8_c: 1907.8
vvc_w_avg_8_16x8_avx2: 96.8
vvc_w_avg_8_16x16_c: 3883.6
vvc_w_avg_8_16x16_avx2: 151.3
vvc_w_avg_8_16x32_c: 7974.8
vvc_w_avg_8_16x32_avx2: 285.8
vvc_w_avg_8_16x64_c: 25160.6
vvc_w_avg_8_16x64_avx2: 589.8
vvc_w_avg_8_16x128_c: 58328.1
vvc_w_avg_8_16x128_avx2: 1169.8
vvc_w_avg_8_32x2_c: 1009.1
vvc_w_avg_8_32x2_avx2: 65.6
vvc_w_avg_8_32x4_c: 2091.1
vvc_w_avg_8_32x4_avx2: 96.8
vvc_w_avg_8_32x8_c: 3997.8
vvc_w_avg_8_32x8_avx2: 156.3
vvc_w_avg_8_32x16_c: 8216.8
vvc_w_avg_8_32x16_avx2: 269.6
vvc_w_avg_8_32x32_c: 21746.1
vvc_w_avg_8_32x32_avx2: 635.3
vvc_w_avg_8_32x64_c: 31564.8
vvc_w_avg_8_32x64_avx2: 1010.6
vvc_w_avg_8_32x128_c: 114373.3
vvc_w_avg_8_32x128_avx2: 2013.6
vvc_w_avg_8_64x2_c: 2067.3
vvc_w_avg_8_64x2_avx2: 97.6
vvc_w_avg_8_64x4_c: 3901.1
vvc_w_avg_8_64x4_avx2: 154.8
vvc_w_avg_8_64x8_c: 7911.6
vvc_w_avg_8_64x8_avx2: 268.8
vvc_w_avg_8_64x16_c: 16508.8
vvc_w_avg_8_64x16_avx2: 501.8
vvc_w_avg_8_64x32_c: 38770.3
vvc_w_avg_8_64x32_avx2: 1287.6
vvc_w_avg_8_64x64_c: 110350.6
vvc_w_avg_8_64x64_avx2: 1890.8
vvc_w_avg_8_64x128_c: 141354.6
vvc_w_avg_8_64x128_avx2: 3839.6
vvc_w_avg_8_128x2_c: 7012.1
vvc_w_avg_8_128x2_avx2: 159.3
vvc_w_avg_8_128x4_c: 8146.8
vvc_w_avg_8_128x4_avx2: 272.6
vvc_w_avg_8_128x8_c: 24596.8
vvc_w_avg_8_128x8_avx2: 501.1
vvc_w_avg_8_128x16_c: 35918.1
vvc_w_avg_8_128x16_avx2: 948.8
vvc_w_avg_8_128x32_c: 68799.6
vvc_w_avg_8_128x32_avx2: 1963.1
vvc_w_avg_8_128x64_c: 133862.1
vvc_w_avg_8_128x64_avx2: 3833.6
vvc_w_avg_8_128x128_c: 348427.8
vvc_w_avg_8_128x128_avx2: 7682.8
vvc_w_avg_10_2x2_c: 118.6
vvc_w_avg_10_2x2_avx2: 73.1
vvc_w_avg_10_2x4_c: 189.1
vvc_w_avg_10_2x4_avx2: 89.3
vvc_w_avg_10_2x8_c: 382.8
vvc_w_avg_10_2x8_avx2: 179.8
vvc_w_avg_10_2x16_c: 658.3
vvc_w_avg_10_2x16_avx2: 185.1
vvc_w_avg_10_2x32_c: 1409.3
vvc_w_avg_10_2x32_avx2: 290.8
vvc_w_avg_10_2x64_c: 2906.8
vvc_w_avg_10_2x64_avx2: 793.1
vvc_w_avg_10_2x128_c: 6292.6
vvc_w_avg_10_2x128_avx2: 1696.8
vvc_w_avg_10_4x2_c: 178.8
vvc_w_avg_10_4x2_avx2: 80.1
vvc_w_avg_10_4x4_c: 581.6
vvc_w_avg_10_4x4_avx2: 97.6
vvc_w_avg_10_4x8_c: 693.3
vvc_w_avg_10_4x8_avx2: 128.1
vvc_w_avg_10_4x16_c: 1436.6
vvc_w_avg_10_4x16_avx2: 179.8
vvc_w_avg_10_4x32_c: 2409.1
vvc_w_avg_10_4x32_avx2: 292.3
vvc_w_avg_10_4x64_c: 4925.3
vvc_w_avg_10_4x64_avx2: 746.1
vvc_w_avg_10_4x128_c: 10664.6
vvc_w_avg_10_4x128_avx2: 1647.6
vvc_w_avg_10_8x2_c: 359.3
vvc_w_avg_10_8x2_avx2: 80.1
vvc_w_avg_10_8x4_c: 925.6
vvc_w_avg_10_8x4_avx2: 97.6
vvc_w_avg_10_8x8_c: 1360.6
vvc_w_avg_10_8x8_avx2: 121.8
vvc_w_avg_10_8x16_c: 3490.3
vvc_w_avg_10_8x16_avx2: 203.3
vvc_w_avg_10_8x32_c: 5266.1
vvc_w_avg_10_8x32_avx2: 325.8
vvc_w_avg_10_8x64_c: 11127.1
vvc_w_avg_10_8x64_avx2: 747.8
vvc_w_avg_10_8x128_c: 31058.3
vvc_w_avg_10_8x128_avx2: 1424.6
vvc_w_avg_10_16x2_c: 624.8
vvc_w_avg_10_16x2_avx2: 84.6
vvc_w_avg_10_16x4_c: 1389.6
vvc_w_avg_10_16x4_avx2: 109.1
vvc_w_avg_10_16x8_c: 2688.3
vvc_w_avg_10_16x8_avx2: 137.1
vvc_w_avg_10_16x16_c: 5387.1
vvc_w_avg_10_16x16_avx2: 224.6
vvc_w_avg_10_16x32_c: 10776.3
vvc_w_avg_10_16x32_avx2: 312.1
vvc_w_avg_10_16x64_c: 18069.1
vvc_w_avg_10_16x64_avx2: 858.6
vvc_w_avg_10_16x128_c: 43460.3
vvc_w_avg_10_16x128_avx2: 1411.6
vvc_w_avg_10_32x2_c: 1232.8
vvc_w_avg_10_32x2_avx2: 99.1
vvc_w_avg_10_32x4_c: 4017.6
vvc_w_avg_10_32x4_avx2: 134.1
vvc_w_avg_10_32x8_c: 9306.3
vvc_w_avg_10_32x8_avx2: 208.1
vvc_w_avg_10_32x16_c: 8424.6
vvc_w_avg_10_32x16_avx2: 349.3
vvc_w_avg_10_32x32_c: 20787.8
vvc_w_avg_10_32x32_avx2: 655.3
vvc_w_avg_10_32x64_c: 40972.1
vvc_w_avg_10_32x64_avx2: 904.8
vvc_w_avg_10_32x128_c: 85670.3
vvc_w_avg_10_32x128_avx2: 1751.6
vvc_w_avg_10_64x2_c: 2454.1
vvc_w_avg_10_64x2_avx2: 132.6
vvc_w_avg_10_64x4_c: 5012.6
vvc_w_avg_10_64x4_avx2: 215.6
vvc_w_avg_10_64x8_c: 10811.3
vvc_w_avg_10_64x8_avx2: 361.1
vvc_w_avg_10_64x16_c: 33349.1
vvc_w_avg_10_64x16_avx2: 904.1
vvc_w_avg_10_64x32_c: 41892.3
vvc_w_avg_10_64x32_avx2: 1220.6
vvc_w_avg_10_64x64_c: 66983.3
vvc_w_avg_10_64x64_avx2: 2622.1
vvc_w_avg_10_64x128_c: 246508.8
vvc_w_avg_10_64x128_avx2: 3316.8
vvc_w_avg_10_128x2_c: 7791.6
vvc_w_avg_10_128x2_avx2: 198.8
vvc_w_avg_10_128x4_c: 10534.3
vvc_w_avg_10_128x4_avx2: 337.3
vvc_w_avg_10_128x8_c: 21142.3
vvc_w_avg_10_128x8_avx2: 614.8
vvc_w_avg_10_128x16_c: 40968.6
vvc_w_avg_10_128x16_avx2: 1160.6
vvc_w_avg_10_128x32_c: 113043.3
vvc_w_avg_10_128x32_avx2: 1644.6
vvc_w_avg_10_128x64_c: 230658.3
vvc_w_avg_10_128x64_avx2: 5065.3
vvc_w_avg_10_128x128_c: 335236.3
vvc_w_avg_10_128x128_avx2: 6450.3
vvc_w_avg_12_2x2_c: 185.3
vvc_w_avg_12_2x2_avx2: 43.6
vvc_w_avg_12_2x4_c: 340.3
vvc_w_avg_12_2x4_avx2: 55.8
vvc_w_avg_12_2x8_c: 632.3
vvc_w_avg_12_2x8_avx2: 70.1
vvc_w_avg_12_2x16_c: 728.3
vvc_w_avg_12_2x16_avx2: 108.1
vvc_w_avg_12_2x32_c: 1392.6
vvc_w_avg_12_2x32_avx2: 176.8
vvc_w_avg_12_2x64_c: 2618.3
vvc_w_avg_12_2x64_avx2: 757.3
vvc_w_avg_12_2x128_c: 6408.8
vvc_w_avg_12_2x128_avx2: 1435.1
vvc_w_avg_12_4x2_c: 349.3
vvc_w_avg_12_4x2_avx2: 44.3
vvc_w_avg_12_4x4_c: 607.1
vvc_w_avg_12_4x4_avx2: 52.6
vvc_w_avg_12_4x8_c: 1134.8
vvc_w_avg_12_4x8_avx2: 70.1
vvc_w_avg_12_4x16_c: 1378.1
vvc_w_avg_12_4x16_avx2: 115.3
vvc_w_avg_12_4x32_c: 2599.3
vvc_w_avg_12_4x32_avx2: 174.3
vvc_w_avg_12_4x64_c: 4474.8
vvc_w_avg_12_4x64_avx2: 656.1
vvc_w_avg_12_4x128_c: 11319.6
vvc_w_avg_12_4x128_avx2: 1373.1
vvc_w_avg_12_8x2_c: 595.8
vvc_w_avg_12_8x2_avx2: 44.3
vvc_w_avg_12_8x4_c: 1164.3
vvc_w_avg_12_8x4_avx2: 56.6
vvc_w_avg_12_8x8_c: 2019.6
vvc_w_avg_12_8x8_avx2: 80.1
vvc_w_avg_12_8x16_c: 4071.6
vvc_w_avg_12_8x16_avx2: 139.3
vvc_w_avg_12_8x32_c: 4485.1
vvc_w_avg_12_8x32_avx2: 250.6
vvc_w_avg_12_8x64_c: 8404.8
vvc_w_avg_12_8x64_avx2: 735.8
vvc_w_avg_12_8x128_c: 35679.8
vvc_w_avg_12_8x128_avx2: 1252.6
vvc_w_avg_12_16x2_c: 1114.8
vvc_w_avg_12_16x2_avx2: 46.6
vvc_w_avg_12_16x4_c: 2240.1
vvc_w_avg_12_16x4_avx2: 62.6
vvc_w_avg_12_16x8_c: 13174.6
vvc_w_avg_12_16x8_avx2: 88.6
vvc_w_avg_12_16x16_c: 5334.6
vvc_w_avg_12_16x16_avx2: 144.3
vvc_w_avg_12_16x32_c: 8378.1
vvc_w_avg_12_16x32_avx2: 234.6
vvc_w_avg_12_16x64_c: 21300.8
vvc_w_avg_12_16x64_avx2: 761.8
vvc_w_avg_12_16x128_c: 32786.8
vvc_w_avg_12_16x128_avx2: 1432.8
vvc_w_avg_12_32x2_c: 2154.3
vvc_w_avg_12_32x2_avx2: 61.1
vvc_w_avg_12_32x4_c: 4299.8
vvc_w_avg_12_32x4_avx2: 83.1
vvc_w_avg_12_32x8_c: 7964.8
vvc_w_avg_12_32x8_avx2: 132.6
vvc_w_avg_12_32x16_c: 13321.6
vvc_w_avg_12_32x16_avx2: 234.6
vvc_w_avg_12_32x32_c: 21149.3
vvc_w_avg_12_32x32_avx2: 433.3
vvc_w_avg_12_32x64_c: 43666.6
vvc_w_avg_12_32x64_avx2: 876.6
vvc_w_avg_12_32x128_c: 83189.8
vvc_w_avg_12_32x128_avx2: 1756.6
vvc_w_avg_12_64x2_c: 3829.8
vvc_w_avg_12_64x2_avx2: 83.1
vvc_w_avg_12_64x4_c: 8588.1
vvc_w_avg_12_64x4_avx2: 127.1
vvc_w_avg_12_64x8_c: 17027.6
vvc_w_avg_12_64x8_avx2: 310.6
vvc_w_avg_12_64x16_c: 29797.8
vvc_w_avg_12_64x16_avx2: 415.6
vvc_w_avg_12_64x32_c: 43854.3
vvc_w_avg_12_64x32_avx2: 773.3
vvc_w_avg_12_64x64_c: 137767.3
vvc_w_avg_12_64x64_avx2: 1608.6
vvc_w_avg_12_64x128_c: 316428.3
vvc_w_avg_12_64x128_avx2: 3249.8
vvc_w_avg_12_128x2_c: 8824.6
vvc_w_avg_12_128x2_avx2: 130.3
vvc_w_avg_12_128x4_c: 17173.6
vvc_w_avg_12_128x4_avx2: 219.3
vvc_w_avg_12_128x8_c: 21997.8
vvc_w_avg_12_128x8_avx2: 397.3
vvc_w_avg_12_128x16_c: 43553.8
vvc_w_avg_12_128x16_avx2: 790.1
vvc_w_avg_12_128x32_c: 89792.1
vvc_w_avg_12_128x32_avx2: 1497.6
vvc_w_avg_12_128x64_c: 226573.3
vvc_w_avg_12_128x64_avx2: 3153.1
vvc_w_avg_12_128x128_c: 332090.1
vvc_w_avg_12_128x128_avx2: 6499.6
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
1 year ago
Wu Jianhua
70889620f2
avcodec/vvcdec: reuse h26x/2656_inter.asm to enable x86 optimizations
...
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
1 year ago
Wu Jianhua
fc5ff6b0b8
avcodec/x86/h26x/h2656_inter: add dststride to put
...
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
1 year ago
Wu Jianhua
7d9f1f5485
avcodec/x86/hevc_mc: move put/put_uni to h26x/h2656_inter.asm
...
This enable that the asm optimization can be reused by VVC
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
1 year ago
James Almer
4fee63b241
x86/takdsp: add missing wrappers to AVX2 functions
...
Fixes compilation with old yasm.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
591dc3b4b8
x86/takdsp: add avx2 versions of all functions
...
On an Intel Core i7 12700k:
decorrelate_ls_c: 814.3
decorrelate_ls_sse2: 165.8
decorrelate_ls_avx2: 101.3
decorrelate_sf_c: 1602.6
decorrelate_sf_sse4: 640.1
decorrelate_sf_avx2: 324.6
decorrelate_sm_c: 1564.8
decorrelate_sm_sse2: 379.3
decorrelate_sm_avx2: 203.3
decorrelate_sr_c: 785.3
decorrelate_sr_sse2: 176.3
decorrelate_sr_avx2: 99.8
Tested-by: Lynne <dev@lynne.ee>
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
46775e64f8
avcodec/takdsp: fix const correctness
...
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
34d56e1766
x86/aacencdsp: clear the high bits for size in ff_abs_pow34_sse
...
Fixes checkasm failures on win64.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
e40ea9f34b
x86/ac3dsp: add ff_float_to_fixed24_avx()
...
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
d8b1a34433
x86/ac3dsp: reduce instruction count inside the float_to_fixed24 loop
...
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
567c67c6c8
avcodec/ac3dsp: make len a size_t in float_to_fixed24
...
Should simplify asm implementations, and prevent UB on at least win64.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
2d9fd814d0
x86/: clear the high bits for order in scalarproduct_and_madd functions
...
Should fix checkasm failures on win64.
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
James Almer
78f55457c9
x86/flacds: clear the high bits from pred_order in lpc_32 functions
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
1 year ago
Andreas Rheinhardt
f52b4a6e69
avcodec/snow: Move dsp helper functions to snow_dwt.h
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
1 year ago
Andreas Rheinhardt
6f7bf64dbc
avcodec: Remove DCT, FFT, MDCT and RDFT
...
They were replaced by TX from libavutil; the tremendous work
to get to this point (both creating TX as well as porting
the users of the components removed in this commit) was
completely performed by Lynne alone.
Removing the subsystems from configure may break some command lines,
because the --disable-fft etc. options are no longer recognized.
Co-authored-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
1 year ago
Andreas Rheinhardt
d9464f3e34
avcodec/mpegaudiodsp: Init dct32 directly
...
This avoids using dct.c and will allow removing it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
1 year ago
Andreas Rheinhardt
27562bf022
avcodec/x86/mpegvideoenc_template: Disable dead code
...
Since bfb28b5ce8
the permutation
type FF_IDCT_PERM_SIMPLE is ARCH_X86_32-only. So use this
knowledge to disable code for it when not on ARCH_X86_32.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
1 year ago
Andreas Rheinhardt
7b0b9a25ed
avcodec/proresdsp: Pass necessary parameter directly
...
Only avctx->bits_per_raw_sample is used.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
1 year ago
Andreas Rheinhardt
947d51f32a
avcodec/x86/hpeldsp_vp3: Merge into hpeldsp
...
Once upon a time, 413abbe164
added versions of some put_no_rnd_pixels functions for use
in VP3 and Theora (with an explicit check so that they are
only used for VP3 and Theora). When this was moved to hpeldsp
(from dsputil) in 3ced55d51c
,
the check was replaced by a check for the bitexact flag
(and a CONFIG_VP3_DECODER compile-time check), so that
these functions were now used for other codecs as well.
Later commit 1dfc3cf89d
split off the "VP3-specific bits into a separate file",
yet these bits were not really VP3-specific bits at all
any more. (The error was repeated in commit
0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553.) This commit
has not been reverted, because this would make future
changes from Libav (from where it originated) harder,
yet Libav is no more, so this commit effectively reverts
1dfc3cf89d
.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
1 year ago
Rémi Denis-Courmont
effadce6c7
avcodec/x86/mathops: clip constants used with shift instructions within inline assembly
...
Fixes assembling with binutil as >= 2.41
Signed-off-by: James Almer <jamrial@gmail.com>
2 years ago
Christopher Degawa
182663a58a
get_cabac_inline_x86: Don't inline the assembly function on 32 bit
...
While the inline cabac assembly has worked correctly in i386 builds
historically, modern compiler updates has started showing issues
with it, when the function gets inlined into larger contexts that
fail to provide the amount of free registers as this function
requires.
This was an issue with Clang on Windows on i386, which was fixed
in c6d284b945324a7bc70ea8b9056040c8148aa835. However, recently
the same issues also have started showing up with GCC (both for
Windows and Linux). Whether the issue appears seems dependent on
a lot of optimizer tuning (e.g. the issue appears or goes away
depenent on the combinaton of -march= and -mtune= options),
potentially due to the compiler making different decisions on
how much to inline.
Fixes: https://trac.ffmpeg.org/ticket/8903
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Lynne
bbe95f7353
x86: replace explicit REP_RETs with RETs
...
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2 years ago
James Darnley
6af453ca38
avcodec/x86: add avx512icl function for v210dec
...
Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles) compared with avx2
2 years ago
James Darnley
f30b4c2f47
avcodec/x86/v210: add some comments to the improved avx2 function
2 years ago
Andreas Rheinhardt
262e7439c6
avcodec/x86/Makefile: Don't build empty files
...
simple_idct.asm is 32 bit-only since
bfb28b5ce8
,
whereas simple_idct10.asm is x64-only. So don't build
the ultimately unneeded and empty files, as some linkers
complain about this: "ranlib: file:
libavcodec/libavcodec.a(simple_idct.o) has no symbols"
(this is from an Xcode toolchain as reported by Ronald S. Bultje).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2 years ago
James Darnley
5dfb4f9690
avcodec/x86/v210enc: change '0b' binary constant prefix to 'b' suffix
...
For compatability with yasm from 0.7.0
2 years ago
James Darnley
690b7890f0
avcodec/x86/v210enc: remove unneeded instruction
2 years ago
James Darnley
c67a2b14a2
avcodec/x86/v210enc: expand and correct comments
2 years ago
James Darnley
651cb867b1
avcodec/v210enc: add new 10-bit function for avx512 avx512icl
...
avx512 on Skylake-X (Xeon D-2123IT):
1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2
avx512icl on Ice Lake (Xeon Silver 4316):
2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2
2 years ago
James Darnley
bda53d2dde
avcodec/x86/v210enc: replace register use with named register
2 years ago
Andreas Rheinhardt
4228f8ad49
avcodec/x86/cavsdsp: Remove unused 3DNow-macro
...
Forgotten in 3221aba879
.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2 years ago
Lynne
b85e106d5f
libavcodec: remove mdct15
...
It's not needed nor used by anything anymore, lavu/tx is faster,
and better in every way. RIP.
2 years ago
Lynne
e0661fc805
dca_core: convert to lavu/tx
...
Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the
arm32 and aarch64 changes.
2 years ago
James Darnley
c3d36e1b3d
avcodec/v210enc: add new function for avx2 avx512 avx512icl
...
Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and
Broadwell (Xeon E5-2620 v4):
1690±4.3 decicycles vs. 1693±78.4
1439±31.1 decicycles vs 1429±16.7
Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT):
1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2
Better speedup with avx512icl on Ice Lake (Xeon Silver 4316):
1.77x faster (784±1.8 vs. 442±11.6 decicycles) compared with avx2
Co-authors:
Henrik Gramner <henrik@gramner.com>
Kieran Kunhya <kierank@obe.tv>
2 years ago
Andreas Rheinhardt
4209216ee8
avcodec/mpegvideodsp: Make MpegVideoDSP MPEG-4 only
...
It is only used by gmc/gmc1 which is only used by the MPEG-4
decoder, so move it to Mpeg4DecContext and rename it
to Mpeg4VideoDSP. Also compile it iff the MPEG-4 decoder is compiled.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2 years ago
Andreas Rheinhardt
e84348a8ab
avcodec/svq1enc: Add SVQ1EncDSPContext, make codec context private
...
Currently, SVQ1EncContext is defined in a header that is also
included by the arch-specific code that initializes the one
and only dsp function that this encoder uses directly.
But the arch-specific functions to set this dsp function
do not need anything from SVQ1EncContext. This commit therefore
adds a small SVQ1EncDSPContext whose only member is said
function pointer and renames svq1enc.h to svq1encdsp.h
to avoid exposing unnecessary internals to these init
functions (and the whole mpegvideo with it).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2 years ago
Carl Eugen Hoyos
60e87faf7f
lavc/x86/simple_idct: Fix linking shared libavcodec with MS link.exe
...
link.exe hangs on empty simple_idct.o
Fixes ticket #9909 .
2 years ago
Andreas Rheinhardt
1741adb1c7
avcodec/huffyuvencdsp: Pass pix_fmt directly when initing dsp
...
It is the only thing that is actually used.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2 years ago