FFmpeg

Commit Graph

Author	SHA1	Message	Date
Ronald S. Bultje	f8c019944d	vp9: re-split the decoder/format/dsp interface header files. The advantage here is that the internal software decoder interface is not exposed to the DSP functions or the hardware accelerations.	8 years ago
Clément Bœsch	1c9f4b5078	lavc/vp9: split into vp9{block,data,mvs}. This is following Libav layout to ease merges.	8 years ago
Ronald S. Bultje	83a139e3d8	vp9: add avx2 iadst16 implementations. Also a small cosmetic change to the avx2 idct16 version to make it explicit that one of the arguments to the write-out macros is unused for >=avx2 (it uses pmovzxbw instead of punpcklbw).	8 years ago
Diego Biurrun	3cba09e522	x86: Drop stray semicolons after function definitions. libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic] libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]	8 years ago
Martin Storsjö	2e55e26b40	vp9: Flip the order of arguments in MC functions. This makes it match the pattern already used for VP8 MC functions. This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Ronald S. Bultje	715f139c9b	vp9lpf/x86: make filter_16_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	8915320db9	vp9lpf/x86: make filter_48/84/88_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	725a216481	vp9lpf/x86: make filter_44_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	5bfa96c4b3	vp9lpf/x86: make filter_16_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	b905e8d2fe	vp9lpf/x86: make filter_48/84_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	37637e6590	vp9lpf/x86: make filter_88_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	be10834bd9	vp9lpf/x86: make filter_44_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Clément Bœsch	0ed21bdc9e	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Clément Bœsch	f2e3d706a1	vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}(). Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
James Almer	92d47550ea	vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16. Similar gains as the ssse3 version once again. Additional improvements by Clément Bœsch <u@pkh.me>. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Clément Bœsch	6bea478158	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
James Almer	1f451eed60	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2(). Similar gains in performance as the SSSE3 version. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Clément Bœsch	a692724c58	vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	9790b44a89	vp9mc/x86: sse2 MC assembly. Also a slight change to the ssse3 code, which prevents a theoretical overflow in the sharp filter. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
James Almer	67922b4ee4	vp9mc/x86: add AVX and AVX2 MC. Roughly 25% faster MC than ssse3 for blocksizes 32 and 64. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Clément Bœsch	3cda179f18	vp9mc/x86: rename ff_* to ff_vp9_*. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
James Almer	8be8444d01	vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext. pavgb is an sse integer instruction, so the mmxext flag is enough. Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	3a09494939	vp9mc/x86: add 16px functions (64bit only). Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Ronald S. Bultje	a4edaa0270	vp9: add mmxext versions of the single-block (w=8,npx=8) h/v loopfilters. Each takes about 0.1% of runtime in my profiles, and they didn't have any SIMD yet (we only had SIMD for npx=16 double-block versions).	8 years ago
Ronald S. Bultje	7ca422bb1b	vp9: add mmxext versions of the single-block (w=4,npx=8) h/v loopfilters. Each takes about 0.5% of runtime in my profiles, and they didn't have any SIMD yet (we only had SIMD for npx=16 double-block versions).	8 years ago
Ronald S. Bultje	726501a34e	vp9: add 32x32 idct AVX2 implementation. About 1.8x speedup compared to AVX version for full IDCT. Other sub-IDCT scenarios also see speedups. Full --bench output for idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles), listed per vp9_inv_dct_dct_32x32_add_${bpp}_${subidct} as c / sse2 / ssse3 / avx / avx2: nop: 16.5; 8_1: 2284.4 / 145.0 / 137.4 / 137.1 / 73.2; 8_2: 14680.8 / 2617.2 / 982.9 / 958.5 / 704.2; 8_4: 14443.1 / 2717.1 / 965.7 / 1000.7 / 717.1; 8_8: 14436.4 / 2671.8 / 1038.5 / 983.0 / 729.4; 8_16: 14614.7 / 2701.7 / 1334.4 / 1276.7 / 719.5; 8_32: 14363.6 / 2575.6 / 2633.9 / 2539.6 / 1395.0	8 years ago
Ronald S. Bultje	f0a2b6249b	vp9: add 16x16 idct avx2 (8-bit). checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs, listed per vp9_inv_dct_dct_16x16_add_${bpc}_${sub_idct} as c / sse2 / ssse3 / avx / avx2: nop: 24.6; 8_1: 6444.8 / 638.6 / 484.4 / 661.2 / 311.5; 8_2: 6665.7 / 646.9 / 455.2 / 521.9 / 304.3; 8_4: 7022.7 / 647.4 / 467.1 / 446.1 / 297.0; 8_8: 6800.4 / 598.6 / 465.7 / 440.9 / 290.2; 8_16: 6626.6 / 599.5 / 475.0 / 469.9 / 286.4	8 years ago
Diego Biurrun	dc40a70c57	Drop unnecessary libavutil/x86/asm.h #includes	9 years ago
James Almer	70d685a77f	x86: use the new helper macros where useful. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Ganesh Ajjanagadde	38f4e973ef	all: fix -Wextra-semi reported on clang. This fixes extra semicolons that clang 3.7 on GNU/Linux warns about. These were triggered when built under -Wpedantic, which essentially checks for strict ISO compliance in numerous ways. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>	9 years ago
Ronald S. Bultje	1c3be32533	vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.	9 years ago
Ronald S. Bultje	344d519040	vp9: add subpel MC SIMD for 10/12bpp.	9 years ago
Ronald S. Bultje	77f359670f	vp9: add fullpel (avg) MC SIMD for 10/12bpp.	9 years ago
Ronald S. Bultje	6354ff0383	vp9: add fullpel (put) MC SIMD for 10/12bpp.	9 years ago
Ronald S. Bultje	fd8b90f5f6	vp9: fix overflow in 8x8 topleft 32x32 idct ssse3 version. Also disable the mmx/iwht optimization when the bitexact flag is set. With synthetically coded coefficients (i.e. those that lead to a residual well outside the [-255,255] range), our optimizations will overflow. It doesn't make sense to fix the overflows, since they can only occur on synthetic input, not on real fwht-generated input. Thus, add a bitexact flag that disables this optimization.	9 years ago
James Almer	c16e99e3b3	x86: check for AV_CPU_FLAG_AVXSLOW where useful. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Michael Niedermayer	cc77bb09e4	avcodec/x86/vp9dsp_init: Fix mix of declaration and statement. Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ronald S. Bultje	b224b165cb	vp9: add keyframe profile 2/3 support.	10 years ago
Ronald S. Bultje	afd8c464b7	vp9/x86: make filter_16_h work on 32-bit.	10 years ago
Ronald S. Bultje	b26bc3520f	vp9/x86: make filter_48/84/88_h work on 32-bit.	10 years ago
Ronald S. Bultje	8a1cff1c35	vp9/x86: make filter_44_h work on 32-bit.	10 years ago
Ronald S. Bultje	047088b8c6	vp9/x86: make filter_16_v work on 32-bit.	10 years ago
Ronald S. Bultje	0cc9c23ea1	vp9/x86: make filter_48/84_v work on 32-bit.	10 years ago
Ronald S. Bultje	6433a9133f	vp9/x86: make filter_88_v work on 32-bit.	10 years ago
Ronald S. Bultje	75f8e52089	vp9/x86: make filter_44_v work on 32-bit.	10 years ago
James Almer	32c836cb11	x86/vp9: remove duplicate function prototypes. Fixes "redundant redeclaration" warnings. Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
Ronald S. Bultje	bdc1e3e3b2	vp9/x86: intra prediction sse2/32bit support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ronald S. Bultje	cae893f692	vp9/x86: sse2 MC assembly. Also a slight change to the ssse3 code, which prevents a theoretical overflow in the sharp filter. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ronald S. Bultje	fd77fbb390	vp9/x86: 32bit and sse2 support for vp9 inverse transform assembly. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
James Almer	6b2caa321f	x86/vp9: add AVX and AVX2 MC. Roughly 25% faster MC than ssse3 for blocksizes 32 and 64. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago

54 Commits (23538ad2eb76a0d27a1f2b2bcdccd857124a0224)