FFmpeg

Commit Graph

Author	SHA1	Message	Date
James Almer	d950279cbf	avcodec/ttadsp: cosmetics Clean some header includes and use the same naming scheme as in ttaencdsp Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	efc9d5c4bc	x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Ronald S. Bultje	a4edaa0270	vp9: add mxext versions of the single-block (w=8,npx=8) h/v loopfilters. Each takes about 0.1% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).	9 years ago
Ronald S. Bultje	7ca422bb1b	vp9: add mxext versions of the single-block (w=4,npx=8) h/v loopfilters. Each takes about 0.5% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).	9 years ago
Ronald S. Bultje	726501a34e	vp9: add 32x32 idct AVX2 implementation. About 1.8x speedup compared to AVX version for full IDCT. Other sub-IDCT scenarios also see speedups. Full --bench output for idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles): nop: 16.5 vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4 vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0 vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4 vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1 vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2 vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8 vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2 vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9 vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5 vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2 vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1 vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1 vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7 vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7 vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1 vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4 vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8 vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5 vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0 vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4 vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7 vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7 vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4 vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7 vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5 vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6 vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6 vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9 vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6 vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0	9 years ago
James Almer	7a15cf42ee	x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32 Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Rostislav Pehlivanov	df1dc52195	diracdsp_init: add missing ARCH_X86_64 check That SIMD is still x86_64 only for now. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	9 years ago
Rostislav Pehlivanov	bd61f3c6bf	diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>	9 years ago
Rostislav Pehlivanov	80721cc1ff	diracdsp: add dequantization SIMD Currently unused, to be used in the following commits. Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>	9 years ago
Ronald S. Bultje	f0a2b6249b	vp9: add 16x16 idct avx2 (8-bit). checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs: nop: 24.6 vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8 vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6 vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4 vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2 vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5 vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7 vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9 vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2 vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9 vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3 vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7 vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4 vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1 vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1 vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0 vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4 vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6 vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7 vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9 vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2 vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6 vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5 vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0 vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9 vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4	9 years ago
James Almer	645489cf90	x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32 About 10% faster. Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	293484fa5e	avcodec: add missing xmm/neon clobber test wrappers for the new decode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Matthieu Bouron	9eb3da2f99	asm: FF_-prefix internal macros used in inline assembly See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.	9 years ago
Anton Khirnov	9df889a5f1	h264: rename h264.[ch] to h264dec.[ch] This is more consistent with the naming of other decoders.	9 years ago
Martin Storsjö	f1a9eee41c	x86: Add missing movsxd for the int stride parameter Signed-off-by: Martin Storsjö <martin@martin.st>	9 years ago
James Almer	ede4ec1f8f	x86/aacpsdsp: optimize add_squares loop Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	82dbfccaf0	x86/aacdec: use HADDPS macro Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Diego Biurrun	1e9c5bf4c1	asm: FF_-prefix internal macros used in inline assembly These warnings conflict with system macros on Solaris, producing truckloads of warnings about macro redefinition.	9 years ago
Diego Biurrun	dc40a70c57	Drop unnecessary libavutil/x86/asm.h #includes	9 years ago
Diego Biurrun	a6a750c7ef	tests: Move all test programs to a subdirectory	9 years ago
Christophe Gisquet	9630b3fc06	x86: lossless audio: SSE4 madd 32bits The unique user so far is wmalossless 24bits. The few samples tested show an order of 8, so more unrolling or an avx2 version do not make sense. Timings: 68 -> 49 cycles Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Vittorio Giovara	41ed7ab45f	cosmetics: Fix spelling mistakes Signed-off-by: Diego Biurrun <diego@biurrun.de>	9 years ago
Diego Biurrun	01621202aa	build: miscellaneous cosmetics Restore alphabetical order in lists, break overly long lines, do some prettyprinting, add some explanatory section comments, group parts together that belong together logically.	9 years ago
Michael Niedermayer	305344d89e	avcodec/fft: Add revtab32 for FFTs with more than 65536 samples x86 optimizations are used only for the cases they support (<=65536 samples) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Michael Niedermayer	ae76b84221	avcodec: Extend fft to size 2^17 Asked-for-by: durandal_1707 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Diego Biurrun	1a094af638	fft: Split MDCT bits off from FFT	9 years ago
Timothy Gu	e3461197b1	x86/vc1dsp: Split the file into MC and loopfilter	9 years ago
Diego Biurrun	73ff983e8d	fft: x86: cosmetics: Drop silly comments, add comment, whitespace	9 years ago
Diego Biurrun	257b30af8e	x86: hevc: Fix linking with both yasm and optimizations disabled Some optimized functions reference optimized symbols, so the functions must be explicitly disabled when those symbols are unavailable.	9 years ago
James Almer	45d3af9059	x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Diego Biurrun	15a24614ae	build: Add vc1dsp component for more fine-grained dependencies	9 years ago
James Almer	70d685a77f	x86: use the new helper macros where useful Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Timothy Gu	bcc223523e	x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>	9 years ago
Timothy Gu	59ebf32bca	huffyuvencdsp: Undefine "i" macro after each use	9 years ago
James Almer	8ae7447941	x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3} Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Timothy Gu	9fd6ea933f	dirac_dwt: Make x86 files/functions names consistent	9 years ago
Timothy Gu	17ab8f7e68	diracdsp: Make x86 files/functions names consistent	9 years ago
Henrik Gramner	aa751573fe	avcodec/h264: Fix segfault in 4:2:2 chroma deblock with 32-bit msvc Using rNm and x86inc's stack allocation with a negative value at the same time isn't supported, and caused the original stack pointer to be clobbered when using a compiler that doesn't support stack alignment.	9 years ago
James Darnley	7042a55c55	avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter 2.6 times faster (366 vs. 142 cycles)	9 years ago
Timothy Gu	dd57b316c1	diracdsp_mmx: Fix some more indentations	9 years ago
Timothy Gu	f5e2b8de55	diracdsp_mmx: Fix indentation	9 years ago
Timothy Gu	838abfc1d7	x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format	9 years ago
Luca Barbato	e280fe1329	v210: Use separate sample_factors The 10bit and the 8bit functions can now be implemented to process a different amount of samples. And while at it simplify a little the code.	9 years ago
James Darnley	15ec7aa417	v210: Add avx2 version of the 10-bit line encoder Around 25% faster than the ssse3 version. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	9 years ago
James Darnley	d29237e557	v210: Add avx2 version of the 8-bit line encoder Around 35% faster than the avx version. Signed-off-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	9 years ago
Timothy Gu	180f9a0958	all: Make header guard names consistent	9 years ago
foo86	ae5b2c5250	avcodec/dca: add new decoder based on libdcadec	9 years ago
foo86	4608996772	avcodec/dca: remove old decoder Remove all files and functions which are not going to be reused, and disable all functions and FATE tests temporarily which will be.	9 years ago
James Almer	c792528970	x86/imdct36: use extractps inside the STORE macro Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Luca Barbato	eafb05fcf3	v210: x86: Add the correct guards around the asm code Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	9 years ago

2156 Commits (325e56479ff64c884f3bcccf922a7f7163488b89)