FFmpeg


Author	SHA1	Message	Date
Ronald S. Bultje	f0a2b6249b	vp9: add 16x16 idct avx2 (8-bit). checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs. Results for vp9_inv_dct_dct_16x16_add_8_${sub_idct}_${opt} (nop: 24.6): sub_idct 1: c 6444.8, sse2 638.6, ssse3 484.4, avx 661.2, avx2 311.5; sub_idct 2: c 6665.7, sse2 646.9, ssse3 455.2, avx 521.9, avx2 304.3; sub_idct 4: c 7022.7, sse2 647.4, ssse3 467.1, avx 446.1, avx2 297.0; sub_idct 8: c 6800.4, sse2 598.6, ssse3 465.7, avx 440.9, avx2 290.2; sub_idct 16: c 6626.6, sse2 599.5, ssse3 475.0, avx 469.9, avx2 286.4	9 years ago
James Almer	172af20852	x86/showcqt: use three operand format for some instructions Fixes failures with yasm 1.1.0 and older Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	99b899483e	avutil/x86util: move haddps sse emulation from showcqt Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	d5f8a642f6	x86: port PSIGNW to cpuflags Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	5750d6c5e9	x86: move XOP emulation code back to x86inc Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
James Almer	37b35feb64	x86/swr: add SSE2/AVX pack_8ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
Kieran Kunhya	9a738c27dc	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	10 years ago
Kieran Kunhya	36091742d1	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
James Almer	d0f56ca071	x86/hevc_deblock: improve 8bit transpose store macros Up to four instructions less depending on function and instruction set. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	1ace9573dc	x86/hevc_idct: replace old and unused idct functions Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial). Benchmarks on an Intel Core i5-4200U (cycles): idct8x8_dc: SSE2 22, MMXEXT 26, C 57; idct16x16_dc: AVX2 27, SSE2 32, C 249; idct32x32_dc: AVX2 62, SSE2 126, C 1375. Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
Christophe Gisquet	9107612818	x86util: add and use RSHIFT/LSHIFT macros Those macros take the shift amount as a number of bytes, because the unit of the shift count differs between the MMX and SSE2 instructions. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
Christophe Gisquet	2267003981	x86: hpeldsp: better factorization Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	561bfc85eb	x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	76ed71a72b	x86: move horizontal add macros to x86util Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	3f3d748cab	x86: Move XOP emulation to x86util We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed. Emulation that doesn't obey the original instruction semantics can't be in x86inc. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
Jason Garrett-Glaser	c6908d6b4b	x86inc: FMA3/4 Support Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Derek Buitenhuis	206895708e	x86inc: Remove our FMA4 support This is so we can sync to x264's version of FMA4 support. This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Diego Biurrun	d633d12b2c	x86inc: Add cvisible macro for C functions with public prefix This allows defining externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Diego Biurrun	ef5d41a553	x86inc: Rename "program_name" to "private_prefix" The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Diego Biurrun	dae1d507af	x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags	12 years ago
Diego Biurrun	320e1d0df3	x86: ABSB2: port to cpuflags	12 years ago
Diego Biurrun	094a7405e5	x86: ABSB: port to cpuflags	12 years ago
Diego Biurrun	51969a652c	x86: ABS2: port to cpuflags	12 years ago
Diego Biurrun	5b4dfbffc2	x86: ABS1: port to cpuflags	12 years ago
Justin Ruggles	ac7eb4cb20	float_dsp: add vector_dmul_scalar() to multiply a vector of doubles Include x86-optimized versions for SSE2 and AVX.	12 years ago
Diego Biurrun	87af05c575	x86: SPLATD: port to cpuflags	12 years ago
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	12 years ago
Diego Biurrun	f0d124f005	x86inc: Set program_name outside of x86inc.asm This reduces the local difference to the x264 upstream version.	12 years ago
Diego Biurrun	4b60fac419	x86: PALIGNR: port to cpuflags	12 years ago
Diego Biurrun	dbb37e7711	x86: PABSW: port to cpuflags	12 years ago
Diego Biurrun	0a7a94f2e5	x86: Refactor PSWAPD fallback implementations and port to cpuflags	12 years ago
Diego Biurrun	26f01bd106	x86: PMINUB: port to cpuflags	12 years ago
Diego Biurrun	61bc2bc7d4	x86util: Add cpuflags_mmxext alias for cpuflags_mmx2 "mmxext" is a more sensible name and more common in outside projects.	12 years ago
Dave Yeo	264f12342c	x86: Fix assembly with NASM Unlike YASM, NASM only looks for include files in the current directory, not in the directory that included files reside in. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Dave Yeo	9c167914a1	x86: Fix assembly with NASM Unlike YASM, NASM only looks for include files in the current directory, not in the directory that included files reside in. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Diego Biurrun	588fafe7f3	x86: MMX2 ---> MMXEXT in macro names	12 years ago
Diego Biurrun	6860b4081d	x86: include x86inc.asm in x86util.asm This is necessary to allow refactoring some x86util macros with cpuflags.	12 years ago
Justin Ruggles	6092dafb5a	lavr: x86: optimized 6-channel s16 to fltp conversion	13 years ago
Jason Garrett-Glaser	85a3c19ed1	dsputil: x86: add SHUFFLE_MASK_W macro Simplifies pshufb masks that operate on words.	13 years ago
Loren Merritt	4d4752366f	x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros Signed-off-by: Diego Biurrun <diego@biurrun.de>	13 years ago
Vitor Sessak	4a301706fd	x86: Avoid movs on BUTTERFLYPS when in AVX mode Signed-off-by: Janne Grunau <janne-libav@jannau.net>	13 years ago
Justin Ruggles	5cc6d5244d	lavr: replace the SSE version of ff_conv_fltp_to_flt_6ch() with SSE4 and AVX The current SSE version is slower than the MMX version on Athlon64 and Sandy Bridge, but the SSE4 and AVX versions are faster on Sandy Bridge.	13 years ago
Justin Ruggles	c8af852b97	Add libavresample This is a new library for audio sample format, channel layout, and sample rate conversion.	13 years ago
Ronald S. Bultje	3b15a6d742	config.asm: change %ifdef directives to %if directives. This allows combining multiple conditionals in a single statement.	13 years ago
Justin Ruggles	4e8e262476	fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm	13 years ago
Ronald S. Bultje	38e06c2969	Move clipd macros to x86util.asm. This allows sharing them between multiple .asm files.	14 years ago
Ronald S. Bultje	b2c087871d	Move x86util.asm from libavcodec/ to libavutil/. This allows using it in swscale also.	14 years ago
Jason Garrett-Glaser	a3bf7b864a	H.264: tweak some other x86 asm for Atom	14 years ago
Daniel Kang	c0483d0c7a	H.264: Add x86 assembly for 10-bit H.264 predict functions Mainly ported from 8-bit H.264 predict. Some code ported from x264. LGPL ok by author. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Loren Merritt	422b2362fc	dct32_sse: eliminate some spills 125->104 cycles on penryn (x86_64 only)	14 years ago
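The vp9 AVX2 commit (f0a2b6249b) claims a speedup of about 1.65x over AVX for the full 16x16 IDCT. The ratio can be rechecked from the checkasm figures quoted in that commit message. The Python sketch below copies only the AVX and AVX2 numbers; the dictionary keys are the `_1` to `_16` suffixes (the `${sub_idct}` field) of the benchmarked function names, and `speedups` is a hypothetical helper name used only for illustration.

```python
# AVX and AVX2 figures copied from the f0a2b6249b commit message
# (checkasm --bench, 10k runs), keyed by the ${sub_idct} suffix.
avx = {1: 661.2, 2: 521.9, 4: 446.1, 8: 440.9, 16: 469.9}
avx2 = {1: 311.5, 2: 304.3, 4: 297.0, 8: 290.2, 16: 286.4}


def speedups(base: dict, new: dict) -> dict:
    """Return base/new for each key: how many times faster `new` is."""
    return {k: base[k] / new[k] for k in base}


if __name__ == "__main__":
    for sub, ratio in sorted(speedups(avx, avx2).items()):
        print(f"sub_idct={sub:2d}: AVX2 is {ratio:.2f}x as fast as AVX")
```

For the full IDCT (key 16) the ratio is 469.9 / 286.4, roughly 1.64, which matches the commit's "about 1.65x"; the sub-IDCT ratios fall between about 1.5x and 2.1x.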

76 Commits (8431a6e654e5c4e2b80826240d4d50774212b309)
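The RSHIFT/LSHIFT commit (9107612818) gives the macros a byte-count argument because MMX and SSE2 shift instructions count in different units. The Python model below illustrates that idea under stated assumptions: it assumes, without quoting the macro source, that the MMX path (`mmsize` 8) uses the 64-bit bit-count shifts `psrlq`/`psllq` with the count scaled by 8, while the SSE2 path uses the whole-register byte shifts `psrldq`/`pslldq`. Registers are modeled as plain integers, and `rshift`/`lshift` are hypothetical names, not FFmpeg code.

```python
MASK64 = (1 << 64) - 1    # width of an MMX register
MASK128 = (1 << 128) - 1  # width of an XMM register


def psrlq(reg: int, bits: int) -> int:
    # Model of MMX psrlq: logical right shift of 64 bits by a *bit* count
    return (reg & MASK64) >> bits


def psllq(reg: int, bits: int) -> int:
    # Model of MMX psllq: logical left shift of 64 bits by a *bit* count
    return ((reg & MASK64) << bits) & MASK64


def psrldq(reg: int, nbytes: int) -> int:
    # Model of SSE2 psrldq: logical right shift of 128 bits by a *byte* count
    return (reg & MASK128) >> (8 * nbytes)


def pslldq(reg: int, nbytes: int) -> int:
    # Model of SSE2 pslldq: logical left shift of 128 bits by a *byte* count
    return ((reg & MASK128) << (8 * nbytes)) & MASK128


def rshift(reg: int, nbytes: int, mmsize: int) -> int:
    # Hypothetical RSHIFT model: callers always pass bytes; the MMX path
    # converts the count to bits, the SSE2 path uses it unchanged.
    if mmsize > 8:
        return psrldq(reg, nbytes)
    return psrlq(reg, 8 * nbytes)


def lshift(reg: int, nbytes: int, mmsize: int) -> int:
    # Hypothetical LSHIFT model, mirroring rshift()
    if mmsize > 8:
        return pslldq(reg, nbytes)
    return psllq(reg, 8 * nbytes)


if __name__ == "__main__":
    print(hex(rshift(0x1122334455667788, 2, 8)))
    print(hex(rshift(0x00112233445566778899AABBCCDDEEFF, 2, 16)))
```

With a single byte-count interface, calling code does not need to know which register width it is assembled for; only the macro decides whether to scale the count.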