FFmpeg

Author	SHA1	Message	Date
Martin Vignali	b37196adff	avutil/x86util : add macro for loading a 128 bits constants in an xmm or in each part of an ymm in order to simplify avx2 asm func	7 years ago
Ivan Kalvachev	30ae07d7ef	Add macros to x86util.asm . Improved version of VBROADCASTSS that works like the avx2 instruction. Emulation of vpbroadcastd. Horizontal sum HSUMPS that places the result in all elements. Emulation of blendvps and pblendvb. Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>	7 years ago
James Almer	e229df9478	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} About 2x faster than the c version.	8 years ago
James Almer	d8962ffbd8	avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
Diego Biurrun	994c4bc107	x86util: Port all macros to cpuflags Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2 macro name, drop pointless check for MMX support, we always assume MMX is available in our SIMD code, fix spelling.	8 years ago
James Darnley	5336887867	avcodec/h264: sse2, avx h luma mbaff deblock/loop filter x86-64 only Yorkfield: - sse2: ~2.17x (434 vs. 200 cycles) Nehalem: - sse2: ~2.94x (409 vs. 139 cycles) Skylake: - sse2: ~3.10x (370 vs. 119 cycles) - avx: ~3.29x (370 vs. 112 cycles)	8 years ago
James Darnley	7627df15d4	x86util: import MOVHL macro Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows. x86: Avoid some bypass delays and false dependencies A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.	8 years ago
James Darnley	9d815b7424	avcodec/x86: deduplicate PASS8ROWS macro	8 years ago
Alexandra Hájková	07e1f99a1b	x86util: Document SBUTTERFLY macro Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	8 years ago
James Almer	fd5e6a095f	x86util: Extend SPLATW for avx2 Integration to Libav by Josh de Kock <josh@itanimul.li>. Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>	8 years ago
Ronald S. Bultje	f0a2b6249b	vp9: add 16x16 idct avx2 (8-bit). checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs: nop: 24.6 vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8 vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6 vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4 vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2 vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5 vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7 vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9 vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2 vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9 vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3 vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7 vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4 vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1 vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1 vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0 vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4 vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6 vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7 vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9 vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2 vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6 vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5 vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0 vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9 vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4	8 years ago
James Almer	172af20852	x86/showcqt: use three operand format for some instructions Fixes failures with yasm 1.1.0 and older Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	99b899483e	avutil/x86util: move haddps sse emulation from showcqt Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	d5f8a642f6	x86: port PSIGNW to cpuflags Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	5750d6c5e9	x86: move XOP emulation code back to x86inc Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	37b35feb64	x86/swr: add SSE2/AVX pack_8ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
Kieran Kunhya	9a738c27dc	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	10 years ago
Kieran Kunhya	36091742d1	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
James Almer	d0f56ca071	x86/hevc_deblock: improve 8bit transpose store macros Up to four instructions less depending on function and instruction set. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
James Almer	1ace9573dc	x86/hevc_idct: replace old and unused idct functions Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial). Benchmarks on an Intel Core i5-4200U: idct8x8_dc SSE2 MMXEXT C cycles 22 26 57 idct16x16_dc AVX2 SSE2 C cycles 27 32 249 idct32x32_dc AVX2 SSE2 C cycles 62 126 1375 Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Christophe Gisquet	9107612818	x86util: add and use RSHIFT/LSHIFT macros Those macros take a byte number as shift argument, as this argument differs between MMX and SSE2 instructions. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
Christophe Gisquet	2267003981	x86: hpeldsp: better factorization Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	561bfc85eb	x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	76ed71a72b	x86: move horizontal add macros to x86util Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	3f3d748cab	x86: Move XOP emulation to x86util We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed. Emulation that doesn't obey the original instruction semantics can't be in x86inc. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
Jason Garrett-Glaser	c6908d6b4b	x86inc: FMA3/4 Support Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Derek Buitenhuis	206895708e	x86inc: Remove our FMA4 support This is so we can sync to x264's version of FMA4 support. This partially reverts commit `79687079a9`. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Diego Biurrun	d633d12b2c	x86inc: Add cvisible macro for C functions with public prefix This allows defining externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Diego Biurrun	ef5d41a553	x86inc: Rename "program_name" to "private_prefix" The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Diego Biurrun	dae1d507af	x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags	12 years ago
Diego Biurrun	320e1d0df3	x86: ABSB2: port to cpuflags	12 years ago
Diego Biurrun	094a7405e5	x86: ABSB: port to cpuflags	12 years ago
Diego Biurrun	51969a652c	x86: ABS2: port to cpuflags	12 years ago
Diego Biurrun	5b4dfbffc2	x86: ABS1: port to cpuflags	12 years ago
Justin Ruggles	ac7eb4cb20	float_dsp: add vector_dmul_scalar() to multiply a vector of doubles Include x86-optimized versions for SSE2 and AVX.	12 years ago
Diego Biurrun	87af05c575	x86: SPLATD: port to cpuflags	12 years ago
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	12 years ago
Diego Biurrun	f0d124f005	x86inc: Set program_name outside of x86inc.asm This reduces the local difference to the x264 upstream version.	12 years ago
Diego Biurrun	4b60fac419	x86: PALIGNR: port to cpuflags	12 years ago
Diego Biurrun	dbb37e7711	x86: PABSW: port to cpuflags	12 years ago
Diego Biurrun	0a7a94f2e5	x86: Refactor PSWAPD fallback implementations and port to cpuflags	12 years ago
Diego Biurrun	26f01bd106	x86: PMINUB: port to cpuflags	12 years ago
Diego Biurrun	61bc2bc7d4	x86util: Add cpuflags_mmxext alias for cpuflags_mmx2 "mmxext" is a more sensible name and more common in outside projects.	12 years ago
Dave Yeo	264f12342c	x86: Fix assembly with NASM Unlike YASM, NASM only looks for include files in the current directory, not in the directory that included files reside in. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Dave Yeo	9c167914a1	x86: Fix assembly with NASM Unlike YASM, NASM only looks for include files in the current directory, not in the directory that included files reside in. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Diego Biurrun	588fafe7f3	x86: MMX2 ---> MMXEXT in macro names	12 years ago
Diego Biurrun	6860b4081d	x86: include x86inc.asm in x86util.asm This is necessary to allow refactoring some x86util macros with cpuflags.	12 years ago
Justin Ruggles	6092dafb5a	lavr: x86: optimized 6-channel s16 to fltp conversion	12 years ago
Jason Garrett-Glaser	85a3c19ed1	dsputil: x86: add SHUFFLE_MASK_W macro Simplifies pshufb masks that operate on words.	12 years ago
Loren Merritt	4d4752366f	x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros Signed-off-by: Diego Biurrun <diego@biurrun.de>	13 years ago

89 Commits (0a5ff1964355f6d288071b7c0bc4fb24f658c9fc)