FFmpeg

Author	SHA1	Message	Date
James Almer	23a8c63452	x86inc: Extend FMA_INSTR functionality Support the cases where the first and last operand of the XOP instruction are the same. Also add vpmacsdql emulation. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
Loren Merritt	b7d0d10a1d	x86inc: Speed up assembling with Yasm Work around Yasm's inefficiency with handling large numbers of variables in the global scope. Signed-off-by: Diego Biurrun <diego@biurrun.de>	11 years ago
Loren Merritt	4d55fe7204	x86inc: speed up compilation with yasm Work around yasm's inefficiency with handling large numbers of variables in the global scope.	11 years ago
Jason Garrett-Glaser	a3fabc6cb3	x86: more AVX2 framework Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Jason Garrett-Glaser	c6908d6b4b	x86inc: FMA3/4 Support Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Derek Buitenhuis	206895708e	x86inc: Remove our FMA4 support This is so we can sync to x264's version of FMA4 support. This partially reverts commit `79687079a9`. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Henrik Gramner	c108ba0175	x86inc: Use VEX-encoded instructions in AVX functions Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version. This change makes it easier to extend existing code to use AVX2. Also add support for AVX emulation of a few instructions that were missing before. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Henrik Gramner	ad7d7d4f6a	x86inc: Remove .rodata kludges The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Henrik Gramner	3e2fa991db	x86inc: remove misaligned cpu flag Prevents a crash if the misaligned exception mask bit is cleared for some reason. Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule. They also require modifying the MXCSR control register and by removing those functions we can get rid of that complexity altogether. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Jason Garrett-Glaser	7115566541	x86inc: various minor backports from x264 Small backports that sneaked into other asm commits in x264. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Derek Buitenhuis	47f9d7ce54	x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64" This is also a valid value for WIN64. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Henrik Gramner	bbe4a6db44	x86inc: Utilize the shadow space on 64-bit Windows Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Loren Merritt	3fb78e99a0	x86inc: create xm# and ym#, analogous to m# For when we want to mix simd sizes within one function. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Loren Merritt	49ebe3f9fe	x86inc: fix some corner cases of SWAP SWAP with >=3 named (rather than numbered) args PERMUTE followed by SWAP with 2 named args used to produce the wrong permutation Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Henrik Gramner	63f0d62310	x86inc: Use SSE instead of SSE2 for copying data Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Henrik Gramner	ad76e6e7e1	x86inc: Set ELF hidden visibility for global constants Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Loren Merritt	25cb0c1a1e	x86inc: activate REP_RET automatically Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition. REP_RET is still needed manually when it's a branch target, but that's much rarer. The implementation involves lots of spurious labels, but that's OK because we strip them. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	11 years ago
Ronald S. Bultje	c07ac8d467	VP9 MC (ssse3) optimizations. Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.	11 years ago
Christophe Gisquet	2e81acc687	x86inc: Fix number of operands for cmp* instructions cmp{p,s}{s,d} instructions do take an imm8 operand. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Christophe Gisquet	0b467a6e83	x264asm: fix cmp* number of arguments cmp{p,s}{s,d} instructions do take an imm8 operand. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	12 years ago
Ronald S. Bultje	0c0828ecc5	x86: Use simple nop codes for <= sse (rather than <= mmx) The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng rng_en ace ace_en) SIGILLs on long nop codes. Signed-off-by: Martin Storsjö <martin@martin.st>	12 years ago
Ronald S. Bultje	b582af1ed7	Use simple nop codes for <= sse (rather than <= mmx). The "CPU: CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng rng_en ace ace_en) SIGILLs on long nop codes. Change-Id: I7e7c52a2191006df30a9aadbc40d481a1db89106	12 years ago
Diego Biurrun	d633d12b2c	x86inc: Add cvisible macro for C functions with public prefix This allows defining externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Diego Biurrun	ef5d41a553	x86inc: Rename "program_name" to "private_prefix" The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>	12 years ago
Ronald S. Bultje	a34d9ad969	lavc: merge latest x86inc.asm fixes with x264 Unbreak NASM support. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	12 years ago
Janne Grunau	0995ad8db4	x86inc: fully concatenate tokens to fix macro expansion for nasm Fixes build errors with nasm introduced in `6f40e9f070` for stack memory alignment. Noticed by BugMaster.	12 years ago
Ronald S. Bultje	140367aff9	x86inc: fix stack alignment on win64 Signed-off-by: Martin Storsjö <martin@martin.st>	12 years ago
Ronald S. Bultje	ce58642ed0	x86inc: support stack mem allocation and re-alignment in PROLOGUE. Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	12 years ago
Ronald S. Bultje	6f40e9f070	x86inc: support stack mem allocation and re-alignment in PROLOGUE Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	12 years ago
Justin Ruggles	b30a363331	x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling	12 years ago
Diego Biurrun	f0d124f005	x86inc: Set program_name outside of x86inc.asm This reduces the local difference to the x264 upstream version.	12 years ago
Diego Biurrun	012f73e271	x86inc: Only define program_name if the macro is unset This allows overriding the value from outside of the file.	12 years ago
Ronald S. Bultje	08b028c18d	Remove INIT_AVX from x86inc.asm.	12 years ago
Loren Merritt	7a1944b907	vf_hqdn3d: x86 asm 13% faster on penryn, 16% on sandybridge, 15% on bulldozer Not simd; a compiler should have generated this, but gcc didn't.	13 years ago
Michael Niedermayer	c794acc44e	x86inc.asm: remove redundant ifdef __YASM_VER__ Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Mans Rullgard	edd8226795	x86: fix build with nasm 2.08 It appears that something goes wrong in old nasm versions when the %+ operator is used in the last argument of a macro invocation and this argument is tested with %ifdef within the macro. This patch rearranges the macro arguments such that the %+ operator is never used in the last argument.	13 years ago
Mans Rullgard	180d43bc67	x86: use nop cpu directives only if supported nasm does not support 'CPU foonop' directives. This adds a configure test for the directive and uses it only if supported. Signed-off-by: Mans Rullgard <mans@mansr.com>	13 years ago
Mans Rullgard	7238265052	x86: fix rNmp macros with nasm For some reason, nasm requires this. No harm done to yasm. Signed-off-by: Mans Rullgard <mans@mansr.com>	13 years ago
Diego Biurrun	ca844b7be9	x86: Use consistent 3dnowext function and macro name suffixes Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.	13 years ago
Loren Merritt	f8d8fe255d	x86inc: clip num_args to 7 on x86-32. This allows us to unconditionally set the cglobal num_args parameter to a bigger value, thus making writing yasm code even easier than before. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	13 years ago
Ronald S. Bultje	96c9cc1094	x86inc: sync to latest version from x264.	13 years ago
Justin Ruggles	79687079a9	x86: add support for fmaddps fma4 instruction with abstraction to avx/sse	13 years ago
Ronald S. Bultje	30b45d9c38	x86inc: automatically insert vzeroupper for YMM functions.	13 years ago
Clément Bœsch	7073174551	x86inc: put basicnop under ifdef to prevent compile failure. This should fix the NASM box. Reviewed-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Michael Niedermayer	dc12f7d4ec	x86inc: try to put amdnop under ifdef to prevent compile failure based on similar amdnop usage in ffmpeg Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Loren Merritt	2cd1f5cadc	x86inc: modify ALIGN to not generate long nops on i586 Signed-off-by: Diego Biurrun <diego@biurrun.de>	13 years ago
Reimar Döffinger	9b1f776d75	Fix compilation with NASM. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	13 years ago
Nico Weber	a4a88fd42c	Remove .rodata alignment kludge for Mach-O if a recent enough yasm is used. Yasm was fixed in its r2161 and yasm 0.8.0 (Apr 2010) contained this fix. Nasm was fixed in 2.06 (Jun 2009): https://groups.google.com/group/alt.lang.asm/browse_thread/thread/fcc85bbc3745d893 I tested with yasm 0.7.99 and yasm 1.2.0.7, where this works fine. I also tested with nasm. The nasm shipping with Xcode is too old to understand ffmpeg's assembly, before and after the patch. Nasm 2.10 fails to compile fft_mmx.asm on trunk with libavcodec/x86/fft_mmx.asm:88: panic: section ".text" has already been specified with alignment 32, conflicts with new alignment of 16 but builds fine if I change the two alignment "16"s in x86inc.asm to "32". With this patch, nasm 2.10 fails with libavcodec/x86/fft_mmx.asm:39: panic: section ".rodata" has already been specified with alignment 32, conflicts with new alignment of 16 instead, but again builds fine with s/16/32/. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Loren Merritt	705f3d4759	x86inc: support AVX abstraction for 2-operand instructions Add cvtdq2ps and cvtps2dq to the AVX instruction list. Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	13 years ago
Henrik Gramner	729f90e268	x86inc improvements for 64-bit Add support for all x86-64 registers Prefer caller-saved register over callee-saved on WIN64 Support up to 15 function arguments Also (by Ronald S. Bultje) Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	13 years ago

100 Commits (f7459bcfc5b54554f95616214696b2a9d189d7fa)