Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Silences warning(s) like:
libavcodec/x86/fft.asm:93: warning: section flags ignored on
section redeclaration
The cause of this warning is that because `struc` and `endstruc`
attempts to revert to the previous section state [1].
The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].
Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.
That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].
That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).
[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This commit silences warning(s) like:
libavcodec/x86/fft.asm:93: warning: section flags ignored on section
redeclaration
The cause of this warning is that because `struc` and `endstruc` attempts to
revert to the previous section state [1]. The section state is stored in the
macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the
`SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.
That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].
That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).
[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This mimicks what is done for the other instruction sets.
Tested-by: James Almer <jamrial@gmail.com>
Tested-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Support the cases where the first and last operand of
the XOP instruction are the same.
Also add vpmacsdql emulation.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is so we can sync to x264's version of FMA4 support.
This partialy reverts commit 79687079a9.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Signed-off-by: Martin Storsjö <martin@martin.st>
The "CPU: CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Change-Id: I7e7c52a2191006df30a9aadbc40d481a1db89106
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>