GCC 4.3 and later do the right thing with the plain C code. Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value. At least
two gcc configurations miscompile the inline asm in some situations.
In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr. On Intel i7 and later, this imul is faster 32-bit mul. On
older Intel and all AMD, it is slightly slower. On Atom it is much
slower.
Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible. If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.
Signed-off-by: Mans Rullgard <mans@mansr.com>
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro. This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
nasm does not support 'CPU foonop' directives. This adds a configure
test for the directive and uses it only if supported.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This adds macros for accessing the EFLAGS register and uses
these instead of coding the entire check in inline asm.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds macros for accessing the EFLAGS register and uses
these instead of coding the entire check in inline asm.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds whitespace around operators, aligns line continuation
backslashes, and breaks long lines. Also fixes an ifdef halfway
through a statement. The one line of duplication this saved is
not worth the ugliness.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The attribution was removed by libav while moving the code to libavutil
The original code is from
commit eb4825b5d4
Author: Loren Merritt <lorenm@u.washington.edu>
Date: Thu Aug 10 19:06:25 2006 +0000
sse and 3dnow implementations of float->int conversion and mdct windowing.
15% faster vorbis.
and
commit 069720565c
Author: Loren Merritt <lorenm@u.washington.edu>
Date: Fri Aug 11 18:19:37 2006 +0000
vorbis simd tweaks
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Yasm was fixed in its r2161 and yasm 0.8.0 (Apr 2010) contained this fix.
Nasm was fixed in 2.06 (Jun 2009):
https://groups.google.com/group/alt.lang.asm/browse_thread/thread/fcc85bbc3745d893
I tested with yasm 0.7.99 and yasm 1.2.0.7, where this works fine.
I also tested with nasm. The nasm shipping with Xcode is too old to understand
ffmpeg's assembly, before and after the patch. Nasm 2.10 fails to compile
fft_mmx.asm on trunk with
libavcodec/x86/fft_mmx.asm:88: panic: section ".text" has already been specified with alignment 32, conflicts with new alignment of 16
but builds fine if I change the two alignment "16"s in x86inc.asm to "32". With this patch,
nasm 2.10 fails with
libavcodec/x86/fft_mmx.asm:39: panic: section ".rodata" has already been specified with alignment 32, conflicts with new alignment of 16
instead, but again builds fine with s/16/32/.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
This sets __OUTPUT_FORMAT__ to win64 instead of win32, even though both
(through -m amd64) produce 64-bit binary code.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Functions using INIT_MMX may still access XMM registers through direct
means (xmm0-15). Therefore, they still need to be marked for clobber
so they can be properly saved/restored.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>