This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.
Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
Modify the asm accordingly.
GLOBAL is now no longoer necessary for PIC-compliant loads.
Originally committed as revision 23739 to svn://svn.ffmpeg.org/ffmpeg/trunk
Use the new x86inc features to support 64-bit Windows on all non-x264 nasm
assembly code as well.
Patch by John Adcock, dscaler.johnad AT googlemail DOT com.
Win64 changes originally by Anton Mitrofanov.
x86util changes mostly by Holger Lubitz.
Originally committed as revision 19580 to svn://svn.ffmpeg.org/ffmpeg/trunk
It contains optimizations that are not specific to i386 and
libavutil uses this naming scheme already.
Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk
This includes SSE2 luma deblocking code and both MMXEXT and SSE2 luma
intra deblocking code for H.264 decoding. This assembly is available
under --enable-gpl and speeds decoding of Cathedral by 7%.
Originally committed as revision 16239 to svn://svn.ffmpeg.org/ffmpeg/trunk