This allows further unrolling the DSP implementation where possible.
x86 and ARM DSP modified by simply moving the multiple calls from vc1dec
to the DSP code. Decoding improvements should only occurs because of the
compiler actually able to unroll more.
Decoding time: ~8.80s -> 8.64s (ie around 2%)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The external assembly function uses mmxext instructions and should not be
masqueraded as an mmx-only function. Instead, use the mmx-only inline
assembly function.
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
The problem is that the ssse3 psign instruction does the wrong
thing here. Commit ea60dfe incorrectly removed a macro emulating
this instruction for pre-ssse3 code. However, the emulation is
incorrect, and the code relies on the behaviour of the macro.
Specifically, the psign sets destination elements to zero where
the corresponding source element is zero, whereas the emulation
only negates destination elements where the source is negative.
Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus,
which is why the original VC-1 code had an additional right shift
when using it. Since the psign instruction cannot be used here,
skip all the macro hell and use the working instruction sequence
directly.
None of this was noticed due a stray return statement in
ff_vc1dsp_init_mmx() which meant that only the mmx version of the
loop filter was ever used (before being removed in ea60dfe).
Signed-off-by: Mans Rullgard <mans@mansr.com>
They were moved into code under HAVE_YASM and most of them
even into completely disabled code with no reason given
for that in the commit message.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Should fix compilation with icc and should help prevent any future duplicates
Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk
These macros are redundant. All uses are replaced with the generic
DECLARE_ALIGNED macro instead.
Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk
Includes mmx2 asm for the various functions.
Note that the actual idct still does not have an x86 SIMD implemtation.
For wmv3 files using regular idct, the decoder just falls back to simple_idct,
since simple_idct_dc doesn't exist (yet).
Originally committed as revision 19204 to svn://svn.ffmpeg.org/ffmpeg/trunk
It contains optimizations that are not specific to i386 and
libavutil uses this naming scheme already.
Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk
Neither the asm() nor the __asm__() keyword is part of the C99
standard, but while GCC accepts the former in C89 syntax, it is not
accepted in C99 unless GNU extensions are turned on (with -fasm). The
latter form is accepted in any syntax as an extension (without
requiring further command-line options).
Sun Studio C99 compiler also does not accept asm() while accepting
__asm__(), albeit reporting warnings that it's not valid C99 syntax.
Originally committed as revision 15627 to svn://svn.ffmpeg.org/ffmpeg/trunk
summary of changes:
- Use MANGLE when loading some constants into MMX registers.
- Convert those constants to non-static and thus add ff_ prefix.
- Remove last parameter of MSPEL_FILTER13_CORE (was constant).
- Use of "+r" instead of stricter but unnecessary "+g".
- Use of REG_c and direct loading of some of the above.
patch by Christophe GISQUET, christophe.gisquet free fr
Subject: [FFmpeg-devel] [PATCH] Roundup issue #301
Date: Fri, 28 Dec 2007 19:22:18 +0100
Originally committed as revision 11376 to svn://svn.ffmpeg.org/ffmpeg/trunk