The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The qpel functions referenced here are not related to h264 and should
thus never have been under CONFIG_H264QPEL.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The only CPUs that have 3dnow and don't have mmxext are 12 years old.
Moreover, AMD has dropped 3dnow extensions from newer CPUs.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Instead of using an evil VLA, fall back to C version when edge
emulation is needed. MPEG4 GMC is a rarely used fringe feature
so the speed loss is an acceptable cost for safer code.
Signed-off-by: Mans Rullgard <mans@mansr.com>
For some reason add_hfyu_median_prediction_cmov is only selected
on 3Dnow-capable CPUs, even though it uses no 3Dnow instructions.
This patch allows it to be selected on any cpu with cmov with the
possibility of being overridden by the mmxext version.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate. There is thus no reason to ever use them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
In the first block (in the unrolled loop), the instructions
"movq 8%3, %%mm1 \n\t", and so forth, have problems.
From above instruction, it is clear what the programmer wants: a load from
p + 8. But this assembly code doesn’t guarantee that. It only works if the
compiler puts p in a register to produce an instruction like this:
"movq 8(%edi), %mm1". During compiler optimization, it is possible that the
compiler will be able to constant propagate into p. Suppose p = &x[10000].
Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction
becomes "movq 810000(%edx)". That is, it will stride by 810000 instead of 8.
This will cause a segmentation fault.
This error was fixed in the second block of the assembly code, but not in
the unrolled loop.
How to reproduce:
This error is exposed when we build using Intel C++ Compiler, with
IPO+PGO optimization enabled. Crashed when decoding an MJPEG video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context. There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>