x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.
Moreover, some of the removed code was buggy/not bitexact
and lead to failures involving the f32le and f32be versions of
gray, gbrp and gbrap on x86-32 when SSE2 was not disabled.
See e.g.
https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx
Notice that yuv2yuvX_mmx is not removed, because it is used
by SSE3 and AVX2 as fallback in case of unaligned data and
also for tail processing. I don't know why yuv2yuvX_mmxext
isn't being used for this; an earlier version [1] of
554c2bc708 used it, but
the version that was eventually applied does not.
[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The implementation is pretty straight-forward. Most of the existing
NV12 codepaths work regardless of subsampling and are re-used as is.
Where necessary I wrote the slightly different NV24 versions.
Finally, the one thing that confused me for a long time was the
asm specific x86 path that did an explicit exclusion check for NV12.
I replaced that with a semi-planar check and also updated the
equivalent PPC code, which Lauri kindly checked.
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
Revert "swscale: update context offsets after removal of AlpMmxFilter."
(commit a95e3fa90b)
and
Revert "swscale: Remove some write-only variables related to alpha handling."
(commit 9d03cb9fc5).
They broke alpha handling - it's the evil inline asm that still uses that
variable, so it's not truely write-only.
Replaces a very hackish hack to fix the same issue (call instruction
overwriting stack variables).
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
gcc 4.6 no longer decrements esp to account for local variables.
Thus using call will end up overwriting some local variable.
So add an extra one it can safely clobber.
This is a huge hack because it's basically pure chance it works,
no idea how this is supposed to be done.
Fixes trac ticket #397.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Speed: from 3.9x to 9.6x speed improvement over C, and some small
(up to 15%) speed improvements over existing MMX code (particularly
for bigger filters).
Author of the fix is ronald, the enabling & commit message are mine.
This fixes
commit 4e3e333a79
Author: Ronald S. Bultje <rsbultje@gmail.com>
Date: Tue Jul 5 12:49:11 2011 -0700
swscale: error dithering for 16/9/10-bit to 8-bit.
Based on a somewhat similar idea in FFmpeg's swscale copy.
The Fix was originally commited in: (and i missed it due to the commit message)
commit 5c391a161a
Author: Ronald S. Bultje <rsbultje@gmail.com>
Date: Fri Jul 8 14:39:04 2011 -0700
swscale: rename uv_off/uv_off2 to uv_off_px/byte.