SWS_CPU_CAPS are deprecated and slated to removed with libswscale major
version 3. No need to provide a SWS_CPU_CAPS_MMX2 as backward
compatibility define under the same explicit condition.
Otherwise during scaling it will try to interpret input in the wrong way and
that leads to the test results disagreeing on different platforms and with
different optimizations.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Some systems, e.g. Minix, have sys/mman.h defining MAP_ANONYMOUS without
providing (working) mmap and friends. The mmx filter generation code
checks only for MAP_ANONYMOUS, not for availability of mmap itself which
leads to build errors on aforementioned systems.
This changes the conditional compilation to use mmap only if all the
required functions are available.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This gets rid of the variable-length scratch buffer by filtering 16
pixels at a time and writing directly to the destination. The extra
loads this requires to load the source values are compensated by not
doing a round-trip to memory before shifting.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This reverts parts of e0c6cce447. There is external mmx asm that
requires this alignment.
This fixes crashes when using swscale in builds with external mmx,
without inline assembly.
Signed-off-by: Martin Storsjö <martin@martin.st>
This introduces support for width%4==2 in addition to width%4==0. For
odd widths, some more checks are needed, since the current code always
handles two luma items in a row, thus there is a possibility of an
overread by one.
To access data at multiple fixed offsets from a base address, this
code uses a single "m" operand and code of the form "32%0", relying on
the memory operand instantiation having no displacement, giving a final
result of the form "32(%rax)". If the compiler uses a register and
displacement, e.g. "64(%rax)", the end result becomes "3264(%rax)",
which obviously does not work.
Replacing the "m" operands with "r" operands allows safe addition of a
displacement. In theory, multiple memory operands could use a shared
base register with different index registers, "(%rax,%rbx)", potentially
making more efficient use of registers. In the cases at hand, no such
sharing is possible since the addresses involved are entirely unrelated.
After this change, the code somewhat rudely accesses memory without
using a corresponding memory operand, which in some cases can lead to
unwanted "optimisations" of surrounding code. However, the original
code also accesses memory not covered by a memory operand, so this is
not adding any defect not already present. It is also hightly unlikely
that any such optimisations could be performed here since the memory
locations in questions are not accessed elsewhere in the same functions.
This fixes crashes with suncc.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.