Signed-off-by: Nicolas George <george@nsup.org>
Reviewed-by: Lukasz Marek <lukasz.m.luki@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Ported from arm NEON and added vector_dmul_scalar.
Functions between 1.5 and 5 times faster than the C implementations
using Apple's clang-503.0.19 on A7.
Not guaranteed to be in nanosecond resolution. On iOS 7 the duration
of one tick is 125/3 ns which is still more than an order of magnitude
better then microseconds.
Replace decicycles with the neutral UNITS. Decicycles is strange but
tenths of a nanosecond and unspecific "deci"-ticks for mach_absolute_time
is just silly.
new function allows to unref buffer and obtain its data.
Signed-off-by: Lukasz Marek <lukasz.m.luki@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.
Signed-off-by: Martin Storsjö <martin@martin.st>
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Support the cases where the first and last operand of
the XOP instruction are the same.
Also add vpmacsdql emulation.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type delcarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>