This allows calling the function without the need to check if the
dictionary contains any entries
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
av_dynarray_add_nofree function have similar functionality
as existing av_dynarray_add, but it doesn't deallocate memory
on fails.
Signed-off-by: Lukasz Marek <lukasz.m.luki@gmail.com>
AV_CPU_FLAG_AVX is enabled at this point only if there's OS support.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Nicolas George <george@nsup.org>
Reviewed-by: Lukasz Marek <lukasz.m.luki@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Ported from arm NEON and added vector_dmul_scalar.
Functions between 1.5 and 5 times faster than the C implementations
using Apple's clang-503.0.19 on A7.
Not guaranteed to be in nanosecond resolution. On iOS 7 the duration
of one tick is 125/3 ns which is still more than an order of magnitude
better then microseconds.
Replace decicycles with the neutral UNITS. Decicycles is strange but
tenths of a nanosecond and unspecific "deci"-ticks for mach_absolute_time
is just silly.
new function allows to unref buffer and obtain its data.
Signed-off-by: Lukasz Marek <lukasz.m.luki@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.
Signed-off-by: Martin Storsjö <martin@martin.st>
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>