libavutil/arm/asm.S sets '.arch' depending on HAVE_ARMV5TE so that
assembling armv5te code will always succeed even if the default -march
flag does not support it. HAVE_ARMV5TE_EXTERNAL tests assembling code
with the default arch.
Fixes the missing symbol ff_prefetch_arm when --cpu= selects a CPU
without armv5te.
CC: libav-stable@libav.org
This is identical to what e.g. vp8 does, and prevents the function call
overhead (plus dependency on dsputil for this particular function).
Arm asm updated by Janne Grunau <janne-libav@jannau.net>.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This is consistent with usual ARM nomenclature as well as with the
VFPV3 and NEON symbols which both lack the ARM prefix.
Signed-off-by: Mans Rullgard <mans@mansr.com>
When initialising an FFTContext for a plain FFT, mdct_bits is not set
and can contain a garbage value. Since nbits is always valid and, for
MDCT operation, equals mdct_bits - 2, checking it instead avoids using
an uninitialised value while having the same effect.
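As a rough illustration of the reasoning above, the following C sketch
uses a trimmed-down, hypothetical stand-in for FFTContext. The struct
name, the helper names and the threshold are assumptions made for this
example only. It shows that testing nbits against "threshold - 2" gives
the same answer as testing mdct_bits for an MDCT, and stays well defined
for a plain FFT where mdct_bits was never written.

```c
#include <assert.h>

/* Hypothetical stand-in for FFTContext; only the two fields discussed. */
typedef struct SketchFFTContext {
    int nbits;      /* always set by init: log2 of the transform size */
    int mdct_bits;  /* only set when the context is used for an MDCT  */
} SketchFFTContext;

/* Assumed threshold, chosen for illustration only. */
enum { SKETCH_MIN_MDCT_BITS = 5 };

/* Plain FFT init: mdct_bits is deliberately left untouched. */
static void sketch_init_fft(SketchFFTContext *s, int nbits)
{
    assert(nbits >= 0);
    s->nbits = nbits;
}

/* MDCT init: nbits is derived as mdct_bits - 2, as the text states. */
static void sketch_init_mdct(SketchFFTContext *s, int mdct_bits)
{
    assert(mdct_bits >= 2);
    s->mdct_bits = mdct_bits;
    s->nbits     = mdct_bits - 2;
}

/* Checks nbits only, so it never reads a possibly unset mdct_bits.
 * For an MDCT this equals "mdct_bits >= SKETCH_MIN_MDCT_BITS". */
static int sketch_use_fast_path(const SketchFFTContext *s)
{
    return s->nbits >= SKETCH_MIN_MDCT_BITS - 2;
}
```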
Signed-off-by: Mans Rullgard <mans@mansr.com>
The loops were reading ahead one line, which could end up outside the
buffer for reference blocks at the edge of the picture. Removing
this readahead has no measurable performance impact.
Signed-off-by: Mans Rullgard <mans@mansr.com>
All our ARM asm preserves alignment so setting this attribute
in a common location is simpler. This removes numerous warnings
when linking with armcc.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In the GNU assembler, a relational expression, bizarrely, has the
value -1 if true, whereas in Apple's it is +1. This patch makes
sure the correct expression is used in both cases.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The clang integrated assembler does not support pre-UAL syntax,
while gcc requires pre-UAL syntax for ARM code. A patch[1] for
clang to support the old syntax as well has been ignored since
January.
This patch chooses the syntax appropriate for each compiler,
allowing both to build the code. Notably, this change allows
building for iphone with the latest Apple Xcode update.
[1] http://llvm.org/bugs/show_bug.cgi?id=11855
Signed-off-by: Mans Rullgard <mans@mansr.com>
The standard syntax requires two destination registers for
LDRD/STRD instructions. Some versions of the GNU assembler
allow using only one with the second implicit, others are
more strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context. There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This creates proper position independent code when accessing
data symbols if CONFIG_PIC is set.
References to external symbols should now use the movrelx macro.
Some additional code changes are required since this macro may
need a register to hold the GOT pointer.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Change the size specifiers to match the actual element sizes
of the data. This makes no practical difference with strict
alignment checking disabled (the default) other than somewhat
documenting the code. With strict alignment checking on, it
avoids trapping the unaligned loads.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The vertically interpolating variants of these functions read
ahead one line to optimise the loop. On the last line processed,
this might be outside the buffer. Fix these invalid reads by
processing the last line outside the loop.
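The fix can be pictured with a C model of the loop. This is a sketch
only, not the actual assembly: the function name, the fixed 8-pixel
width and the rounding average are assumptions. Rows are copied into
local arrays to stand in for registers, so every memcpy corresponds to a
real load. The loop reads ahead one row for the next iteration, and the
last output line is produced after the loop so that no row beyond
src[h] is ever touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_W = 8 };  /* assumed block width */

/* dst[y] = rounded average of src rows y and y + 1, for y in [0, h).
 * Valid input is exactly h + 1 source rows: rows 0 .. h. */
static void sketch_put_pixels8_y2(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride, int h)
{
    uint8_t cur[SKETCH_W], next[SKETCH_W];
    int x, y;

    assert(h > 0);
    memcpy(cur,  src,          SKETCH_W);   /* row 0 */
    memcpy(next, src + stride, SKETCH_W);   /* row 1 */
    src += 2 * stride;                      /* now points at row 2 */

    /* Only h - 1 iterations: each one reads ahead row y + 2 <= h. */
    for (y = 0; y < h - 1; y++) {
        for (x = 0; x < SKETCH_W; x++)
            dst[x] = (uint8_t)((cur[x] + next[x] + 1) >> 1);
        dst += stride;
        memcpy(cur,  next, SKETCH_W);
        memcpy(next, src,  SKETCH_W);       /* read-ahead load */
        src += stride;
    }

    /* Last line, outside the loop: rows h - 1 and h are already loaded,
     * so no further read (which would be row h + 1) takes place. */
    for (x = 0; x < SKETCH_W; x++)
        dst[x] = (uint8_t)((cur[x] + next[x] + 1) >> 1);
}
```

Running the loop for all h iterations instead would issue the
read-ahead load for row h + 1, which is the invalid read described
above.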
Signed-off-by: Mans Rullgard <mans@mansr.com>
The assembler may fail to place literal pools close enough to
instructions referencing them. An explicit .ltorg directive
fixes this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
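A minimal sketch of the idea in C, assuming a runtime selection step
that combines the detected CPU features with a user-supplied mask. All
names and flag bits here are hypothetical and are not the real libav
identifiers; the point is only that the choice is made from
"detected & mask" at run time instead of from compile-time macros.

```c
#include <assert.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    SKETCH_CPU_ARMV6 = 1 << 0,
    SKETCH_CPU_NEON  = 1 << 1,
    SKETCH_CPU_ALL   = SKETCH_CPU_ARMV6 | SKETCH_CPU_NEON
};

typedef enum {
    SKETCH_IMPL_C,
    SKETCH_IMPL_ARMV6,
    SKETCH_IMPL_NEON
} SketchImpl;

/* Pick the best implementation still allowed after masking. */
static SketchImpl sketch_select_impl(unsigned detected, unsigned mask)
{
    unsigned flags;
    SketchImpl impl = SKETCH_IMPL_C;

    assert((detected & ~(unsigned)SKETCH_CPU_ALL) == 0); /* known bits only */
    flags = detected & mask;

    if (flags & SKETCH_CPU_ARMV6)
        impl = SKETCH_IMPL_ARMV6;
    if (flags & SKETCH_CPU_NEON)
        impl = SKETCH_IMPL_NEON;   /* later checks override earlier ones */
    return impl;
}
```

With such a scheme, clearing the NEON bit in the mask on a NEON-capable
machine exercises the ARMv6 code path from the same binary.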
Signed-off-by: Mans Rullgard <mans@mansr.com>
Quite often, the original weights are multiples of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.
The x86 code already used that trick.
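The arithmetic behind the trick can be checked with a small C sketch.
The function names, the rounding constant 0x10 and the final shift by 5
are assumptions chosen for illustration and are not taken from this
commit. When both weights are exact multiples of 512, the per-product
">> 9" loses nothing, so dividing the weights by 512 once up front
yields the same pixel value without any intermediate shift.

```c
#include <assert.h>
#include <stdint.h>

/* Per-pixel form with an intermediate shift on each product
 * (assumed shape of the original computation). */
static uint8_t sketch_weight_shifted(int w1, int w2,
                                     uint8_t src1, uint8_t src2)
{
    return (uint8_t)((((w2 * src1) >> 9) + ((w1 * src2) >> 9) + 0x10) >> 5);
}

/* Done once per frame: only valid when w is a multiple of 512. */
static int sketch_prescale(int w)
{
    assert(w % 512 == 0);
    return w / 512;
}

/* Per-pixel form using prescaled weights: no intermediate shift. */
static uint8_t sketch_weight_prescaled(int pw1, int pw2,
                                       uint8_t src1, uint8_t src2)
{
    return (uint8_t)((pw2 * src1 + pw1 * src2 + 0x10) >> 5);
}
```

Weights that are not multiples of 512 would still need the shifting
form, since dropping the low bits up front would change the result.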
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
They had been broken since August 2010 without anyone noticing until
three weeks ago. Nobody cares about it anymore and hopefully Marvell
will support NEON like in the PXA978 from now on.
There is only one caller, which does not need the shifting. Other use cases
are situations where different roundings would be needed.
The x86 and neon versions are modified accordingly.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. It also fixes crashes in all weight functions on systems
where we do not sign-extend and arguments are not passed in registers,
such as Win64.
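To make the sign-extension point concrete, here is a hedged C sketch.
It assumes the change consists of passing the stride as a pointer-sized
signed type such as ptrdiff_t instead of int; the message above does
not name the type, and the function name is hypothetical. With a
pointer-sized stride, the value can be added to a 64-bit pointer
directly, whereas an int stride would first need to be widened with
sign extension (movsxd on x86-64) in hand-written assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The stride is pointer-sized (assumption: ptrdiff_t), so a negative
 * stride is already correctly represented at full pointer width. */
static uint8_t sketch_pixel_at(const uint8_t *base, ptrdiff_t stride,
                               int row, int col)
{
    assert(base != NULL);
    return base[row * stride + col];
}
```

A negative stride is the case where a missing sign extension shows up:
the upper 32 bits of the register would be wrong and the computed
address would land far outside the buffer.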