James Almer
1faedb9a11
x85/opusdsp: enable the functions on all FMA3 CPUs
...
It's not using ymm registers, so limiting it to CPUs with fast AVX
is not necessary.
Signed-off-by: James Almer <jamrial@gmail.com>
5 years ago
Lynne
605e330310
x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis
...
58893 decicycles in deemphasis_c, 130548 runs, 524 skips
9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips -> 6.21x speedup
24866 decicycles in postfilter_c, 65386 runs, 150 skips
5268 decicycles in postfilter_fma3, 65505 runs, 31 skips -> 4.72x speedup
Total decoder speedup: ~14%
Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;
for (int i = 0; i < len; i += 4) {
y[0] = x[0] + c1*state;
y[1] = x[1] + c2*state + c1*x[0];
y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
state = y[3];
y += 4;
x += 4;
}
6 years ago
Paul B Mahol
dcae5ba322
avfilter: add anlmdn filter x86 SIMD optimizations
6 years ago
James Almer
dee579ffcd
x86/fixed_dsp: add ff_butterflies_fixed_sse2
...
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Janne Grunau
d3789eeeed
aarch64: implement videodsp.prefetch
...
8% faster h264 decoding on Apple A7.
11 years ago
Diego Biurrun
c9f933b5b6
Add av_cold attributes to arch-specific init functions
12 years ago
Michael Niedermayer
e16bac7b33
videodsp: Fix project name
...
These are all part of splited out dsp utils from FFmpeg
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Ronald S. Bultje
8c53d39e7f
lavc: introduce VideoDSPContext
...
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
12 years ago
Mans Rullgard
b692d246ea
vp8: arm: separate ARMv6 functions from NEON
...
This is a preparation for complete ARMv6 optimisations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Mans Rullgard
d526c5338d
ARM: allow runtime masking of CPU features
...
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Michael Niedermayer
c266eb1928
arm: Fix 10l typo
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Ronald S. Bultje
bd66f073fe
vp8: change int stride to ptrdiff_t stride.
...
On 64bit platforms with 32bit int, this means we won't have to sign-
extend the integer anymore.
13 years ago
Diego Biurrun
32f3c541bc
doxygen: Do not include license boilerplates in Doxygen comment blocks.
13 years ago
Ronald S. Bultje
a5dfeb612e
VP8: armv6 optimizations.
...
From 52.503s (~40fps) to 27.973sec (~80fps) decoding of 480p sintel
trailer, i.e. a ~2x speedup overall, on a Nexus S.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Mans Rullgard
ef15d71c1f
VP8: ARM NEON optimisations for dsp functions
...
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit a1c1d3c003
)
14 years ago
Mans Rullgard
a1c1d3c003
VP8: ARM NEON optimisations for dsp functions
...
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago