James Almer
5750d6c5e9
x86: move XOP emulation code back to x86inc
...
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
f37a5dcb55
swresample/x86: add missing colon to labels
...
Silences warnings with Nasm
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
c16e99e3b3
x86: check for AV_CPU_FLAG_AVXSLOW where useful
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
Michael Niedermayer
c0e3b46118
swresample: add av_cold to init functions
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
James Almer
f7ed997a6d
x86/swr: make pack_8ch functions work with compilers without aligned stack
...
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Michael Niedermayer
b74ecb82fa
swresample/x86/rematrix_init: Check av_malloc* return codes, forward errors
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
Michael Niedermayer
48ffaaaaef
swresample/x86/rematrix_init: Use av_mallocz_array()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
James Almer
59ac93f6af
x86/swr: add SSE/AVX unpack_6ch functions
...
int32/float only
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
6abf00d615
x86/swr: load constants outside the loop in pack_6ch functions
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
975ff6a3c6
x86/swr: disable pack_8ch functions on msvc/icl x86_32
...
Until a proper fix is committed.
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
5f14f9e984
x86/swr: add missing alignment check to pack_6ch functions
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
37b35feb64
x86/swr: add SSE2/AVX pack_8ch functions
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
edff061fb0
x86/swr: add ff_float_to_int32_a_avx2
...
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
b385c4c6a3
x86/swr: replace sse4 instructions in pack_6ch with sse ones
...
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
9937362c54
x86/swr: use lavu helper macros to check CPU extensions
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
8279a15284
x86/swr: split audioconvert and rematrix DSP into separate files
...
Also rename resample_x86_dsp.c to resample_init.c
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
857cd1f33b
swr: initialize only the necessary resample dsp functions
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
b5f0eac068
swr: rename swresample_dsp init functions to swri_resample_dsp
...
The swresample_ prefix is not for internal functions
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
c45b7f0d80
x86/swr: add ff_resample_{common, linear}_int16_xop
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
1a69224f44
x86/swr: add ff_resample_{common, linear}_float_fma
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
dd2c9034b1
x86/swr: convert resample_{common, linear}_double_sse2 to yasm
...
Signed-off-by: James Almer <jamrial@gmail.com>
312531 -> 311528 dezicycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Ronald S. Bultje
847bb638c0
swr: convert resample_common/linear_int16_mmx2/sse2 to yasm.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Ronald S. Bultje
faa1471ffc
swr: rewrite resample_common/linear_float_sse/avx in yasm.
...
Linear interpolation goes from 63 (llvm) or 58 (gcc) to 48 (yasm)
cycles/sample on 64bit, or from 66 (llvm/gcc) to 52 (yasm) cycles/
sample on 32bit. Bon-linear goes from 43 (llvm) or 38 (gcc) to
32 (yasm) cycles/sample on 64bit, or from 46 (llvm) or 44 (gcc) to
38 (yasm) cycles/sample on 32bit (all testing on OSX 10.9.2, llvm
5.1 and gcc 4.8/9).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Ronald S. Bultje
083cd3d1f7
swr: compile mmx2 s16p functions only on x86-32.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
7f4dfbd080
swr: add prototypes for resample dsp functions
...
Should fix compilation failures with MSVC and any other compiler
without inline asm support.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Ronald S. Bultje
ada8f9c046
swr: remove obsolete function prototypes.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Ronald S. Bultje
7128a35f8c
swr: split out DSP functions.
...
DSP bits of swri_resample go into their own mini-DSP functions; DSP
init goes from a per-call branch in multiple_resample to a proper
DSP init routine; x86 bits go into x86/; swri_resample() moves out of
resample_template.c into resample.c because it's independent of DSP
code or sample type; multiple_resample() is simplified.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
a9bf713d35
swresample: add swri_resample_float_avx
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Matt Oliver
1898c2f49d
inline asm: fix arrays as named constraints.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
4cdea92976
swresample/resample: add missing xmm clobbers
...
Might fix fate-swr on ICL
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
cdac3ab59f
swresample: add swri_resample_double_sse2
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
63dbba655e
swresample/resample: sse float linear interpolation
...
About two times faster
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
fa25c4c400
swresample/resample: mmx2/sse2 int16 linear interpolation
...
About three times faster
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
32291ba6ea
swresample: add swri_resample_float_sse
...
At least two times faster than the C version.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Matt Oliver
8236747511
Automatically change MANGLE() into named inline asm operands when direct symbol reference in inline asm are not supported.
...
This is part of the patch-set for intel C inline asm on windows support
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
7c8bf09edd
swresample: change COMMON_CORE_INT16 asm from SSSE3 to SSE2
...
pshuf+paddd is slightly faster than phaddd.
The real gain is in pre-ssse3 processors like AMD K8 and K10, which get
a big boost in performance compared to the mmxext version
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Martin Storsjö
3dd04cbcf7
swresample: Add arm&x86 clobber tests
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Reimar Döffinger
cbeaf67888
Avoid using empty macro arguments.
...
These are not supported by all compilers (gcc 2.95 but also older SPARC
compilers, see gcc bug #33304 for example), and there is no real need for them.
One use of this feature remains in libavdevice/v4l2.c which can't be
replaced quite as easily.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
11 years ago
Ronald S. Bultje
ad75d2b590
x86: Fix compilation with nasm on PPC & OS/2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
ca2818b881
swresample/x86/audio_convert: add emms to CONV
...
Might fix Ticket1874
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Michael Niedermayer
4cfc92081d
swr: add native_simd_one
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Michael Niedermayer
31a797eb28
swr: add av_cold to init/free functions
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Carl Eugen Hoyos
a26789cf9f
Fix compilation with yasm-0.6.2.
13 years ago
Carl Eugen Hoyos
52be5428c0
Add some missing _EXTERNAL suffixes to yasm source files.
13 years ago
Michael Niedermayer
68712ce820
swr/x86: 16bit integer mix functions need SSE2 not SSE
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
c88e60af76
swr/x86: 10l, missed some SSE2 instructions in code marked as SSE.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Clément Bœsch
ca612a27ae
swr: fix make checkheaders.
13 years ago
Clément Bœsch
022cbb6791
swr: small align cosmetic.
13 years ago
Clément Bœsch
3491c2a909
swr: use __asm__ instead of __asm.
...
For consistency only.
13 years ago
Michael Niedermayer
4ccf6e3971
swr: MMX2 & SSSE3 int16 resample core
...
about 4 times faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago