James Almer
acdd672506
x86/audio_convert: fix clobbering of xmm registers
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
James Almer
f37a5dcb55
swresample/x86: add missing colon to labels
Silences warnings with Nasm
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
James Almer
f7ed997a6d
x86/swr: make pack_8ch functions work with compilers without aligned stack
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
59ac93f6af
x86/swr: add SSE/AVX unpack_6ch functions
int32/float only
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
6abf00d615
x86/swr: load constants outside the loop in pack_6ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
975ff6a3c6
x86/swr: disable pack_8ch functions on msvc/icl x86_32
Until a proper fix is committed.
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
5f14f9e984
x86/swr: add missing alignment check to pack_6ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
37b35feb64
x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
edff061fb0
x86/swr: add ff_float_to_int32_a_avx2
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
b385c4c6a3
x86/swr: replace sse4 instructions in pack_6ch with sse ones
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Ronald S. Bultje
ad75d2b590
x86: Fix compilation with nasm on PPC & OS/2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
ca2818b881
swresample/x86/audio_convert: add emms to CONV
Might fix Ticket1874
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Carl Eugen Hoyos
52be5428c0
Add some missing _EXTERNAL suffixes to yasm source files.
12 years ago
Michael Niedermayer
c88e60af76
swr/x86: 10l, missed some SSE2 instructions in code marked as SSE.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
a927641e7a
libswresample-simd: Add ff_pack_6ch_float_to_int32_a_avx and ff_pack_6ch_float_to_int32_a_sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
ca986a06ad
libswresample-simd: add ff_pack_6ch_int32_to_float_a_avx and ff_pack_6ch_int32_to_float_a_sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
c4047ad9e0
libswresample: make NOP_N macro less picky on its parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
57bc91c710
libswresample: Change FLOAT_TO_INT32_N to need 1 register less
same speed on sandy bridge
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
ecfdd125f1
libswresample-simd: rename 6ch pack to what it is
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
429b964e25
libswresample-simd: make the converter registers parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
b3915c4b70
libswresample: cosmetics
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
24c0d1583c
libswresample: unaligned AVX/SSE4 float and int32 6ch pack
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Justin Ruggles
6f67d9833b
libswresample: Implement MMX, SSE4 and AVX 6ch float and int32 packing function.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
cbbc472467
swr-x86-simd: add ff_unpack_2ch_int16_to_int16/int32/float_a_ssse3
more than 10% faster (tested on sandybridge)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
72ae583b7d
swr-x86-simd: stereo unpack S16/S32/FLT-> S16/S32/FLT SSE/SSE2 (16 new SIMD functions)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
adfa53b91f
swr-x86-SIMD: 3 instructions less for stereo planar->packed s32/flt->s16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
5f4e18cd16
swr: replace the remaining 2 audio convert SIMD macros by the new ones
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
df5ff103cd
swr: fix internal asm labels
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
b6f4f0d9ef
swr: fix PACK_2CH register count
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
aae3119643
swr: replace planar->planar/packed->packed FLT<->S16/S32 SIMD by new macros
this simplifies the code
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
47055b8913
swr: implement stereo S16/S32/FLT->S16/S32/FLT planar->packed in SSE/SSE2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
e8dd7928c8
swr: change simd len argument to be in samples instead of dst bytes.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
c1fe2db376
swr: add ff_int32_to_float_a_avx
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
65722e7fc5
swr: int32_to_int16_mmx/sse
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
73edb58c3c
swr: float_to_int16_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
5932938c9a
swr: float_to_int32_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
b72a0f9c23
swr: add int16_to_float_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
832c3b10d2
swr: add int32_to_float_sse2
could be done for sse/3dnow too if someone wants
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
95057b1972
swr: int16->int32: use the old index negate trick to avoid 2 adds
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
113738d6c2
swr: more correct cglobal parameters to int16->int32
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
fa5daaca0d
swr: separate functions for aligned & unaligned
If someone has an idea on how to do this more cleanly, it's welcome
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
bcc66ff0e4
swr: add int16_to_int32_mmx/sse
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago