Lynne
bbe95f7353
x86: replace explicit REP_RETs with RETs
...
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2 years ago
Andreas Rheinhardt
2b94f23b06
swresample/x86/audio_convert: Remove obsolete MMX functions
...
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
3 years ago
James Almer
acdd672506
x86/audio_convert: fix clobbering of xmm registers
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
James Almer
f37a5dcb55
swresample/x86: add missing colon to labels
...
Silences warnings with Nasm
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
f7ed997a6d
x86/swr: make pack_8ch functions work with compilers without aligned stack
...
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
59ac93f6af
x86/swr: add SSE/AVX unpack_6ch functions
...
int32/float only
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
6abf00d615
x86/swr: load constants outside the loop in pack_6ch functions
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
975ff6a3c6
x86/swr: disable pack_8ch functions on msvc/icl x86_32
...
Until a proper fix is committed.
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
5f14f9e984
x86/swr: add missing alignment check to pack_6ch functions
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
37b35feb64
x86/swr: add SSE2/AVX pack_8ch functions
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
edff061fb0
x86/swr: add ff_float_to_int32_a_avx2
...
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
b385c4c6a3
x86/swr: replace sse4 instructions in pack_6ch with sse ones
...
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Ronald S. Bultje
ad75d2b590
x86: Fix compilation with nasm on PPC & OS/2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
ca2818b881
swresample/x86/audio_convert: add emms to CONV
...
Might fix Ticket1874
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Carl Eugen Hoyos
52be5428c0
Add some missing _EXTERNAL suffixes to yasm source files.
13 years ago
Michael Niedermayer
c88e60af76
swr/x86: 10l, missed some SSE2 instructions in code marked as SSE.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
a927641e7a
libswresample-simd: Add ff_pack_6ch_float_to_int32_a_avx and ff_pack_6ch_float_to_int32_a_sse4
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
ca986a06ad
libswresample-simd: add ff_pack_6ch_int32_to_float_a_avx and ff_pack_6ch_int32_to_float_a_sse4
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
c4047ad9e0
libswresample: make NOP_N macro less picky on its parameters
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
57bc91c710
libswresample: Change FLOAT_TO_INT32_N to need 1 register less
...
same speed on sandy bridge
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
ecfdd125f1
libswresample-simd: rename 6ch pack to what it is
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
429b964e25
libswresample-simd: make the converter registers parameters
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
b3915c4b70
libswresample: cosmetics
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
24c0d1583c
libswresample: unaligned AVX/SSE4 float and int32 6ch pack
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Justin Ruggles
6f67d9833b
libswresample: Implement MMX, SSE4 and AVX 6ch float and int32 packing function.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
cbbc472467
swr-x86-simd: add ff_unpack_2ch_int16_to_int16/int32/float_a_ssse3
...
more than 10% faster (tested on sandybridge)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
72ae583b7d
swr-x86-simd: stereo unpack S16/S32/FLT-> S16/S32/FLT SSE/SSE2 (16 new SIMD functions)
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
adfa53b91f
swr-x86-SIMD: 3 instructions less for stereo planar->packed s32/flt->s16
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
5f4e18cd16
swr: replace the remaining 2 audio convert SIMD macros by the new ones
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
df5ff103cd
swr: fix internal asm labels
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
b6f4f0d9ef
swr: fix PACK_2CH register count
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
aae3119643
swr: replace planar->planar/packed->packed FLT<->S16/S32 SIMD by new macros
...
this simplifies the code
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
47055b8913
swr: implement stereo S16/S32/FLT->S16/S32/FLT planar->packed in SSE/SSE2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
e8dd7928c8
swr: change simd len argument to be in samples instead of dst bytes.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
c1fe2db376
swr: add ff_int32_to_float_a_avx
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
65722e7fc5
swr: int32_to_int16_mmx/sse
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
73edb58c3c
swr: float_to_int16_sse2()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
5932938c9a
swr: float_to_int32_sse2()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
b72a0f9c23
swr: add int16_to_float_sse2()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
832c3b10d2
swr: add int32_to_float_sse2
...
could be done for sse/3dnow too if someone wants
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
95057b1972
swr: int16->int32: use the old index negate trick to avoid 2 adds
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
113738d6c2
swr: more correct cglobal parameters to int16->int32
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
fa5daaca0d
swr: seperate functions for aligned & unaligned
...
If someone has an idea on how to do this cleaner, its welcome
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Michael Niedermayer
bcc66ff0e4
swr: add int16_to_int32_mmx/sse
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago