James Almer
1b932eb150
x86: add detection for FMA3 instruction set
...
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
11 years ago
James Almer
10b0161d78
x86: add missing XOP checks and macros
...
Signed-off-by: James Almer <jamrial@gmail.com>
11 years ago
James Almer
0bc3de19ff
x86: add detection for Bit Manipulation Instruction sets
...
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
a2af8eddab
x86: add detection for FMA3 instruction set
...
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
996697e266
x86: float dsp: unroll SSE versions
...
vector_fmul and vector_fmac_scalar are guaranteed to be able to process in
batches of 16 elements, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
11 years ago
Christophe Gisquet
133b34207c
x86: float dsp: unroll SSE versions
...
vector_fmul and vector_fmac_scalar are guaranteed to be able to process in
batches of 16 elements, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
23a8c63452
x86inc: Extend FMA_INSTR functionality
...
Support the cases where the first and last operand of
the XOP instruction are the same.
Also add vpmacsdql emulation.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
6c12b1de06
x86: add missing XOP checks and macros
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Loren Merritt
b7d0d10a1d
x86inc: Speed up assembling with Yasm
...
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
11 years ago
Loren Merritt
4d55fe7204
x86inc: speed up compilation with yasm
...
Work around yasm's inefficiency with handling large numbers of variables
in the global scope.
11 years ago
Michael Niedermayer
c3814ab654
rename new lls code to lls2 to avoid conflict with the old which has a different ABI
...
also remove failed attempt at a compatibility layer, the code simply cannot work
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
bbe66ef912
avutil: rename lls to lls2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Kieran Kunhya
865b70bc5d
Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Kieran Kunhya
4d6ee07255
libavutil: x86: Add AVX2 capable CPU detection.
...
Patch based on x264's AVX2 detection
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Jason Garrett-Glaser
a3fabc6cb3
x86: more AVX2 framework
...
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Jason Garrett-Glaser
c6908d6b4b
x86inc: FMA3/4 Support
...
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Derek Buitenhuis
206895708e
x86inc: Remove our FMA4 support
...
This is so we can sync to x264's version of FMA4 support.
This partially reverts commit 79687079a9.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Henrik Gramner
c108ba0175
x86inc: Use VEX-encoded instructions in AVX functions
...
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Henrik Gramner
ad7d7d4f6a
x86inc: Remove .rodata kludges
...
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Henrik Gramner
3e2fa991db
x86inc: remove misaligned cpu flag
...
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Jason Garrett-Glaser
7115566541
x86inc: various minor backports from x264
...
Small backports that sneaked into other asm commits in x264.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Derek Buitenhuis
47f9d7ce54
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
...
This is also a valid value for WIN64.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Henrik Gramner
bbe4a6db44
x86inc: Utilize the shadow space on 64-bit Windows
...
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Loren Merritt
3fb78e99a0
x86inc: create xm# and ym#, analogous to m#
...
For when we want to mix simd sizes within one function.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Loren Merritt
49ebe3f9fe
x86inc: fix some corner cases of SWAP
...
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Henrik Gramner
63f0d62310
x86inc: Use SSE instead of SSE2 for copying data
...
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Henrik Gramner
ad76e6e7e1
x86inc: Set ELF hidden visibility for global constants
...
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Loren Merritt
25cb0c1a1e
x86inc: activate REP_RET automatically
...
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
11 years ago
Ronald S. Bultje
c07ac8d467
VP9 MC (ssse3) optimizations.
...
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
11 years ago
Alex Smith
08fa828b3f
avutil: Fix compilation with inline asm disabled on mingw
...
Because of -Werror=implicit-function-declaration the build will fail.
Signed-off-by: Martin Storsjö <martin@martin.st>
11 years ago
Thilo Borgmann
d814a839ac
Reinstate proper FFmpeg license for all files.
11 years ago
Diego Biurrun
79aec43ce8
x86: Add and use more convenience macros to check CPU extension availability
11 years ago
Diego Biurrun
8410d6e93c
avutil: Refactor CPU extension availability macros
11 years ago
Diego Biurrun
b78b10c4b7
avutil: Move internal CPU detection function declarations to private header
11 years ago
Diego Biurrun
3ac7fa81b2
Consistently use "cpu_flags" as variable/parameter name for CPU flags
12 years ago
Michael Niedermayer
a478e99a60
avutil/x86: reenable ff_update_lls_avx()
...
The bug has been fixed in c8b920a9b7 by Loren Merritt
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Loren Merritt
c8b920a9b7
lls/x86: use 3-operator vaddpd in ADDPD_MEM
...
Fixes build with yasm-1.1
Signed-off-by: Anton Khirnov <anton@khirnov.net>
12 years ago
Michael Niedermayer
a6e46ed51a
Revert "avutil/x86: disable ff_evaluate_lls_sse2() for 32bit"
...
This reverts commit 247425241c.
12 years ago
Loren Merritt
1221bb6239
x86: lpc: fix a segfault in av_evaluate_lls_sse2()
12 years ago
Michael Niedermayer
247425241c
avutil/x86: disable ff_evaluate_lls_sse2() for 32bit
...
It just segfaults on 32bit, thus it's disabled until someone fixes it.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Michael Niedermayer
a285079bc7
lls.asm: disable ff_update_lls_avx
...
The code doesn't build with yasm from Ubuntu 12.04
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Michael Niedermayer
0b40c50508
lls.asm: put avx code under if HAVE_AVX_EXTERNAL
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Loren Merritt
b545179fdf
x86: lpc: simd av_evaluate_lls
...
1.5x-1.8x faster on sandybridge
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
12 years ago
Loren Merritt
502ab21af0
x86: lpc: simd av_update_lls
...
4x-6x faster on sandybridge
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
12 years ago
Diego Biurrun
1fda184a85
avutil: Add av_cold attributes to init functions missing them
12 years ago
Christophe Gisquet
566b7a20fd
x86: float dsp: butterflies_float SSE
...
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.
12 years ago
Michael Niedermayer
92218aad00
butterflies_float: replace 2 lea by 2 add
...
adds are simpler instructions and should be faster or equally fast
on all cpus
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Christophe Gisquet
1a4007964c
x86: float dsp: butterflies_float SSE
...
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Ronald S. Bultje
b93b27edb0
dsputil: Make dsputil selectable
...
Signed-off-by: Martin Storsjö <martin@martin.st>
12 years ago
Christophe Gisquet
2e81acc687
x86inc: Fix number of operands for cmp* instructions
...
cmp{p,s}{s,d} instructions do take an imm8 operand.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
12 years ago