Lynne
bbe95f7353
x86: replace explicit REP_RETs with RETs
...
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
RET in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessarily, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2 years ago
Andreas Rheinhardt
fed07efcde
avcodec/x86/lossless_videodsp: Remove obsolete MMX(EXT) functions
...
The only systems which benefit from these are truly
ancient 32bit x86s as all other systems use at least the SSE2 versions
(this includes all x64 cpus (which is why this code is restricted
to x86-32)).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
3 years ago
James Almer
438f884fc4
x86/lossless_videodsp: rename ff_add_left_pred_int16_sse4 to ff_add_left_pred_int16_unaligned_ssse3
...
SSSE3_FAST is the proper check for it.
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
James Almer
a4fc63c0f9
x86/lossless_videodsp: don't overread the dst buffer in ff_add_left_pred_unaligned_avx2
...
Fixes a Valgrind warning.
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Martin Vignali
630967ef63
avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred
7 years ago
Martin Vignali
4353c35067
avcodec/x86/lossless_videodsp : add avx2 version for add_left_pred
7 years ago
Martin Vignali
cfbcea1cca
avcodec/x86/lossless_videodsp.asm : make macro for add_left_pred_unaligned in order to add avx2 version
7 years ago
Martin Vignali
0380b72d35
libavcodec/lossless_video_dsp : cosmetic: add a clearer separator between functions to make the asm file easier to read
7 years ago
Martin Vignali
da62128ea1
libavcodec/lossless_videodsp : add add_bytes avx2 version
7 years ago
James Almer
6d4c9f2ade
lossless_videodsp: rename add_hfyu_left_pred_int16 to add_left_pred_int16
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
47f212329e
huffyuvdsp: move functions only used by huffyuv from lossless_videodsp
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
30c1f27299
huffyuvencdsp: move functions only used by huffyuv from lossless_videodsp
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
5ac1dd8e23
lossless_videodsp: move shared functions from huffyuvdsp
...
Several codecs other than huffyuv use them.
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Henrik Gramner
f0b7882ceb
x86inc: Drop SECTION_TEXT macro
...
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
10 years ago
Michael Niedermayer
042a82ca37
avcodec/x86/lossless_videodsp: Fix size of values read for left/left_top
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
508e7a5c16
x86: huffyuv: fix {add,diff}_int16
...
They used an extra, undeclared register. Fixes a crash in
fate-vsynth3-ffvhuff444p16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
7b4c46050e
rename add_hfyu_left_prediction_int16 to add_hfyu_left_pred_int16
...
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
550ae6c02f
rename add_hfyu_median_prediction_int16 to add_hfyu_median_pred_int16
...
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
40a4ab8ba4
rename sub_hfyu_median_prediction_int16 to sub_hfyu_median_pred_int16
...
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
d601106ab1
avcodec/x86/lossless_videodsp: fix w type
...
Fixes fate issues on mingw64
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Clément Bœsch
5f4d04d084
x86/lossless_videodsp: silly one-line cosmetic.
11 years ago
Clément Bœsch
5267e85056
x86/lossless_videodsp: use common macro for add and diff int16 loop.
11 years ago
Clément Bœsch
cddbfd2a95
x86/lossless_videodsp: simplify and explicit aligned/unaligned flags
11 years ago
Michael Niedermayer
ef00ef7553
avcodec/x86/lossless_videodsp: port sub_hfyu_median_prediction_int16 to yasm
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
fee97f25fa
avcodec/x86/lossless_videodsp: port add_hfyu_median_prediction_mmxext to 16bit
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
631939bde6
avcodec/x86/lossless_videodsp: add diff_int16_mmx/sse2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Reimar Döffinger
76421982d0
lossless_videodsp.asm: fix compilation.
...
Fixes these errors with nasm:
libavcodec/x86/lossless_videodsp.asm:86: error: invalid combination of opcode and operands
libavcodec/x86/lossless_videodsp.asm:88: error: invalid combination of opcode and operands
I don't know whether movd or movq was meant, but either way
maskq vs. maskd must match the mov size.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
11 years ago
Michael Niedermayer
83b67ca056
avcodec/x86/lossless_videodsp: Port lorens add_hfyu_left_prediction_ssse3/sse4 to 16bit
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
63d2be7533
avcodec/x86/lossless_videodsp: use SPLATW in add_int16
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
f70d7eb20c
Move add/diff_int16 to lossless_videodsp
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago