Paul B Mahol
b69c91bbee
avcodec/x86: add cfhdenc SIMD
4 years ago
Paul B Mahol
389cc142fb
avcodec/cfhd: add x86 SIMD
Overall speed for 1920x1080, yuv422p10le, 60fps changed from 0.19x to 0.343x
5 years ago
James Almer
58d167bcd5
avcodec/Makefile: add missing pngdsp dependency to the lscr decoder
Signed-off-by: James Almer <jamrial@gmail.com>
6 years ago
Lynne
605e330310
x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis
58893 decicycles in deemphasis_c, 130548 runs, 524 skips
9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips -> 6.21x speedup
24866 decicycles in postfilter_c, 65386 runs, 150 skips
5268 decicycles in postfilter_fma3, 65505 runs, 31 skips -> 4.72x speedup
Total decoder speedup: ~14%
Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;
for (int i = 0; i < len; i += 4) {
y[0] = x[0] + c1*state;
y[1] = x[1] + c2*state + c1*x[0];
y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
state = y[3];
y += 4;
x += 4;
}
6 years ago
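The unrolling above can be checked against the plain recursion y[n] = x[n] + c1*y[n-1]. Each of the four outputs in a group depends only on the incoming state and x[0..3], never on y[0..2], so the coefficient vectors can be applied lane-wise with FMA and only y[3] has to be carried into the next iteration. The C sketch below is a minimal illustration, not the committed code: it assumes an illustrative coefficient of 0.85 in place of CELT_EMPH_COEFF, passes the filter state in and out explicitly (the fragment's `coeff` is not defined there), requires len to be a multiple of 4, and the function names are hypothetical.

```c
#include <assert.h>

/* Illustrative value only; an assumed stand-in for CELT_EMPH_COEFF. */
#define EMPH_COEFF 0.85f

/* Scalar reference: first-order IIR deemphasis y[n] = x[n] + c1 * y[n-1].
 * Returns the last output so it can seed the next call. */
static float deemph_scalar(float *y, const float *x, float state, int len)
{
    for (int i = 0; i < len; i++) {
        state = x[i] + EMPH_COEFF * state;
        y[i] = state;
    }
    return state;
}

/* Four-way unrolled form from the commit message: every output in a group
 * of four depends only on `state` and x[0..3], never on y[0..2]. */
static float deemph_unrolled4(float *y, const float *x, float state, int len)
{
    const float c1 = EMPH_COEFF, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    assert(len % 4 == 0); /* no tail handling in this sketch */
    for (int i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];
        state = y[3]; /* the only value that crosses iterations */
        y += 4;
        x += 4;
    }
    return state;
}
```

Both functions produce the same outputs up to float rounding, since the unrolled form only regroups the same products.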
Lynne
5468c1d075
celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled
The entire function was defined away before.
6 years ago
Lynne
4a2c651620
x86/opus_dsp: rename to celt_pvq
It's only used in the encoder and in CELT's PVQ.
6 years ago
Aurelien Jacobs
f1e490b1ad
sbcenc: add MMX optimizations
This was originally based on libsbc, and was fully integrated into ffmpeg.
Rough speed test:
C version: speed= 592x
MMX version: speed= 785x
7 years ago
Martin Vignali
9b8c1224d7
libavcodec/exr : add X86 SIMD for reorder_pixels
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Ivan Kalvachev
7205513f8f
SIMD opus pvq_search implementation
An explanation of the workings and methods used by the
Pyramid Vector Quantization Search function
can be found in the following Work-In-Progress mail threads:
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
8 years ago
Paul B Mahol
4ed7c2bbc3
avcodec/utvideodec: add SIMD for restore_rgb_planes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
8 years ago
Rostislav Pehlivanov
e1120b1c54
mdct15: add assembly optimizations for the 15-point FFT
c: 1802 decicycles in fft15,16774635 runs, 2581 skips
avx: 865 decicycles in fft15,16776378 runs, 838 skips
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
8 years ago
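Timing lines like the ones above appear to be the output of FFmpeg's START_TIMER/STOP_TIMER macros, which report an average in decicycles, i.e. tenths of a CPU cycle. The speedup is simply the ratio of the two averages: for fft15 that is 1802 / 865 ≈ 2.08x, and the same formula reproduces the 6.21x figure quoted for deemphasis_fma3. A minimal C sketch of the arithmetic; the helper name is hypothetical.

```c
#include <assert.h>

/* Speedup of a SIMD version over C from two average timings, both in
 * decicycles (tenths of a cycle); the unit cancels out in the ratio. */
static double decicycle_speedup(double c_decicycles, double simd_decicycles)
{
    assert(simd_decicycles > 0.0);
    return c_decicycles / simd_decicycles;
}
```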
Diego Biurrun
fd502f4f5f
build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.
(Cherry-picked from libav commit 39e208f4d4)
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Darnley
8e89f6fd37
avcodec/x86: move simple_idct to external assembly
8 years ago
Ronald S. Bultje
c9d98c5649
cavs: convert idct from inline asm to yasm.
8 years ago
Clément Bœsch
40ac226014
lavc/x86/hevc: rename hevc_res_add to hevc_add_res
This will simplify incoming merge.
8 years ago
Diego Biurrun
39e208f4d4
build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.
8 years ago
James Almer
cf9ef83960
huffyuvencdsp: move shared functions to a new lossless_videoencdsp context
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Pierre Edouard Lepere
6d5636ad9a
hevc: x86: Add add_residual() SIMD optimizations
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
8 years ago
Rostislav Pehlivanov
d2ae5f77c6
aacenc: add SIMD optimizations for abs_pow34 and quantization
Performance improvements:
quant_bands:
with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
Around 42% for the function
Twoloop coder:
abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder
Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder
Fast coder:
abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder
Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
8 years ago
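The "Around N%" figures in the aacenc message are reductions in run time relative to the build without the SIMD, (without - with) / without, rather than speedup ratios. For example, 681 vs 1190 decicycles is about a 42.8% reduction, which corresponds to a 1190 / 681 ≈ 1.75x speedup. A small C sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

/* Percentage of run time saved by the optimized build:
 * (without - with) / without * 100. Works for decicycles or seconds. */
static double percent_saved(double with, double without)
{
    assert(without > 0.0);
    return (without - with) / without * 100.0;
}
```

Applied to the encoder timings, this gives roughly 12.5% for 7.15s/8.17s and 19.9% for 3.02s/3.77s, matching the rounded figures in the message.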
Clément Bœsch
a692724c58
vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
Justin Ruggles
b57e38f52c
ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm
Adds a wrapper function for downmixing which detects channel count changes
and updates the selected downmix function accordingly.
Simplification and porting to current x86inc infrastructure by Diego Biurrun.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
9 years ago
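The downmix wrapper described in the ac3dsp message can be sketched as follows: the context remembers the channel counts its current function pointer was chosen for, and only re-runs the selection when a call arrives with a different layout. Everything here (DownmixContext, downmix(), the stereo-to-mono path standing in for an optimized version, the MAX_CH limit of 6) is a hypothetical illustration, not the actual ac3dsp API.

```c
#include <assert.h>

#define MAX_CH 6 /* assumed upper bound, e.g. 5.1 */

typedef void (*downmix_fn)(float **samples, float matrix[MAX_CH][MAX_CH],
                           int out_ch, int in_ch, int len);

/* Hypothetical context caching the layout the current function was picked for. */
typedef struct DownmixContext {
    int in_ch, out_ch;
    downmix_fn fn;
    int reselections; /* how many times fn was (re)selected */
} DownmixContext;

/* Generic in-place downmix: out[j][i] = sum_k matrix[j][k] * in[k][i]. */
static void downmix_generic(float **samples, float matrix[MAX_CH][MAX_CH],
                            int out_ch, int in_ch, int len)
{
    assert(in_ch <= MAX_CH && out_ch <= in_ch);
    for (int i = 0; i < len; i++) {
        float acc[MAX_CH] = { 0 };
        for (int j = 0; j < out_ch; j++)
            for (int k = 0; k < in_ch; k++)
                acc[j] += matrix[j][k] * samples[k][i];
        for (int j = 0; j < out_ch; j++)
            samples[j][i] = acc[j]; /* safe: all inputs for sample i were read */
    }
}

/* Specialized stereo -> mono path, standing in for an optimized version. */
static void downmix_2_to_1(float **samples, float matrix[MAX_CH][MAX_CH],
                           int out_ch, int in_ch, int len)
{
    (void)out_ch;
    (void)in_ch;
    for (int i = 0; i < len; i++)
        samples[0][i] = matrix[0][0] * samples[0][i] + matrix[0][1] * samples[1][i];
}

/* Wrapper: reselect the implementation only when the channel counts change. */
static void downmix(DownmixContext *c, float **samples, float matrix[MAX_CH][MAX_CH],
                    int out_ch, int in_ch, int len)
{
    if (!c->fn || c->in_ch != in_ch || c->out_ch != out_ch) {
        c->in_ch = in_ch;
        c->out_ch = out_ch;
        c->fn = (in_ch == 2 && out_ch == 1) ? downmix_2_to_1 : downmix_generic;
        c->reselections++;
    }
    c->fn(samples, matrix, out_ch, in_ch, len);
}
```

Caching the selection keeps the per-frame cost to two integer comparisons when the layout is stable, which is the common case in a decoder.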
Anton Khirnov
12004a9a7f
audiodsp/x86: yasmify vector_clipf_sse
9 years ago
Anton Khirnov
89466de4ae
vp9/x86: rename vp9dsp to vp9mc
It only contains the MC SIMD, other SIMD will go into different files.
9 years ago
James Almer
efc9d5c4bc
x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4}
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Diego Biurrun
1dfc3cf89d
x86: hpeldsp: Split off VP3-specific bits into a separate file
9 years ago
James Almer
fca3c3b619
hevc: Add AVX2 DC IDCT
Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>.
Integrated to Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
9 years ago
Diego Biurrun
01621202aa
build: miscellaneous cosmetics
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
9 years ago
Diego Biurrun
1a094af638
fft: Split MDCT bits off from FFT
9 years ago
Timothy Gu
e3461197b1
x86/vc1dsp: Split the file into MC and loopfilter
9 years ago
Diego Biurrun
15a24614ae
build: Add vc1dsp component for more fine-grained dependencies
9 years ago
James Almer
8ae7447941
x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3}
...
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Timothy Gu
9fd6ea933f
dirac_dwt: Make x86 files/functions names consistent
9 years ago
Timothy Gu
17ab8f7e68
diracdsp: Make x86 files/functions names consistent
9 years ago
foo86
ae5b2c5250
avcodec/dca: add new decoder based on libdcadec
9 years ago
foo86
4608996772
avcodec/dca: remove old decoder
Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.
9 years ago
James Almer
209f50e16b
avcodec/synth_filter: split off remaining code from dcadec files
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Diego Biurrun
03ef89faf2
x86: build: Group all encoder objects together
9 years ago
Anton Khirnov
e7078e842d
hevcdsp: add x86 SIMD for MC
9 years ago
James Almer
73353af6e5
x86/Makefile: move decoder/encoder objects out of the subsystems section
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Timothy Gu
6b41b44149
huffyuvencdsp: Convert ff_diff_bytes_mmx to yasm
Heavily based upon ff_add_bytes by Christophe Gisquet.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
9 years ago
Ronald S. Bultje
1c3be32533
vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.
10 years ago
Christophe Gisquet
4369b9dc7b
x86: simple_idct(_put): 10bits versions
Modeled from the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires adding offsets in different places compared to the
prores or C versions, and makes the function approximately 2% slower.
For 16 frames of a DNxHD 4:2:2 10bits test sequence:
C: 60861 decicycles in idct, 1048205 runs, 371 skips
sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
avx: 26272 decicycles in idct, 1048171 runs, 405 skips
The add version is not implemented, so the corresponding dsp
function is set to NULL to make this explicit to any code that would call it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
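Clipping to [0;1023] is the 10-bit counterpart of the usual 8-bit [0;255] clamp applied when the IDCT result is stored, and leaving the unimplemented add pointer NULL means a caller that tries to use it typically fails immediately instead of silently running the wrong code. A C sketch of both points, assuming a hypothetical context with put/add pointers; the transform itself is omitted, so the put function here only performs the clamped store of an already transformed 8x8 block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSP context with separate put and add entry points. */
typedef struct IDCT10Context {
    void (*idct_put)(uint16_t *dst, ptrdiff_t stride, const int16_t *block);
    void (*idct_add)(uint16_t *dst, ptrdiff_t stride, const int16_t *block);
} IDCT10Context;

/* Clamp a reconstructed sample to the 10-bit range [0;1023]. */
static uint16_t clip_10bit(int v)
{
    return (uint16_t)(v < 0 ? 0 : v > 1023 ? 1023 : v);
}

/* Store step only: writes an already transformed 8x8 block as 10-bit
 * pixels. A real idct_put would run the inverse transform first. */
static void put_clamped_10(uint16_t *dst, ptrdiff_t stride, const int16_t *block)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = clip_10bit(block[y * 8 + x]);
}

static void idct10_init(IDCT10Context *c)
{
    c->idct_put = put_clamped_10;
    c->idct_add = NULL; /* not implemented: NULL makes that explicit to callers */
}
```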
Paul B Mahol
35af7add6f
avcodec/takdec: add x86 SIMD for rest of decorrelation modes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
10 years ago
James Almer
72254b19b8
x86/alacdsp: add simd optimized functions
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Ronald S. Bultje
26ece7a511
vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions.
10 years ago
Ronald S. Bultje
db7786e8ff
vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd.
10 years ago
James Almer
3178931a14
x86/hevc_sao: move 10/12bit functions into a separate file
Tested-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Ronald S. Bultje
344d519040
vp9: add subpel MC SIMD for 10/12bpp.
10 years ago
Ronald S. Bultje
6354ff0383
vp9: add fullpel (put) MC SIMD for 10/12bpp.
10 years ago
Vittorio Giovara
cad40a3833
lavc: Drop deprecated deinterlace module
Deprecated in 03/2013.
10 years ago