FFmpeg

Commit Graph

Author	SHA1	Message	Date
Ivan Kalvachev	7205513f8f	SIMD opus pvq_search implementation Explanation on the workings and methods used by the Pyramid Vector Quantization Search function could be found in the following Work-In-Progress mail threads: http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>	8 years ago
Rostislav Pehlivanov	70eb77b34e	mdct15: add inverse transform postrotation SIMD 2.5ms frames: Before (c): 2638 decicycles in postrotate, 2097040 runs, 112 skips After (sse3): 1467 decicycles in postrotate, 2097083 runs, 69 skips After (avx2): 1244 decicycles in postrotate, 2097085 runs, 67 skips 5ms frames: Before (c): 4987 decicycles in postrotate, 1048371 runs, 205 skips After (sse3): 2644 decicycles in postrotate, 1048509 runs, 67 skips After (avx2): 2031 decicycles in postrotate, 1048523 runs, 53 skips 10ms frames: Before (c): 9153 decicycles in postrotate, 523575 runs, 713 skips After (sse3): 5110 decicycles in postrotate, 523726 runs, 562 skips After (avx2): 3738 decicycles in postrotate, 524223 runs, 65 skips 20ms frames: Before (c): 17857 decicycles in postrotate, 261866 runs, 278 skips After (sse3): 10041 decicycles in postrotate, 261746 runs, 398 skips After (avx2): 7050 decicycles in postrotate, 262116 runs, 28 skips Improves total decoding performance for real world content by 9% with avx2. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	8 years ago
Wan-Teh Chang	ea1ca17be2	avcodec/x86/cavsdsp: Delete #include "libavcodec/x86/idctdsp.h". This file already has #include "idctdsp.h", which is resolved to the idctdsp.h header in the directory where this file resides by compilers. Two other files in this directory, libavcodec/x86/idctdsp_init.c and libavcodec/x86/xvididct_init.c, also rely on #include "idctdsp.h" working this way. Signed-off-by: Wan-Teh Chang <wtc@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
James Almer	9d5e81d3b1	Revert "x86/sbrdsp: remove unnecessary sign extend instruction in apply_noise_main" This reverts commit `24bb7db403`. noise has to after all be sign extended, not zero extended, on tests other than checkasm. Fixes most aac tests broken by the now reverted commit.	8 years ago
James Almer	24bb7db403	x86/sbrdsp: remove unnecessary sign extend instruction in apply_noise_main noise needs to be zero extended and it can be done implicitly as a side effect in a subsequent instruction. Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	bcbe9e4447	x86/sbrdsp: zero extend m_max in apply_noise_main Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	440285474b	x86/utvideodsp: make restore_rgb_planes functions work on x86_32 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	ac8ad8d098	x86/sbrdsp: sign extend start and end gprs in ff_sbr_hf_gen_sse Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Darnley	0c2acccd4b	avcodec/x86: use new x86-64 functions for -idct simple They now match according to FATE, barring any further bugs with untested parts	8 years ago
James Darnley	d7246ea9f2	avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions Includes add/put functions Rounding contributed by Ronald S. Bultje	8 years ago
James Darnley	8b19467d07	avcodec/x86: allow future 8-bit simple idct to have "DC only hack" Created by Ronald S. Bultje	8 years ago
Clément Bœsch	b12a36170b	lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis	8 years ago
Michael Niedermayer	516c213f08	avcodec/x86/vp9dsp_init_16bpp: Fix linking to missing ff_vp9_ipred_dr_32x32_16_avx2() on 32bit Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Ilia Valiakhmetov	35a5d9715d	avcodec/vp9: add 64-bit ipred_dr_32x32_16 avx2 implementation vp9_diag_downright_32x32_12bpp_c: 429.7 vp9_diag_downright_32x32_12bpp_sse2: 158.9 vp9_diag_downright_32x32_12bpp_ssse3: 144.6 vp9_diag_downright_32x32_12bpp_avx: 141.0 vp9_diag_downright_32x32_12bpp_avx2: 73.8 Almost 50% faster than avx implementation Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	8 years ago
Paul B Mahol	4ed7c2bbc3	avcodec/utvideodec: add SIMD for restore_rgb_planes Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
Matthieu Bouron	db5bf64b21	lavc/x86: clear r2 higher bits in ff_sbr_sum_square Suggested-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	349446e36f	x86/mdct15: use three operand form for some instructions Fixes compilation with old yasm	8 years ago
Rostislav Pehlivanov	e1120b1c54	mdct15: add assembly optimizations for the 15-point FFT c: 1802 decicycles in fft15,16774635 runs, 2581 skips avx: 865 decicycles in fft15,16776378 runs, 838 skips Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	8 years ago
Diego Biurrun	fd502f4f5f	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler. (Cherry-picked from libav commit `39e208f4d4`) Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Darnley	8221c71703	avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients	8 years ago
James Darnley	d2597fb0c1	avcodec/x86: modify simple_idct10 macros to add an action parameter	8 years ago
James Darnley	8781330d80	avcodec/x86: cleanup simple_idct10 Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register.	8 years ago
James Darnley	e3db94302c	avcodec/x86/mpegenc: support transpose permutation type	8 years ago
James Darnley	fa30a0a548	avcodec/x86/mpegenc: check IDCT permutation type is a valid value	8 years ago
Michael Niedermayer	ae6f6d4e34	avcodec/x86/mpegvideo: Use intra scantable in dct_unquantize_h263_intra_mmx() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
James Almer	8bb59e6742	x86/aacpsdsp: add ff_ps_hybrid_analysis_ileave_sse About 2x faster than the c version.	8 years ago
James Almer	e229df9478	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} About 2x faster than the c version.	8 years ago
James Almer	623d217ed1	avcodec/aacps: move checks for valid length outside the stereo_interpolate dsp function Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	b3446862bf	x86/vorbisdsp: optimize ff_vorbis_inverse_coupling_sse About 7% faster.	8 years ago
Ronald S. Bultje	d35ff98e27	vp9: fix overwrite in ff_vp9_ipred_dr_16x16_16_avx2. Fixes trac issue 6459.	8 years ago
Ilia Valiakhmetov	81fc617c12	avcodec/vp9: ipred_dr_16x16_16 avx2 implementation Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	8 years ago
James Almer	497a4b554c	x86/aacpsdsp: fix output of ff_ps_stereo_interpolate_ipdopd_sse3 The fate-aac-al_sbr_ps_04_ur test did not detect this mistake.	8 years ago
Ilia Valiakhmetov	73d9a9a6af	libavcodec/vp9: ipred_dl_32x32_16 avx2 implementation vp9_diag_downleft_32x32_8bpp_c: 580.2 vp9_diag_downleft_32x32_8bpp_sse2: 75.6 vp9_diag_downleft_32x32_8bpp_ssse3: 73.7 vp9_diag_downleft_32x32_8bpp_avx: 72.7 vp9_diag_downleft_32x32_10bpp_c: 1101.2 vp9_diag_downleft_32x32_10bpp_sse2: 145.4 vp9_diag_downleft_32x32_10bpp_ssse3: 137.5 vp9_diag_downleft_32x32_10bpp_avx: 134.8 vp9_diag_downleft_32x32_10bpp_avx2: 94.0 vp9_diag_downleft_32x32_12bpp_c: 1108.5 vp9_diag_downleft_32x32_12bpp_sse2: 145.5 vp9_diag_downleft_32x32_12bpp_ssse3: 137.3 vp9_diag_downleft_32x32_12bpp_avx: 135.2 vp9_diag_downleft_32x32_12bpp_avx2: 94.0 ~30% faster than avx implementation Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	8 years ago
James Almer	933dd62288	x86/aacpsdsp: optimize ff_ps_mul_pair_single_sse ~2% faster.	8 years ago
James Almer	be3809a521	x86/aacpsdsp: optimize ff_ps_stereo_interpolate_sse3 Move the unpacking outside of the loop. 5% to 10% faster. Suggested-by: ubitux Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	b5a0971ff0	x86/aacps: add ff_ps_stereo_interpolate_ipdopd_sse3() About 2x faster than the c version. Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Darnley	0dea0114fb	avcodec/x86/idctdsp_init: reindent	8 years ago
James Darnley	8e89f6fd37	avcodec/x86: move simple_idct to external assembly	8 years ago
Clément Bœsch	584366a436	lavc/mpegvideoenc: reformat inv_zigzag_direct16 so the zigzag pattern is visible	8 years ago
James Darnley	7aa90b4e94	avcodec/h264: add sse2 versions of previous idct functions Kaby Lake Pentium: - ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext	8 years ago
James Darnley	27460dfebc	avcodec/h264: add avx 8-bit h264_idct_dc_add Haswell: - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext Skylake-U: - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext	8 years ago
James Darnley	f61d454ca1	avcodec/h264: add avx 8-bit h264_idct_add Haswell: - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext Skylake-U: - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext	8 years ago
James Darnley	b5325c6711	avcodec/h264: use some 3 operand forms	8 years ago
James Darnley	060ba9e5e3	avcodec/h264: change RETs into REP_RETs where appropriate	8 years ago
Michael Niedermayer	fa8fd0808f	avcodec/x86/vc1dsp_init: Fix build failure with --disable-optimizations and clang compilers doing DCE at -O0 do not necessarily understand "complex" boolean expressions Build succeeds with this change, this was the only failure Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Ronald S. Bultje	83ae7e6350	x86/idctdsp_init: reindent.	8 years ago
Ronald S. Bultje	e0c205677f	x86/simple_idct: add explicit sse2 simple_idct_put/add versions. These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations. This way we don't need to use the ff_put/add_pixels_clamped function pointers.	8 years ago
Ronald S. Bultje	2f0591cfa3	cavs: add a sse2 idct implementation. This makes using the function pointer ff_add_pixels_clamped() unnecessary, since we always know what the best implementation is at compile-time.	8 years ago
Ronald S. Bultje	c9d98c5649	cavs: convert idct from inline asm to yasm.	8 years ago
Ronald S. Bultje	b51d7d89f8	x86/xvididct: remove use of ff_put/add_pixels_clamped function pointer. Since there's separate SSE2 implementations of xvid_idct_put/add, this patch has no practical impact on performance.	8 years ago


2481 Commits (f4a71eec3252a17adebcb9caa5cfaf12af528ff6)