FFmpeg

Commit Graph

Author	SHA1	Message	Date
Christophe Gisquet	7aeafacfd0	x86/sbrdsp: Use different mem moves Before 2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips After 2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
James Almer	449b21bfab	x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3} 2 to 2.5 times faster. Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
James Almer	08810a8895	x86/flacdsp: remove unneeded ifdeffery x86inc can translate r*m into a register or stack on its own Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
James Almer	37b35feb64	x86/swr: add SSE2/AVX pack_8ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
Ronald S. Bultje	3aefca68ca	vp9/x86: add myself to copyright holders for loopfilter assembly.	10 years ago
Ronald S. Bultje	afd8c464b7	vp9/x86: make filter_16_h work on 32-bit.	10 years ago
Ronald S. Bultje	b26bc3520f	vp9/x86: make filter_48/84/88_h work on 32-bit.	10 years ago
Ronald S. Bultje	8a1cff1c35	vp9/x86: make filter_44_h work on 32-bit.	10 years ago
Ronald S. Bultje	047088b8c6	vp9/x86: make filter_16_v work on 32-bit.	10 years ago
Ronald S. Bultje	0cc9c23ea1	vp9/x86: make filter_48/84_v work on 32-bit.	10 years ago
Ronald S. Bultje	6433a9133f	vp9/x86: make filter_88_v work on 32-bit.	10 years ago
Ronald S. Bultje	75f8e52089	vp9/x86: make filter_44_v work on 32-bit.	10 years ago
Ronald S. Bultje	7f80c3344c	vp8/x86: save one register in SIGN_ADD/SUB.	10 years ago
Ronald S. Bultje	8ea2194ebb	vp9/x86: store unpacked intermediates for filter6/14 on stack. filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88 goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.	10 years ago
Ronald S. Bultje	e42409479f	vp8/x86: move variable assigned inside macro branch. The value is not used outside the branch.	10 years ago
Ronald S. Bultje	418c202c63	vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning.	10 years ago
Ronald S. Bultje	d1c55654e1	vp8/x86: remove unused register from ABSSUB_CMP macro.	10 years ago
Ronald S. Bultje	e59bd08986	vp9/x86: slightly simplify 44/48/84/88 h stores.	10 years ago
Ronald S. Bultje	8132629bd5	vp9/x86: make cglobal statement more conservative in register allocation.	10 years ago
Ronald S. Bultje	c013ca58c5	vp9/x86: save one register in loopfilter surface coverage.	10 years ago
James Almer	32c836cb11	x86/vp9: remove duplicate function prototypes Fixes "redundant redeclaration" warnings. Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
James Almer	7696e429c7	x86/vp3dsp: port put_vp_no_rnd_pixels8_l2_mmx to yasm Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
James Almer	a4d62f7775	x86/constants: fix alignment of pw_255 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ronald S. Bultje	bdc1e3e3b2	vp9/x86: intra prediction sse2/32bit support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ronald S. Bultje	b6e1711223	vp9/x86: invert hu_ipred left array ordering. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ronald S. Bultje	0a7964dca5	vp9/x86: save one register on 32bit idct32x32. Fixes build on win32. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ronald S. Bultje	cae893f692	vp9/x86: sse2 MC assembly. Also a slight change to the ssse3 code, which prevents a theoretical overflow in the sharp filter. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ronald S. Bultje	fd77fbb390	vp9/x86: 32bit and sse2 support for vp9 inverse transform assembly Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Michael Niedermayer	a03f72e744	avcodec/x86/hevc_mc: fix sse register counts These fix failures of --enable-xmm-clobber-test It would be better to change the code to use fewer registers, but until someone does the used register count must not be too small Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Michael Niedermayer	d43d5c5707	avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Michael Niedermayer	ed9be7dd47	avcodec/x86/pngdsp: fix off by 1 error This fixes artifacts in the last pixel of rows with some widths and pixel formats Found-by: Dominique Leroux <Dominique.Leroux@autodesk.com> Tested-by: Dominique Leroux <Dominique.Leroux@autodesk.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Kieran Kunhya	9a738c27dc	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	10 years ago
Reimar Döffinger	49d9cbe55d	h264_i386: Fix operand size Fixes fate failure on macosx clang x86-64 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Christophe Gisquet	9fa056ba75	pngdsp x86: use unaligned access For test images manually generated to contain only up prediction, timing results (8380x3032 / 255x185): before: 138635 / 1992; after: 139232 / 1996. Actually jumping to the proper version depending on the alignment: 8380x3032: 138767. A 0.5% speed improvement for gigantic images is not worth the code duplication. Fixes ticket #4148 Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com> Tested-by: Benoit Fouet <benoit.fouet@free.fr> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Kieran Kunhya	36091742d1	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Vittorio Giovara	9c12c6ff95	motion_est: convert stride to ptrdiff_t CC: libav-stable@libav.org Bug-Id: CID 700556 / CID 700557 / CID 700558	10 years ago
Carl Eugen Hoyos	600e38f563	Fix standalone compilation of the apng decoder on x86.	10 years ago
Michael Niedermayer	65ce8f8895	avcodec/x86/Makefile: fix order Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Michael Niedermayer	d3512a0e89	avcodec/x86/lossless_audiodsp: fix fallback code for 32bit Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Michael Niedermayer	4327088da3	avcodec/x86/lossless_audiodsp: support len %16 == 8 in scalarproduct_and_madd_int16() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Reimar Döffinger	478c61ccb2	h264_i386: Optimize decode_significance_8x8_x86 for 64 bit. 11674 -> 10877 decicycles on my Phenom II. Overall speedup was unfortunately within measurement error. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	10 years ago
James Almer	3cec54b7d7	x86/flacdsp: add SSE2 and AVX decorrelate functions Two to four times faster depending on instruction set, block size and channel count.	10 years ago
James Almer	84ccc317ce	x86/flacdsp: separate decoder and encoder dsp initialization Signed-off-by: James Almer <jamrial@gmail.com>	10 years ago
James Almer	7292b0477a	x86/hpeldsp: fix loop in {avg,avg_no_rnd}_pixels16_x2_mmx Handle it inside the __asm__() block. Fixes fate-vc1_ilaced_twomv when using the gcc-usan toolchain. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	11 years ago
Henrik Gramner	2d91abade2	x86: h264_intrapred: Don't treat 32-bit integers as 64-bit The upper halves are not guaranteed to be zero in x86-64. Signed-off-by: Anton Khirnov <anton@khirnov.net>	11 years ago
Mickaël Raulet	4ba6371a83	x86/hevc: get rid off packusdw for ssse3 compatibility cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2 Fixes out of array access Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	0de1d6287e	x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2} 2x to 2.5x faster than the C version. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	11 years ago
James Almer	acebff8e5d	x86/mpegvideoencdsp: improve ff_pix_sum16_sse2 ~15% faster. Also add an mmxext version that takes advantage of the new code, and build it alongside with the mmx version only on x86_32. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	11 years ago
Michael Niedermayer	d22e88d120	avcodec/x86/fmtconvert: Fix operand size in ff_int32_to_float_fmul_array8_sse* Fixes acodec-dca2 fate failure Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
James Almer	26cd7b1e1a	x86/fmtconvert: add ff_int32_to_float_fmul_array8_{sse,sse2} About two times faster than the c wrapper. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	11 years ago

1892 Commits (7ccd625a46c5a5a2f1cd6a20d7a6bf8137c7191c)