FFmpeg

Author	SHA1	Message	Date
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2 years ago
James Darnley	13d71c28cc	avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions Yorkfield: - sse2: - complex: 4.13x faster (1514 vs. 367 cycles) - simple: 4.38x faster (1836 vs. 419 cycles) Skylake: - sse2: - complex: 3.61x faster ( 936 vs. 260 cycles) - simple: 3.97x faster (1126 vs. 284 cycles) - avx (versus sse2): - complex: 1.07x faster (260 vs. 244 cycles) - simple: 1.03x faster (284 vs. 274 cycles)	8 years ago
Martin Storsjö	f1a9eee41c	x86: Add missing movsxd for the int stride parameter Signed-off-by: Martin Storsjö <martin@martin.st>	9 years ago
Ronald S. Bultje	26ece7a511	vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions.	10 years ago
Christophe Gisquet	ed450d4acf	x86: lavc: share more constant through defines Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Diego Biurrun	55519926ef	x86: Make function prototype comments in assembly code consistent This helps grepping for functions, among other things.	11 years ago
Diego Biurrun	edd1f833fa	x86: h264_idct_10_bit: Use proper type in function prototype comments	11 years ago
Thilo Borgmann	d814a839ac	Reinstate proper FFmpeg license for all files.	12 years ago
Ronald S. Bultje	62844c3fd6	h264: Integrate clear_blocks calls with IDCT The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster. Signed-off-by: Martin Storsjö <martin@martin.st>	12 years ago
Ronald S. Bultje	1acd7d594c	h264: integrate clear_blocks calls with IDCT. The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	12 years ago
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	12 years ago
Diego Biurrun	2b479bcab0	build: Drop AVX assembly ifdefs An assembler able to cope with AVX instructions is now required.	12 years ago
Diego Biurrun	04581c8c77	x86: yasm: Use complete source path for macro helper %includes This is more consistent with the way we handle C #includes and it simplifies the build system.	12 years ago
Diego Biurrun	6860b4081d	x86: include x86inc.asm in x86util.asm This is necessary to allow refactoring some x86util macros with cpuflags.	12 years ago
Diego Biurrun	17337f54c0	x86: Split inline and external assembly #ifdefs	13 years ago
Mans Rullgard	a3df4781f4	x86: add colons after labels nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>	13 years ago
Ronald S. Bultje	c83f44dba1	h264_idct_10bit: port x86 assembly to cpuflags.	13 years ago
Henrik Gramner	729f90e268	x86inc improvements for 64-bit Add support for all x86-64 registers Prefer caller-saved register over callee-saved on WIN64 Support up to 15 function arguments Also (by Ronald S. Bultje) Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	13 years ago
Michael Kostylev	3206cccc0e	h264: mark h264_idct_add8_10 with number of XMM registers. This fixes XMM register clobber problems on Win64. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	13 years ago
Ronald S. Bultje	3b15a6d742	config.asm: change %ifdef directives to %if directives. This allows combining multiple conditionals in a single statement.	13 years ago
Kieran Kunhya	b1766c170c	Move x264asm to libavutil. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Dave Yeo	cc73511e8e	Fix NASM include directive Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Ronald S. Bultje	b2c087871d	Move x86util.asm from libavcodec/ to libavutil/. This allows using it in swscale also.	14 years ago
Ronald S. Bultje	3a39195b1d	Move x86inc.asm to libavutil/. This allows using it in libswscale/ also.	14 years ago
Jason Garrett-Glaser	c90b94424c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	14 years ago
Jason Garrett-Glaser	504811baea	Roll back 4:4:4 H.264 for now Needs some ARM/PPC asm modifications.	14 years ago
Jason Garrett-Glaser	c9c493872c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	14 years ago
Loren Merritt	53be7b23e9	Cosmetic changes to h264_idct_10bit.asm. Removes redundant dword tags and whitespace changes. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Loren Merritt	994c3550ff	2x faster h264_idct_add8_10. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Daniel Kang	836f47d34b	Add IDCT functions for 10-bit H.264. Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired by 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author. Signed-off-by: Ronald S. Bultje <rbultje@google.com>	14 years ago

41 Commits (dbf1c6f5f1f2cfaf4837e72d0c77f675a4318522)